Have you asked speed2 or somebody? It seems really weird to me that you cannot time the text bubbles. I'll try to play it when I have time, maybe on the weekend. I think it makes more sense to concentrate on playtesting before creating voiceovers of any kind.
If the display time really cannot be changed for text, I would just use a simple text-to-speech AI generator for now, or just have a random guy speak it into a mic unprofessionally. Good things are made through iteration; you definitely want to gather feedback before you make the final voiceovers.
Edit: Alternatively, if putting audio into the mission is tiring, we could just generate a number of "silent tracks", one for each length in seconds, like
silent_audio_2s.wav
silent_audio_4s.wav
silent_audio_6s.wav
and then you can just quickly time yourself reading each line aloud while working on the mission and insert the wav of the matching length. I could create these audio files in minutes if I know the correct audio settings.
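
Here is a rough sketch of how the silent tracks could be generated with Python's standard wave module. The audio settings (44100 Hz, mono, 16-bit PCM) are only my guess, since I don't know what the game expects, and the function names are made up for illustration.

```python
import os
import wave

# ASSUMED audio settings; swap in whatever the game actually requires.
SAMPLE_RATE = 44100   # samples per second
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM, where silence is all zeros)


def write_silent_wav(path, seconds):
    """Write a WAV file containing `seconds` of silence."""
    n_frames = int(SAMPLE_RATE * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        # Zero bytes are silence for signed 16-bit samples.
        wav.writeframes(b"\x00" * (n_frames * CHANNELS * SAMPLE_WIDTH))


def generate_silent_tracks(lengths, out_dir="."):
    """Create silent_audio_<N>s.wav for every length (in seconds) given."""
    paths = []
    for seconds in lengths:
        path = os.path.join(out_dir, f"silent_audio_{seconds}s.wav")
        write_silent_wav(path, seconds)
        paths.append(path)
    return paths
```

Calling generate_silent_tracks((2, 4, 6)) would write the three files listed above into the current folder. If the game needs 8-bit audio instead, the zero bytes would have to become 0x80, because 8-bit WAV samples are unsigned.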