Amazing results, big progress

Since I had issues training the SSRN last time, I was hesitant to try again. But this time, I was able to train a usable SSRN model.

With the SSRN model combined with the Text2Mel model that I trained, the synthesized audio sounds quite realistic. I’m really glad I tried again.

Now I’m going through the source audio more carefully and improving the pre-processing of the audio CSV. I had to remove some clips from a scene where the character is drunk and mumbling.

While rewriting the parsing script, I ran into some encoding and separator issues. I hate bugs. It turned out I just had to add the no-quoting argument when both reading and writing the CSV.
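The script itself isn't shown here, so this is only a minimal sketch using Python's standard `csv` module. It assumes an LJSpeech-style metadata file with `|` as the separator and the clip ID in the first column; `clean_metadata`, the clip IDs, and the sample rows are all hypothetical. With `csv.QUOTE_NONE` on the reader, a transcript that starts with `"` is kept as literal text instead of being parsed as a quoted field, and on the writer no quotes are added around fields. The same function also drops unwanted clips by ID, the way the drunk scene was removed.

```python
import csv
import io

# Assumption: LJSpeech-style rows "clip_id|transcript|normalized transcript".
SEP = "|"


def clean_metadata(src, dst, exclude_ids):
    """Copy rows from src to dst, skipping clips whose ID is in exclude_ids.

    QUOTE_NONE on both sides treats '"' in transcripts as ordinary text,
    so quotes are neither stripped on read nor added/doubled on write.
    """
    reader = csv.reader(src, delimiter=SEP, quoting=csv.QUOTE_NONE)
    # quotechar=None is only allowed together with QUOTE_NONE; it ensures
    # the writer never treats '"' as a character that needs escaping.
    writer = csv.writer(dst, delimiter=SEP, quoting=csv.QUOTE_NONE,
                        quotechar=None, lineterminator="\n")
    kept = 0
    for row in reader:
        if row and row[0] not in exclude_ids:  # skip blank lines and excluded clips
            writer.writerow(row)
            kept += 1
    return kept


# For real files (UTF-8 is an assumption about the data):
#   open(path, encoding="utf-8", newline="")
# Two made-up rows; the second stands in for a clip to drop.
raw = ('A-0001|"Hello," she said.|"Hello," she said.\n'
       'A-0002|Mm... more wine...|Mm... more wine...\n')
out = io.StringIO()
print(clean_metadata(io.StringIO(raw), out, {"A-0002"}))  # → 1
print(out.getvalue(), end="")  # → A-0001|"Hello," she said.|"Hello," she said.
```

With the default quoting, the first transcript would come back as `Hello, she said.` on read, and any field containing `"` would be wrapped in quotes with the inner quotes doubled on write.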

Here is some synthesized audio of the character.

Well, I can’t even tell it’s not real.