Abstract

Deepstory is a machine learning approach to storytelling. It integrates several deep learning models for natural language generation [1], text-to-speech [2], speech-driven facial animation [3], and image animation [4] into a framework that generates new text in the style of a book and renders it as a video of a character speaking that text in the character's voice from an adaptation such as a video game. The approach can be applied to any franchise that has been adapted into multimedia. In this project, the Witcher franchise, which has been adapted into video games and a TV series, is chosen as the application. Text data from The Witcher books and audio data from The Witcher 3: Wild Hunt were collected and preprocessed to train the natural language generation model and the text-to-speech model, while pretrained models are used for speech-driven facial animation and image animation. To put everything together interactively, a Flask-based Python web service provides an interface that handles user input and displays the results.
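The data flow between the four models can be illustrated with a minimal Python sketch. The names below (`run_pipeline`, `StoryClip`, and the `toy_*` stages) are hypothetical and are not Deepstory's actual code. The toy stages replace real models with placeholder strings and silent samples. The ordering is an assumption: the output of the speech-driven facial animation model is treated as the driving video for the image animation model, which transfers that motion onto a still image of the character.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical aliases standing in for real model outputs.
Audio = List[float]   # waveform samples
Frames = List[str]    # frame identifiers instead of real images


@dataclass
class StoryClip:
    text: str
    audio: Audio
    frames: Frames


def run_pipeline(
    prompt: str,
    generate_text: Callable[[str], str],
    synthesize_speech: Callable[[str], Audio],
    animate_face: Callable[[Audio], Frames],
    animate_character: Callable[[Frames], Frames],
) -> StoryClip:
    """Chain the four stages; each stage consumes the previous stage's output."""
    text = generate_text(prompt)          # e.g. a GPT-2 style language model
    audio = synthesize_speech(text)       # e.g. a text-to-speech model
    driving = animate_face(audio)         # speech-driven talking-face frames
    frames = animate_character(driving)   # motion transferred to the character image
    return StoryClip(text=text, audio=audio, frames=frames)


# Toy stand-ins (hypothetical) so the sketch runs without any model weights.
SAMPLES_PER_FRAME = 4


def toy_generate_text(prompt: str) -> str:
    return prompt + " The wind howled."


def toy_tts(text: str) -> Audio:
    # One frame's worth of silent samples per word.
    return [0.0] * (SAMPLES_PER_FRAME * len(text.split()))


def toy_face(audio: Audio) -> Frames:
    # One face frame per SAMPLES_PER_FRAME audio samples.
    return [f"face_{i}" for i in range(len(audio) // SAMPLES_PER_FRAME)]


def toy_character(driving: Frames) -> Frames:
    # Rename each driving frame to mark it as re-rendered on the character.
    return [name.replace("face", "character") for name in driving]


if __name__ == "__main__":
    clip = run_pipeline("Geralt entered the tavern.",
                        toy_generate_text, toy_tts, toy_face, toy_character)
    print(clip.text)
    print(len(clip.audio), clip.frames[:2])
```

Passing each stage in as a callable keeps the orchestration independent of any particular model, so a wrapper around a real model could replace a toy stage without changing `run_pipeline`.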


[1] https://openai.com/blog/better-language-models/

[2] https://arxiv.org/abs/1710.08969

[3] https://arxiv.org/abs/1906.06337

[4] http://papers.nips.cc/paper/8935-first-order-motion-model-for-image-animation