The Humanity Hypothesis is another game I’m working on with my friend Brian ‘Psychochild’ Green. Brian is currently busy writing tons of content for it, and that sheer volume of content has brought an interesting challenge.

In this game, the player has to converse with a machine to explore the following hypothesis: Humanity will not survive without intervention. All dialogues are voiced, and while it would have been nice to coach a voice artist through hours of content, it just wasn’t within our reach. Fortunately, since the player is talking to a machine, text-to-speech with a few audio tweaks fits perfectly here.

To create the right experience, we’re using two tools you can find on Unity’s Asset Store: Easy Voice and Dialogical. Dialogical isn’t the most powerful dialogue tool out there, but its strength is its simplicity, and it handles our needs for this project just fine.

Quick overview of how everything fits together

First, dialogues and branching are handled in Dialogical. This is where we input what the machine will say and the choices the player will have to answer. Each answer redirects to another piece of dialogue, which will ultimately influence a bunch of variables and lead the player to various endings.

Then we need to make sure each dialogue has an audio file associated with it so you can actually hear the machine talking to you (dialogues are also displayed on the screen so you can play without sound if that’s something you prefer). This means we have to copy/paste each dialogue into Easy Voice, assign a name to the file that will be generated, run the process that creates the audio files and then assign the right files to the right dialogues in Dialogical.

Before you ask, no, dynamic text-to-speech isn’t an option, as this technology is highly dependent on the machine the player is using. In other words, as far as I know, there is no tool that can handle text-to-speech on the spot and produce identical results across different machines and operating systems. Because of this, we have to generate all audio files and ship them with the game.

One of the many trees of dialogues in The Humanity Hypothesis

Risk of errors in manual manipulation

Since we’re dealing with hours of content, there’s always the risk of assigning the wrong audio clip to a dialogue. If you have to change a dialogue, you need to remember to reflect the change in Easy Voice, create the audio file again and then assign it again to the right dialogue. There are simply too many ways to make mistakes when handling this process manually.

Since Dialogical is so easy to use, we didn’t quite see how problematic it would end up being to deal with such a huge amount of content.

Just a few of the many audio files we’re dealing with

Automation to the rescue

We have all the right tools to create the game but the weakest link is ourselves. We already proved the tools we’re using are right for the job, but there’s always a risk that we will screw everything up ourselves. The answer? Remove ourselves from the equation.

This means that we must remove human interaction from the process of creating and assigning all audio files to dialogues. For a game in which the player must interact with a machine this seems like a natural way to go. :)

In Unity, I created a scene dedicated to automating the process of creating audio files. It’s as simple as listing every single tree of dialogues, looping through each of their nodes and triggering the creation of an audio file based on the content of each node.
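To give an idea of the shape of that loop, here is a rough sketch in Python. It is not the actual Unity scene, and it does not use the real Easy Voice or Dialogical APIs: `Node`, `DialogueTree`, `generate_all_audio`, `fake_synthesize` and the `.wav` extension are all made-up names and assumptions for illustration, with the text-to-speech step passed in as a plain function.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass
class Node:
    node_id: str  # hypothetical field, e.g. "P.1.1.a"
    text: str     # the line the machine says


@dataclass
class DialogueTree:
    name: str
    nodes: List[Node]


def generate_all_audio(
    trees: Iterable[DialogueTree],
    synthesize: Callable[[str], bytes],
    out_dir: Path,
) -> List[Path]:
    """Create one audio file per node, named after the node's ID."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tree in trees:
        for node in tree.nodes:
            # The file name is derived from the node, never typed by hand.
            path = out_dir / f"{node.node_id}.wav"
            path.write_bytes(synthesize(node.text))
            written.append(path)
    return written


def fake_synthesize(text: str) -> bytes:
    """Placeholder for the real text-to-speech call (assumption)."""
    return text.encode("utf-8")
```

Calling `generate_all_audio(trees, fake_synthesize, Path("audio"))` walks every tree and writes one file per node. The point of the design is that no human ever types a file name or pastes a line of dialogue.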

There was still the problem of assigning each audio file to the proper node. Doing so manually is just too risky, as I explained before. Thankfully, Brian came up with a clean way to uniquely identify each node across all trees of dialogues. For example, the node called P.1.1.a is unique among all dialogues in the game, so if we name its audio file P.1.1.a then it becomes easy to associate the right file with the right node.
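The matching step can be sketched in a few lines of Python. Again, this is only an illustration under assumptions, not our Unity code: `match_audio_to_nodes` is a hypothetical helper, and it assumes each audio file is named after its node ID plus a single extension such as `.wav`.

```python
from collections import Counter
from pathlib import PurePath
from typing import Dict, List, Sequence, Tuple


def match_audio_to_nodes(
    node_ids: Sequence[str],
    audio_files: Sequence[str],
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Pair each node ID with the audio file that carries the same name."""
    counts = Counter(node_ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    if duplicates:
        # The whole scheme relies on IDs being unique across every tree.
        raise ValueError(f"node IDs are not unique: {duplicates}")

    # "audio/P.1.1.a.wav" has the stem "P.1.1.a" (only the last suffix is dropped).
    by_stem = {PurePath(f).stem: f for f in audio_files}

    assigned = {i: by_stem[i] for i in node_ids if i in by_stem}
    missing = [i for i in node_ids if i not in by_stem]           # nodes without audio
    orphans = sorted(f for s, f in by_stem.items() if s not in counts)  # audio without a node
    return assigned, missing, orphans
```

Returning the missing and orphaned entries alongside the assignments makes it easy to spot a node that never got its audio, or a leftover file from a dialogue that was removed.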

We decided to stream all audio files in real time while the player is inside the game, and while I feared it might cause some performance problems, it turned out that you really can’t tell how the audio files are loaded. In the end, it will also make it easier to patch the game in case we need to change a few dialogues after release.

So with this simple approach, we turned dealing with an impressive amount of content, prone to human manipulation errors, into a smooth and easy process.

I don’t often get excited by lines of code but I must say that I was giggling all the way through coding this part of the game!

Vote for The Humanity Hypothesis on Steam Greenlight!