As more of our interactions with companies shift away from the visual and toward the verbal (whether thanks to Echo and Google Home or automated customer service systems), the tone, quality, and cadence of a company's voice are becoming the new face of the brand.
Voicery's model works differently from the traditional approach to creating synthetic voices, the kind heard in many devices. It needs only a few hours of a voice actor's speech, on which it trains a deep neural network to imitate that person's voice. The entire process, from casting an actor, to having them read sets of phrases, to actually training the computer, takes about two weeks. Creating the neural net model for a single synthetic voice takes four days. At the moment, Voicery has three production-ready synthesized voices, drawn from voice actors or from public-domain audiobooks.