When it comes to synthetic video, seeing is believing. Synthetic video pairs any face with any words, in any language, in any voice. It’s created not by a person filming an actor on a set, but by artificial intelligence (AI), in cyberspace. The result is surprisingly realistic, as this demo from Bloomington-based startup Deep Word shows.
CEO Ankush Bikkasani, a senior at the Indiana University Kelley School of Business, launched Deep Word in November 2020. Since then, Deep Word has acquired 19,000 users, who have generated 38,000 synthetic videos.
As a freelance videographer, Bikkasani experienced the pain points associated with traditional filming firsthand: traveling to a location, setting up equipment, filming multiple takes, and so on. He started to read about deep fakes and the available software, but found that wasn’t a solution. “Deep fakes are essentially a very high-quality face swap,” he explained in an interview with The Mill, Bloomington’s center for entrepreneurship. “They put your face on mine, or my face on yours. But what we needed was the ability to change or modify the things that somebody was saying.”
In the summer of 2020, Bikkasani began talking to friends studying data science at Indiana University, and a few months later, they launched Deep Word’s current prototype. Users select one of Deep Word’s video actors or upload their own video, then supply either a script or an audio file. Within minutes, the AI generates a new video that syncs the original footage to the new audio. When given a script, Deep Word’s AI converts the text to speech using an artificial voice called a neural voice, each trained on 30–40 hours of recorded speech. Eventually, the team hopes to offer users the ability to clone their own voices (without having to sit behind a microphone for a week).
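The workflow described above amounts to pairing one video source with one audio source. As a rough illustration only, a request to a service like this might be assembled as below; the function, field names, and values are hypothetical assumptions for the sketch, not Deep Word’s actual API.

```python
# Hypothetical sketch: assemble a generation request pairing one video
# source (stock actor OR uploaded video) with one audio source (script
# OR audio file). All names here are illustrative, not a real API.

def build_generation_request(actor_id=None, video_path=None,
                             script=None, audio_path=None):
    """Return a request dict with exactly one video and one audio source."""
    if (actor_id is None) == (video_path is None):
        raise ValueError("Provide exactly one of actor_id or video_path")
    if (script is None) == (audio_path is None):
        raise ValueError("Provide exactly one of script or audio_path")

    request = {"video_source": actor_id or video_path}
    if script is not None:
        # Text input: a neural voice would convert the script to speech.
        request["input"] = {"type": "script", "text": script,
                            "neural_voice": "default"}
    else:
        # Audio input: the user's own recording drives the video.
        request["input"] = {"type": "audio", "file": audio_path}
    return request
```

The mutual-exclusion checks mirror the choice the article describes: a stock actor or your own footage, and a typed script or a recorded voice.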
There are a few limitations: the technology works best with a relatively stationary actor facing the camera, for example. But that’s improving quickly. Even over the last six months, Deep Word’s demo videos have added more hand gestures and movement.
One of the biggest opportunities for application, Bikkasani noted, lies in e-learning. “Research shows that when people have a face to associate with the information they’re learning, they retain up to 40% more information. And a lot of content being produced doesn’t have that. So it’s a very easy and cheap value proposition to these companies. We’re communicating, ‘Hey, if you’re producing content, or if your past content doesn’t have a teacher, just integrate with our software, and you can automatically have teachers overlaid over all of your content.’”
Although the field is still very young, other deep fake software is already online and open for anyone to use. In fact, Deep Word has competitors in Synthesia and Rephrase.ai, although their technology works differently. “They are essentially puppeting faces,” Bikkasani explained. “Every time their software sees a new face, they have to train a model to output video with that face. Ours is a generalized model, meaning that it will work with video of anyone without further training, so it’s a much faster and more versatile process. If I wanted to integrate a thousand video actors into our website in the next hour, I could, but for them, each one would take several days of model training and integration.”
Synthetic video’s potential—to quickly and easily put words into anyone’s mouth, on video, in a realistic way—raises obvious concerns about ethics, as well as business concerns about regulation. Bikkasani and his team have established strict ethics for using their product, and they’re prepared to comply with regulations.
“At the end of the day, we only want content being produced that is intended to be produced by the people who are in the video,” he asserted. “We really put our foot down. We monitor all the content produced through our website. We’ve developed auto flagging systems for content, and we're working on an internal video verification tool.”
Bikkasani sees potential in that verification tool not only as an internal solution for Deep Word, but as a large market opportunity in itself: a chance to become the standard for verifying if a video has been produced synthetically. Synthetic video also has the potential to make a positive ethical impact, by making it easy to increase the diversity of faces, voices, and languages represented in training and educational videos.
“I think synthetic video is a hundred percent here to stay,” Bikkasani stated. “It's just too much of an improvement—or its potential is too much of an improvement—over how we currently produce video. And I think that regulators will understand that. It's an evolving field. It's a very gray area. The ultimate goal is that we and other companies hold the same ethical grounds, but we can't always guarantee the perceptions of others.”
In addition to CEO Bikkasani, the Deep Word team includes two data scientists and a software engineer, all IU graduates. In 2020, Deep Word won a $20,000 pre-seed award in the Elevate Nexus Regional Pitch Competition. They also secured $100,000 in Amazon Web Services (AWS) credits—another important win that saved the company 85% on operational costs. Earlier this year, Deep Word placed first at the Clapp IDEA Competition and second at the Cheng Wu Innovation Challenge. In May they secured an additional $20,000 investment from the Community Ideation Fund (run by Elevate Ventures through the Velocities partnership) to enable further technological improvements.
Later this summer, Deep Word will launch an API that allows companies to generate videos at scale. For example, a large company using a learning management system could pass specific information about each individual employee (name, title, duties, supervisor name, etc.) to Deep Word’s servers and receive back training videos personalized for each employee.
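That kind of at-scale personalization boils down to filling a script template with each employee’s details and submitting one generation request per employee. A minimal sketch follows; the template text, field names, and actor ID are assumptions for illustration, not details of Deep Word’s forthcoming API.

```python
# Hypothetical sketch of batch personalization: one generation request
# per employee, built from a script template. Names and fields are
# illustrative assumptions, not a real vendor API.

employees = [
    {"name": "Ada", "title": "Analyst", "supervisor": "Grace"},
    {"name": "Linus", "title": "Engineer", "supervisor": "Dennis"},
]

# A script template with placeholders for per-employee details.
TEMPLATE = ("Hi {name}! Your role is {title}, and your onboarding "
            "is overseen by {supervisor}. Let's walk through your "
            "first week.")

def personalize_script(template, employee):
    """Fill the template's placeholders from one employee record."""
    return template.format(**employee)

# In a real integration, each payload would be POSTed to the vendor's
# API and a personalized training video returned per employee.
requests_batch = [
    {"video_source": "actor_01",
     "script": personalize_script(TEMPLATE, emp)}
    for emp in employees
]
```

A learning management system would supply the employee records; the video side stays identical across the batch, which is what makes the process scalable.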
“Deep Word has really been able to prove out the technology with individual users,” said Cy Megnin, Elevate Ventures’ entrepreneur-in-residence serving Velocities, a partnership supporting startups in south-central Indiana. “What has me most excited about this company is the release of its API, which will allow video production to be truly scalable.”