Synthetic voice industry wants as much distance from deepfakes as possible
The synthetic voice industry sees how nefarious deepfake creators have been able to define the market for realistic video animations of people — and everyone wants to avoid the same fate.
For many people familiar with deepfakes, the defining market is real pornographic video digitally altered to show celebrities performing sex acts.
It might instead have meant something like the hypothetical digital insertion of the late, mafia-busting U.S. Sen. Robert F. Kennedy into the movie The Godfather, but the legitimate industry was outflanked.
Today, when people could be debating synthetic performances, artistic merit, copyright or the uncanny valley, most must first contend with a lasting “ick” stain on an already fraught concept.
Can the synthetic voice community do better? Maybe.
So far there is one fairly well-known case of a cloned voice being used to defraud a company in the United States. (Plenty of more minor cases of voice-enabled fraud occur, but they are under the radar for legislators, regulators and the public.)
At least one synthetic voice vendor is pushing a typical technology wonder narrative, no doubt hoping others will join in the campaign before a hypothetical egregious misuse of AI voice capabilities can make this sector, too, seem disreputable.
The message is clear: “deepfake” need not be attached to synthetic voice. Pair the terms in conversation and Veritone’s Gacek will politely but flatly explain how one has little to do with the other.
Fair point, but Veritone has its work cut out for it in this regard. The company is using U.S. legislation, the Deepfake Task Force Act, as a backdrop to give its point some context.
Gacek’s pitch is that the threat of deepfake crimes and misinformation should not overshadow the multiple less-sexy roles that synthetic voice can deliver for public and private organizations.
For businesses, the technology offers a way to craft marketing messages globally.
A template message can be automatically translated for an unlimited number of markets, with local nuances and idioms. That voice could be a clone of a worldwide celebrity or one manufactured entirely by algorithm.
The technology has advanced to the point of adding breaths and pauses to speech, though development only recently began to accelerate in earnest. Most synthetic speech loses authenticity when it runs long.
Podcasts, which can be marketing but more often are entertainment, are a major target for the industry, he said.
Gacek pointed out that governments that today issue written warnings, statements and proclamations only in locally dominant languages could do the same in audio for every dialect spoken.
He is not naïve. He knows that synthetic voice fraud will increase regardless of how ethical and useful legitimate markets are.
The trick will be building out the industry with trustworthy, competent algorithms that deliver on marketing promises, while keeping maximum sunlight between legitimate deployments and criminal intent.