April 6, 2015 -
Voice recognition requires incredible amounts of computing resources. Digital assistants that run on mobile platforms, including Siri from Apple, Google Now, Microsoft’s Cortana, and Amazon’s Echo, have enormous data requirements.
Instead of leveraging a phone’s limited processing power, these digital assistants utilize servers in cloud-based environments to interpret user requests and to provide requested information to users. These digital assistants therefore operate as “artificial intelligence engines”, which harness the ability to correlate millions of requests in order to provide more accurate and sound responses.
Because artificial intelligence harnesses reasoning and logic, knowledge, and learning processes through the use of natural language processing and communication, a tremendous amount of data transmission and data processing is required.
According to an article published by Ars Technica, if you use Siri two or three times per day at an average of 63KB per instance, you might expect to use 126KB to 189KB per day, or 3.7 to 5.5MB per month. If you use the personal assistant between four and six times a day, that might work out to 252KB to 378KB per day, or 7.4 to 11MB per month. If you use it 10-15 times per day, you might end up using 630KB to 945KB per day, or 18.5 to 27.7MB per month.
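The figures above follow from simple arithmetic, which can be sketched as below. This is only an illustration: the 63KB average comes from the article, while the 30-day month and the 1,024KB-per-MB conversion are assumptions chosen because they reproduce the quoted monthly totals. The function names are hypothetical.

```python
KB_PER_REQUEST = 63   # average size per Siri request, as quoted above
DAYS_PER_MONTH = 30   # assumption: a 30-day month matches the quoted totals
KB_PER_MB = 1024      # assumption: binary megabytes (1 MB = 1,024 KB)


def daily_kb(requests_per_day):
    """Data used per day, in KB, for a given number of requests."""
    return requests_per_day * KB_PER_REQUEST


def monthly_mb(requests_per_day):
    """Data used per month, in MB, for a given number of daily requests."""
    return daily_kb(requests_per_day) * DAYS_PER_MONTH / KB_PER_MB


# The three usage bands discussed in the text.
for low, high in [(2, 3), (4, 6), (10, 15)]:
    print(
        f"{low}-{high} requests/day: "
        f"{daily_kb(low)}-{daily_kb(high)} KB/day, "
        f"{monthly_mb(low):.1f}-{monthly_mb(high):.1f} MB/month"
    )
```

Under these assumptions the sketch reproduces the quoted ranges to one decimal place, with six requests per day coming out at roughly 11.1MB per month rather than a flat 11MB.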
Jason Mars, an Assistant Professor of Computer Science and Engineering at the University of Michigan, has estimated that once millions of users a day transfer this data to the Apple cloud environment, Siri requires about 168 times more processing capacity, space, and electrical power than a text-based search engine such as Google Search.
Specifically, a research paper co-authored by Mars states: “the computational resources required for a single query is in excess of 100 times more than that of traditional web search.”
Because Apple’s system is proprietary, the computing resources deployed for Siri are not public knowledge. But industry analysts believe that its voice control system leverages either the faster processing power of graphics processing unit (GPU) chips, originally built for processing complex digital images, or field-programmable gate array (FPGA) chips, which can be programmed for specific tasks. A recent Wired article confirmed that Google uses FPGAs for its neural network driven voice-based search offering.
Mars determined that faster processing power is needed for Internet-based voice control by developing his own open-source digital assistant system that mimics Siri’s functionality. Mars helped to create the system, called “Sirius”, as an open, end-to-end, standalone speech- and vision-based intelligent personal assistant service similar to the digital assistants offered by Apple, Google, Microsoft and Amazon.
Sirius implements all core functionalities of a digital personal assistant, including speech recognition, image matching, natural language processing and a question-and-answer system. Through its development at the Clarity Lab at the University of Michigan, academics are now able to conduct independent research to determine and evaluate the data processing requirements of voice control systems. According to Mars, “Sirius is to Siri, as Linux is to Windows”.
The comparison holds because Sirius is free and can be customized. “Now that the core technology is out of the bag, we all have access to it,” says Mars. “Instead of making an app to run on the Apple Watch, for example, maybe I could make my own watch. We’re very excited to see what the world comes together to build and learn with Sirius as a starting point.”
Mars sees Sirius as an important platform for research into the development of next-generation warehouse-scale computing. It gives researchers a test bed for studying how the data centers that process voice-enabled queries should evolve to keep up with escalating pressure from wearable gadgets. Voice control is expected to become even more prevalent as mobile devices and wearables proliferate, and the resulting growth in data processing requirements will reshape the Internet as we know it.
Last year, Biometrics Research Group, Inc. estimated that the speech recognition software market was valued at US$11.5 billion in 2010 and will reach US$20.1 billion in 2015. We can expect further revenue growth in the sector as voice control becomes more widespread.