Advances in AI, MEMS usher in the Internet of Voice era

As hardware and software become better, voice as an interface with various everyday devices will offer a more inclusive and customized experience to the user. (simpson33/istock/Getty Images Plus)

When it comes to sensor technologies for sound and voice control, innovation is happening on all fronts: the audio device itself, the software and algorithms, big improvements in the MEMS microphones themselves, and AI-driven analysis of the data.

FierceElectronics recently spoke with Dimitrios Damianos, Technology & Market Analyst and Custom Project Business Developer at Yole Développement (Yole), about this trend and how it will eventually lead to acoustic event detection, voice recognition and context awareness—even futuristic applications like emotion/empathy sensing using voice (Amazon and Apple already have patents on it).

As voice control gains traction, design engineers will need to consider the unique requirements and issues surrounding the technology.

FierceElectronics (FE): You’ve said that the next innovation in MEMS and sensors will be in audio, for sound and voice control. But isn’t it already here? What will be different?


Dimitrios Damianos (DD): Yes, it is here. MEMS mics have been used since 2003, when they were included in the first Motorola Razr phone. Since then they have come a long way: they have displaced traditional electret condenser microphones (ECMs), offering better performance and sensitivity at lower cost, and they now ship in the billions of units annually.

For a few years now, voice control as a human-machine interface (HMI) has been making waves. There are now numerous devices that include a Voice/Virtual Personal Assistant (VPA), such as smartphones, smartwatches and, lately, smart speakers and cars. The innovation in audio is actually happening on a bigger, more holistic scale: top-notch performance (sensitivity) is needed from the MEMS mics, as well as low power consumption, because they are used in always-on devices. High-quality sound must also be captured to allow for efficient processing and high-quality rendering. You know the concept in computer science: garbage in, garbage out. If you want to extract context from the data, it must at least be of a certain quality. That is why MEMS mics keep improving.

At the system level you also need to think about the whole audio chain: from the device to the audio codec, the audio software and algorithms (noise cancelling, beamforming, etc.), the digital signal processor (DSP) and, finally, the audio amplifiers and loudspeakers. So innovation is happening on all fronts, in the optimization of all these variables, but particularly in the analysis of the data using AI, which will eventually lead to acoustic event detection, voice recognition and context awareness.
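The beamforming mentioned above can be illustrated with a minimal delay-and-sum sketch (a hypothetical two-mic example, not any vendor's actual algorithm): each microphone channel is advanced by its known arrival delay and the channels are averaged, so the target source adds coherently while uncorrelated noise partially cancels.

```python
import numpy as np

def delay_and_sum(signals, delays_samples):
    """Align each mic channel by its integer sample delay, then average.

    signals: array of shape (n_mics, n_samples)
    delays_samples: per-mic integer delays relative to the reference mic
    """
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for sig, d in zip(signals, delays_samples):
        out += np.roll(sig, -d)  # advance the channel so wavefronts line up
    return out / n_mics

# Simulate one source reaching two mics with a 3-sample offset, plus noise.
rng = np.random.default_rng(0)
t = np.arange(256)
source = np.sin(2 * np.pi * t / 32)
mic1 = source + 0.3 * rng.standard_normal(256)
mic2 = np.roll(source, 3) + 0.3 * rng.standard_normal(256)
beamformed = delay_and_sum(np.stack([mic1, mic2]), [0, 3])
```

Averaging the two aligned channels halves the noise power relative to a single mic, which is exactly the quality gain multi-mic arrays buy before any DSP or AI stage sees the signal.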

FE: What are some of the technology advancements that will accelerate adoption and open up new applications and what role does the edge play?

DD: Besides some new technologies for MEMS mics (piezoelectric, optical) and MEMS microspeakers, the adoption of voice as an HMI has accelerated mainly because of advances on the AI computing front. Now, most computing is done in the cloud, where the models are trained, and where inference is also performed. This allows for analysis of the data, which has enormous value.

However, the data in this case is typically in the hands of the GAFAMs (Google, Apple, Facebook, Amazon, and Microsoft) of the world, which sometimes raises privacy issues. We are seeing a shift toward training in the cloud and inference at the edge to reduce latency. Eventually, both training and inference will be done at the edge to address privacy concerns: everything is done locally on the device and no data is sent to the cloud. For all the training to be done in small form factors, close to the device (at the edge) and at low enough power, machine learning algorithms are being rethought and new computing architectures, such as neural network accelerators, are being investigated.
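The cloud-training / edge-inference split can be sketched with a toy example (the logistic-regression "keyword detector" and all names here are hypothetical; real deployments would use a framework such as TensorFlow Lite): the heavy training loop runs "in the cloud", only compact quantized weights are shipped to the device, and inference runs locally so raw audio features never leave it.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- "Cloud" side: train a tiny detector on labeled feature vectors ---
X = rng.standard_normal((200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic labels
w = np.zeros(4)
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)

# --- Ship quantized weights to the device: int8 keeps the model tiny ---
scale = np.abs(w).max() / 127
w_q = np.round(w / scale).astype(np.int8)

# --- "Edge" side: inference runs locally, no data sent to the cloud ---
def edge_predict(x):
    logit = (x @ w_q.astype(np.int32)) * scale  # dequantize on the fly
    return logit > 0

sample = np.array([1.0, 0.2, -0.1, 0.3])
print(edge_predict(sample))
```

The int8 weights cost a quarter of the float32 memory, which is one reason inference fits on low-power always-on devices while training still needs cloud-scale compute.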

FE: What about cost – will that need to come down substantially to realize the market size that you forecast?

DD: There is no problem with cost. MEMS microphones are produced in the billions of units yearly and have a very low price, typically in the range of $0.10–$0.30 depending on the manufacturer and order size. The market size that we forecast for MEMS mics will be realized in two ways: through the increasing attachment rate of MEMS mics in various consumer devices, and through growth in the volumes of end systems integrating them. In the end, the adoption of voice as an HMI will depend on the total offering: the cost, performance, and functionality of the whole system, including the MEMS mics, speakers, audio processor or computing chip, etc.
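The market arithmetic here is just units shipped times average selling price; a back-of-the-envelope sketch (the shipment figure below is purely illustrative, only the ASP range comes from the interview):

```python
# Units-times-ASP estimate; the shipment figure is a made-up illustration.
units_shipped = 5e9             # hypothetical annual MEMS mic shipments
asp_low, asp_high = 0.10, 0.30  # ASP range quoted in the interview, in USD
revenue_low = units_shipped * asp_low
revenue_high = units_shipped * asp_high
print(f"Revenue: ${revenue_low / 1e9:.1f}B to ${revenue_high / 1e9:.1f}B")
```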

FE: What are some of the futuristic applications coming down the line?

DD: We are heading toward an Internet of Voice (IoV) era, with increasing adoption of voice as an interface with various everyday devices. So really, the future is here, and it will just keep getting better as hardware and software improve, offering a more inclusive and customized experience to the user. As latency, power consumption, computing and privacy issues are resolved, more and more people will use VPAs in everyday life, with all kinds of devices.

In a more dystopian sense, one futuristic application would be emotion/empathy sensing using voice (and sometimes other sensor) data. From the tone of your voice, your mood could be deduced. Amazon and Apple already have patents on this. Amazon also has a new wearable device, the Amazon Halo wristband, which analyzes your tone of voice, so things could move in that direction in the future.

FE: Especially given the growing population of seniors, when can we expect to see good hearing aids based on MEMS mics?

DD: Each hearing aid manufacturer has different requirements and wants a specific microphone developed, making this a low-volume market with demanding specifications (requiring high-quality mics), which in turn leads to high microphone ASPs. Given these constraints, it might not seem a very lucrative market for many microphone manufacturers.

Nevertheless, MEMS microphones are increasingly being adopted in hearing aids, even though traditional ECMs remain the most-used microphone for this application. MEMS mics' small size has long been a key advantage, and they now also perform similarly to or better than ECMs in terms of noise performance, power consumption, stability, and repeatability. MEMS mics enable new functionalities such as more precise directional hearing, speech recognition, and amplification, which could eventually lead to better hearing aids.

Editor’s Note: Yole Développement’s Dimitrios Damianos will be speaking on the pervasiveness of MEMS sensors in everyday life and a glimpse into the human-machine voice interface at Sensors Innovation Week, a digital event series taking place November 16-18, 2020. For more information and to register for your free pass, click here.

