e-Why, What & How · 2020-04-14

Using AI to eliminate background noise from video chat


Video conferencing is having a moment, and its sudden exponential growth owes much to the scourge of Covid-19. Remote workers often need to connect to talk plans through, discuss problems, educate or illuminate. Whatever the reason, many are using video chat and workplace apps such as Slack, Microsoft Teams, Zoom or Google Hangouts to get the job done. However, video chat isn’t perfect, especially when it comes to the distractions that unexpected or uncontrollable background noises can cause.

Audio experts have done much to overcome background irritations during live video calls, but many suppression techniques only handle steady, easily identifiable background noises, such as the drone of an air-conditioner or a percolating coffee machine.

It’s the single-instance, unexpected noises that need addressing, such as the rustling of a crisp packet, a dog barking or a door slamming. These are much harder to control or eradicate from a “live” feed.

Here’s how tech companies like Microsoft & Google are trying to remove background noise in video conferences:

Microsoft Teams

For Microsoft, addressing the issue of extraneous noise has become a focus in the effort to provide the best video chat experience available on the market. Developers have drawn on advances in AI and machine learning to improve their noise detection algorithms, but the path has not been an easy one.

To train an AI model, thousands of sample audio clips are needed, each accurately labeled for extraneous noise. Because such samples are hard to come by owing to privacy concerns, developers have resorted to simulating conversations and injecting “known” noise samples into the audio files, so that they can train the computer to “hear” and eliminate unnecessary noise, such as a cat meowing or keyboard tapping.

One of the methods used to achieve this aim was to record thousands of hours of artificial voice readings taken from a variety of novels. These voice simulations were then overlaid with various “known” noise samples. The computer was then fed two distinct audio samples to “listen” to: the clean “ground truth” recording and the noisy version of it. By comparing the two and “identifying” the type of sound that needed to be eliminated, the AI has been “trained” to remove thousands of sounds, thereby offering a much improved audio experience to consumers.
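The overlay step described above can be sketched in a few lines. This is only an illustration, not Microsoft’s actual pipeline: the function names (`rms`, `mix_at_snr`), the 10 dB signal-to-noise ratio, and the synthetic sine-wave “voice” and random “noise” are all assumptions made for the example. It shows how a clean clip and a noise clip can be combined at a chosen loudness ratio to produce a (noisy, clean) training pair.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise level ratio equals `snr_db`,
    then add it to `clean`. Returns (noisy_mixture, scaled_noise)."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(clean), list(noise)  # silent noise: nothing to mix in
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled = [n * gain for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled

# Stand-ins for real recordings (assumed for illustration only):
# one second of a 220 Hz tone as the "voice", uniform random samples as the "noise".
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate)
         for t in range(sample_rate)]
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(sample_rate)]

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=10)

# The model would be given `noisy` as input and `clean` as the ground truth.
training_pair = (noisy, clean)
```

Because the clean clip is known exactly, the “label” comes for free, which is why simulated mixtures sidestep the privacy problem of collecting real calls.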

Processing

Processing audio in real time has also been difficult for teams developing background noise elimination. Noise elimination algorithms are “heavy”, so running them in the Cloud would seem ideal. However, the round trip adds latency, so on-device processing is the preferred method. CPU and battery usage also need consideration: no one wants perfectly clear audio at the cost of battery life. Microsoft has various trade-off options to manage these issues, and promises that neither battery nor storage on-device will be compromised.
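To make the latency trade-off concrete, here is a rough back-of-the-envelope sketch. Every number in it is hypothetical (the 20 ms frame, the processing times and the 60 ms network round trip are assumptions, not figures from Microsoft), and the helper names are invented for the example. It simply adds up the delay a listener would experience in each setup.

```python
def frame_duration_ms(frame_len, sample_rate):
    """How much audio, in milliseconds, one frame of samples holds."""
    return 1000.0 * frame_len / sample_rate

def total_delay_ms(frame_len, sample_rate, processing_ms, network_rtt_ms=0.0):
    """Delay before a cleaned frame can be played back:
    time to buffer the frame + time to process it + any network round trip."""
    return frame_duration_ms(frame_len, sample_rate) + processing_ms + network_rtt_ms

def keeps_up(frame_len, sample_rate, processing_ms):
    """True if each frame is processed before the next one has finished arriving."""
    return processing_ms < frame_duration_ms(frame_len, sample_rate)

# Hypothetical figures: 320 samples at 16 kHz is a 20 ms frame.
on_device = total_delay_ms(320, 16000, processing_ms=8.0)
cloud = total_delay_ms(320, 16000, processing_ms=2.0, network_rtt_ms=60.0)

print(f"on-device delay: {on_device} ms")
print(f"cloud delay:     {cloud} ms")
print("slow device keeps up:", keeps_up(320, 16000, processing_ms=25.0))
```

Under these assumed numbers the Cloud does the processing faster but still delivers audio later, because the network round trip dominates. The `keeps_up` check reflects the other constraint: an on-device model that takes longer than one frame’s duration to process a frame can never catch up.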

Open source

Microsoft has released all its data & methodology on GitHub for use by the general developer community, which is an excellent move if strides are to be made on this front.

Google’s FUSS

Google has followed much the same path as that outlined above to introduce FUSS (Free Universal Sound Separation). To achieve its aims, Google has used Creative Commons licensed audio clips from freesound.org and split the resulting mixtures into three sets: 20,000 training mixtures, 1,000 validation mixtures and 1,000 eval mixtures. They’ve also developed, to quote, “…our own room simulator implemented in tensorflow, which generates the impulse response of a box shaped room with frequency-dependent reflective properties given a sound source location and a mic location.”
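For readers unfamiliar with the term, an impulse response describes how a room “smears” a sound through echoes, and applying it to a dry recording is done by convolution. The sketch below is a toy illustration only, not Google’s simulator: the function names and the echo delays and gains are made up, and the echoes here are frequency-independent, unlike the frequency-dependent reflections the quote describes.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: every input sample triggers a scaled,
    delayed copy of the impulse response in the output."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def toy_impulse_response(delays_and_gains, length):
    """Build an impulse response from (delay_in_samples, gain) pairs.
    Delay 0 is the direct sound; later entries stand in for wall reflections."""
    ir = [0.0] * length
    for delay, gain in delays_and_gains:
        ir[delay] += gain
    return ir

# Hypothetical room: direct sound plus two progressively quieter echoes.
ir = toy_impulse_response([(0, 1.0), (3, 0.5), (7, 0.25)], length=8)

dry = [1.0, 0.0, 0.0, 0.0]   # a single click
wet = convolve(dry, ir)      # the click as "heard" in the toy room

print(wet)
```

Feeding in a single click returns the impulse response itself followed by silence, which is a quick way to check that the convolution is wired up correctly.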

Google has now released this data in the hope that it will help others solve the problem of background noise degrading audio more quickly.

Image by Gerd Altmann from Pixabay 

