Microsoft has spent the last two years adding new productivity features to Teams, and now the company is rethinking the basics with the help of artificial intelligence (AI). We’ve all been in a meeting where poor room acoustics made someone hard to hear, or where two people tried to speak at the same time, creating an awkward moment of “no, you go ahead.” New AI-based voice enhancements from Microsoft should reduce or even eliminate these everyday annoyances.
Microsoft uses machine learning models to improve room acoustics so that you no longer sound like you’re hiding in a cave. “While we have been trying our best with digital signal processing to do a really good job in Teams, we have now started using machine learning for the first time to build echo cancellation where you can truly reduce echo from all the different devices,” explains Robert Aichner, a principal program manager for intelligent conversation and communications cloud at Microsoft.
Microsoft has been testing this for months in the real world to make sure Teams users notice less echo and better call quality. The software maker used 30,000 hours of speech to train its models and collected recordings from thousands of devices through crowdsourcing, in which Teams users were paid to record their voice and play back audio from their device.
If Teams detects that sound is bouncing or reverberating around the room, resulting in shallow-sounding audio, the model processes the captured audio so that participants sound as if they were speaking into a close-range microphone.
The most impressive part is that people can interrupt each other on Teams calls without the awkward overlap where you can’t hear the other person over the echo. Microsoft is now rolling all of this out in Teams, along with earlier improvements to its AI-based noise reduction. All processing is performed locally on client devices, not in the cloud.
All of these enhancements are already available in Microsoft Teams, along with some real-time screen optimizations for text-to-video and AI-based optimizations that limit bandwidth use during video or screen-sharing calls.