Apple researchers have developed an artificial intelligence system called ReALM (Reference Resolution As Language Modeling), which aims to radically improve the way voice assistants understand and respond to commands.
In a research paper, Apple describes how large language models can be used to solve the problem of reference resolution, which includes deciphering ambiguous references to on-screen objects as well as understanding conversational and background context. As a result, ReALM could enable more intuitive and natural interaction with devices.
Reference resolution is an important part of natural language understanding, allowing users to use pronouns and other indirect references in conversation without confusion. For digital assistants, this ability has historically been a significant challenge, limited by the need to interpret a wide range of verbal signals and visual information. Apple’s ReALM system aims to solve this problem by turning the complex process of reference resolution into a pure language modelling task. In this way, it can understand references to the visual elements displayed on the screen and integrate this understanding into the conversational flow.
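To make the idea concrete, here is a minimal illustrative sketch (not Apple's code; the entity format and helper function below are assumptions) of what casting reference resolution as a language-modelling task can look like: on-screen entities are numbered and serialized into the prompt, so the model's job reduces to producing the identifier of the entity the user means.

```python
# Illustrative sketch only: serialize on-screen entities into a text
# prompt so an LLM can resolve an indirect reference ("the second number")
# by outputting an entity id. Data format and names are assumptions.

on_screen_entities = [
    {"id": 1, "type": "phone_number", "text": "555-0142"},
    {"id": 2, "type": "phone_number", "text": "555-0199"},
    {"id": 3, "type": "address", "text": "1 Infinite Loop, Cupertino"},
]

def build_prompt(user_request: str, entities: list[dict]) -> str:
    """Serialize screen entities and the user's request into one prompt."""
    lines = [f"{e['id']}. ({e['type']}) {e['text']}" for e in entities]
    return (
        "Entities visible on screen:\n"
        + "\n".join(lines)
        + f"\n\nUser: {user_request}\n"
        + "Which entity id does the user refer to?"
    )

print(build_prompt("Call the second number", on_screen_entities))
```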
ReALM reconstructs the visual structure of the screen using text representations. This involves parsing on-screen objects and arranging them into a text format that reflects the content and layout of the screen. Apple researchers found that this strategy, combined with fine-tuning language models specifically for reference resolution tasks, significantly outperforms traditional methods, including the capabilities of OpenAI’s GPT-4.
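The paper does not include source code, but the screen-parsing strategy can be sketched roughly as follows (the element fields, row tolerance, and function names are illustrative assumptions, not Apple's implementation): UI elements are sorted top-to-bottom by position, grouped into rows, and joined left-to-right, producing a plain-text layout that a language model can read.

```python
# Hedged sketch of the screen-to-text idea: render UI elements as plain
# text that mirrors their on-screen arrangement. All details assumed.

from dataclasses import dataclass

@dataclass
class UIElement:
    text: str
    x: float  # left edge, normalized 0..1
    y: float  # top edge, normalized 0..1

def screen_to_text(elements: list[UIElement], row_tol: float = 0.02) -> str:
    """Group elements into rows by vertical position, then read each
    row left to right, yielding a text layout of the screen."""
    rows: list[list[UIElement]] = []
    for el in sorted(elements, key=lambda e: e.y):
        # Elements whose vertical positions fall within row_tol share a row.
        if rows and abs(rows[-1][0].y - el.y) <= row_tol:
            rows[-1].append(el)
        else:
            rows.append([el])
    return "\n".join(
        "  ".join(el.text for el in sorted(row, key=lambda e: e.x))
        for row in rows
    )

screen = [
    UIElement("Contact", 0.05, 0.10),
    UIElement("Mobile: 555-0142", 0.05, 0.30),
    UIElement("Work: 555-0199", 0.55, 0.30),
]
print(screen_to_text(screen))
```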
ReALM can enable users to interact much more effectively with digital assistants based on what is currently displayed on the screen, without the need for precise, detailed instructions. This could make voice assistants much more useful in various situations, such as helping drivers navigate infotainment systems while driving or assisting users with disabilities by providing easier and more accurate means of indirect interaction.
Apple has already published several research papers in the field of AI. Last month, the company unveiled a new method for training large language models that seamlessly integrates both textual and visual information. Apple is expected to unveil several AI features at WWDC in June.