Xiaomi, long associated almost exclusively with affordable smartphones and household gadgets, has quietly become one of the world’s most ambitious players in artificial intelligence, as its rapid progress over the past year and a half shows. The company is expanding at extraordinary speed, building powerful language models, voice synthesis systems, and autonomous phone agents, and investing heavily in its own research. Below is a detailed look at the tech giant’s current position and achievements in the global AI and large language model race.
Xiaomi officially entered the field of large language models in April 2025, when its developers presented their debut open-source model, MiMo-7B. The name is simply short for Xiaomi Model. From the very beginning, the model’s key advantage was its focus on logical reasoning and code generation rather than ordinary conversation.

Despite its relatively small size of 7 billion parameters, MiMo-7B delivered results that far exceed expectations for its class. On the MATH-500 math benchmark, the reinforcement learning version of the model achieved 95.8% accuracy. Moreover, it unexpectedly outperformed such well-known models as OpenAI’s o1-mini and Alibaba’s QwQ-32B-Preview on problems from the AIME 2024 and 2025 math competitions. A specially curated dataset of 200 billion reasoning tokens was used in training, and the three training stages together consumed 25 trillion tokens. Xiaomi released the model under the permissive MIT license and published it on Hugging Face. Luo Fuli, who joined the company from DeepSeek, became the head of the engineering team.

In December 2025, the brand announced its next model, MiMo-V2-Flash, which has 309 billion parameters in total, most of which stay inactive during any given inference step. Thanks to its mixture-of-experts architecture, only about 15 billion parameters are active at a time.
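To put the mixture-of-experts figures in perspective, the short sketch below computes what share of the model is active at a time. It uses only the two parameter counts quoted above; the function name is illustrative and does not come from Xiaomi.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters that are active at once.

    Both arguments are in billions of parameters. This is plain arithmetic
    on the published figures, not a description of the routing mechanism.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must be between 0 and total_params_b")
    return active_params_b / total_params_b


# MiMo-V2-Flash figures from the article: 309B total, about 15B active.
share = active_fraction(309, 15)
print(f"{share:.1%}")  # 4.9%
```

In other words, roughly one parameter in twenty takes part in each step, which is why a model of this size can still run at the speed and cost of a much smaller one.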
What sets this model apart is its balance between high performance and speed. It ranked among the top two open-source models in logical reasoning tests and matched GPT-5 and Claude Sonnet 4.5 on the SWE-Bench Verified software engineering benchmark. The model generates output at about 150 tokens per second, and its inference cost is only about 2.5% of Claude’s. The company set the API price at $0.10 per 1 million input tokens and also offered free access for a limited period after the release. MiMo-V2-Flash also uses multi-token prediction, which lets the model draft several tokens at once and verify them in parallel.
In March 2026, the largest model, MiMo-V2-Pro, made its debut. This flagship has more than one trillion parameters in total, of which 42 billion are active on each forward pass. It supports a context window of 1 million tokens, enough to analyze a volume of text comparable to several long novels in a single session. Xiaomi specialists note that the model was built specifically for agentic workloads: complex multi-step tasks that require planning and acting without constant human supervision.
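The pricing and speed figures above translate into simple back-of-the-envelope estimates. The sketch below assumes the quoted rate of $0.10 per 1 million input tokens and the quoted throughput of 150 tokens per second; output-token pricing is not given in the article, so it is left out, and the function names are illustrative.

```python
from decimal import Decimal

# Figures quoted in the article for MiMo-V2-Flash.
INPUT_PRICE_PER_MILLION_USD = Decimal("0.10")
TOKENS_PER_SECOND = 150.0


def input_cost_usd(input_tokens: int) -> Decimal:
    """Estimate the cost of sending `input_tokens` to the API, in US dollars."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return INPUT_PRICE_PER_MILLION_USD * input_tokens / Decimal(1_000_000)


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate how long it takes to generate `output_tokens` at a steady rate."""
    if output_tokens < 0 or tokens_per_second <= 0:
        raise ValueError("output_tokens must be >= 0 and tokens_per_second > 0")
    return output_tokens / tokens_per_second


# Example: 250 million input tokens, and a 3,000-token answer.
print(input_cost_usd(250_000_000))
print(generation_seconds(3_000))
```

Under these assumptions, 250 million input tokens cost about $25, and a 3,000-token answer takes about 20 seconds to generate.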

Interestingly, the model first appeared on the OpenRouter AI gateway anonymously, under the name Hunter Alpha. It instantly topped the platform’s rankings and processed more than 1.5 trillion tokens before Xiaomi officially confirmed its authorship. Such unprompted attention from the developer community was clear evidence of the product’s real competitiveness. Along with the flagship, the company introduced two companion systems: the multimodal MiMo-V2-Omni, which works with text, images, audio, and video, and MiMo-V2-TTS, which handles speech generation for autonomous agents.
At the end of April 2026, Xiaomi developers combined the best of the second generation into a single system, MiMo-V2.5-Pro. This model, with 1.02 trillion parameters, works with text, images, audio, and video at the same time. On complex tasks it generates 60 to 80 tokens per second, while the lighter MiMo-V2.5, designed for everyday use, produces 100 to 150 tokens per second. At the time of its release, the Pro version was also ranked the world’s best open-source model for agentic capabilities by Artificial Analysis. In addition, the company dropped the surcharge for using the full one-million-token context window and refreshed users’ credit balances after the release, which made the platform more attractive to developers.
More recently, in early June 2026, Xiaomi launched MiMo Code, a command-line AI coding agent based on MiMo-V2.5. Unlike most other assistants, which lose context once the working window fills up, MiMo Code has a persistent memory system that preserves the reasoning behind earlier decisions during long-running work on large projects.

For visual content, Xiaomi has prepared the MiMo-VL model and its home-oriented variant, MiMo-VL-Miloco-7B. The latter is designed specifically for understanding household spaces. The model can recognize standard hand gestures, such as a thumbs-up, the OK gesture, the peace sign, or an open palm, and detect common household activities such as watching TV, exercising, or reading a book. It was built through a combination of supervised fine-tuning and reinforcement learning, which allowed it to adapt well to the home environment without losing general capability.
Launched in August 2025, MiDashengLM-7B became Xiaomi’s specialized audio processing solution. Unlike traditional voice AI systems, which focus on speech recognition and discard non-verbal information, this model is trained on general descriptions of audio content. It was trained on a large collection of recordings totaling 38,662 hours, so it can distinguish not only words but also music, ambient noise, the speaker’s emotional state, and acoustic conditions. The model is based on Alibaba’s Qwen2.5-Omni-7B and is currently being integrated into Xiaomi’s electric vehicles and smart home appliances. It was released under the Apache 2.0 license, which opens up wide opportunities for commercial use.

Alongside its work on vision and speech, the brand has released a separate audio language model called MiMo-Audio. Its audio encoder was later integrated into MiMo-V2.5 to provide full multimodal interaction.
One of the most notable recent achievements was the release of OmniVoice in May 2026, an open-source speech synthesis model from the next-generation Kaldi team at Xiaomi’s AI lab. The system supports 646 languages, including rare dialects with very little available training material. It performs zero-shot voice cloning, meaning it can accurately reproduce a person’s intonation from an audio sample only a few seconds long and generate natural speech in another language while fully preserving the individual features of the voice. From a technical point of view, OmniVoice stands out for its simplified architecture, in which a single transformer converts text directly into acoustic tokens. This made it possible to complete a training run on 100,000 hours of audio in just one day, and inference with PyTorch runs 40 times faster than real time. Xiaomi representatives note that OmniVoice is the first voice-cloning speech synthesis system to cover hundreds of languages. It also includes tools for correcting difficult pronunciation cases, such as polyphonic Chinese characters or rare English proper names. Everything is distributed under the permissive Apache 2.0 license.
Along with the large-scale update of the 2.5 line, Xiaomi introduced the MiMo-V2.5-TTS system and an automatic speech recognition (ASR) tool. The synthesis model supports voice cloning, and the recognition module handles bilingual input well. Together, these solutions let developers build complete voice products without combining tools from third-party vendors.

For ordinary consumers, the company offers two main ways to interact with its AI technologies. The first is the Xiao AI voice assistant, which has long been available on smartphones, smart speakers, and wearables. With the release of HyperOS 2, it was upgraded to Super Xiao AI, which added better context retention, more effective control of home appliances, and the ability to generate images from text prompts. The assistant is deeply integrated into the three-part HyperOS structure, which consists of the HyperCore performance platform, the HyperConnect device synchronization system, and the HyperAI intelligence module.
The second is the HyperAI feature package, which was presented internationally at MWC 2025 and launched on phones starting with the Xiaomi 15 line. It offers real-time translation, writing assistance, intelligent audio processing and summarization, and smart photo editing. On devices for the global market, the company additionally relies on Google Gemini as the underlying infrastructure. The HyperAI package later became available on mid-range devices, including the Redmi Note 14 Pro+ 5G and the Poco series.
The most forward-looking element of Xiaomi’s development effort is the miclaw project, announced in March 2026 and currently in closed beta testing. It is not an ordinary chatbot but an autonomous AI agent based on the MiMo architecture. Instead of simply answering questions, miclaw recognizes what the user wants and carries it out on its own. It can launch applications, navigate between interfaces, fill out forms, interact with system utilities, and perform complex processes on a smartphone without step-by-step supervision. It works on what the company calls a reasoning and execution loop: the AI determines the necessary action, performs it, analyzes the result, and repeats the process until the task is completed. The product also has a contextual memory that compresses older interaction history while reliably retaining the original goal of the operation. In addition, it can connect to the brand’s broader smart home and car ecosystem.
Regarding privacy, the company states that conversations with miclaw are not used to train its models. Personal information is processed only in real time to execute current commands, and sensitive data is handled directly on the device using edge-cloud private computing technology. The closed beta currently supports Xiaomi 17 series phones. According to the manufacturer, the upcoming HyperOS 4 will integrate miclaw fully at the kernel level. The agent has also been tested as a smartwatch assistant via the Xiaomi Health app: users hold down a button to enter a voice command, the request is processed on the connected smartphone, and the result is displayed on the watch screen.
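The loop described above (decide, act, observe, repeat) together with a memory that compresses old history but keeps the original goal can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not miclaw’s actual implementation: the names AgentMemory, run_agent, decide, and act are hypothetical, and "compression" here is reduced to counting dropped entries.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentMemory:
    """Hypothetical memory: the goal is always kept, old steps are compressed."""
    goal: str
    max_recent: int = 4
    recent: List[str] = field(default_factory=list)
    compressed: int = 0  # number of older steps reduced to a simple count

    def record(self, entry: str) -> None:
        self.recent.append(entry)
        # Drop the oldest entries once the window is full, but keep a count.
        while len(self.recent) > self.max_recent:
            self.recent.pop(0)
            self.compressed += 1

    def steps_taken(self) -> int:
        return self.compressed + len(self.recent)


def run_agent(goal: str,
              decide: Callable[[AgentMemory], Optional[str]],
              act: Callable[[str], str],
              max_iterations: int = 20) -> AgentMemory:
    """Run the decide -> act -> observe loop until `decide` returns None."""
    memory = AgentMemory(goal=goal)
    for _ in range(max_iterations):
        action = decide(memory)      # choose the next action, or None when done
        if action is None:
            break
        result = act(action)         # perform the action
        memory.record(f"{action} -> {result}")  # observe and remember the result
    return memory


# Toy usage: a scripted plan stands in for the model's decisions.
plan = ["open_app", "find_form", "fill_name", "fill_date", "review", "submit"]


def scripted_decide(memory: AgentMemory) -> Optional[str]:
    done = memory.steps_taken()
    return plan[done] if done < len(plan) else None


final = run_agent("book a table", scripted_decide, lambda action: "ok")
print(final.goal, final.compressed, final.recent)
```

In a real agent, `decide` would be a model call that sees the goal and the recent history, and `act` would drive the phone’s interface; the `max_iterations` cap is a safety limit added here so the loop cannot run forever.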

In March 2026, Lei Jun announced the company’s intention to allocate at least $8.7 billion to AI development over the next three years. These funds will come on top of already growing investment in research and development. As a result, Xiaomi’s annual R&D spending is expected to reach about $5.7 billion in 2026. The results of these investments are already visible: by the beginning of April 2026, the company’s models accounted for approximately 21% of all traffic on the OpenRouter platform. Lei Jun also noted that in 2026 the company aims to bring together its own processor, proprietary operating system, and in-house AI model in a single consumer device.
Until spring 2025, Xiaomi did not have any publicly available AI models. Today, the brand owns a complete technology stack: reasoning models, computer vision systems, audio models, voice cloning tools, speech synthesis and recognition infrastructure, an autonomous AI agent, and consumer features integrated into millions of devices. The speed at which the company is building and releasing these products is remarkable. At the same time, the open-source nature of most of these releases is helping the company rapidly win the loyalty of the developer community. The main test ahead will be whether miclaw and HyperOS 4 can make all these intelligent tools truly useful in the everyday lives of ordinary people. If the company succeeds, Xiaomi will stop being a phone manufacturer that also develops AI and become a full-fledged AI platform on a global scale.
