Microsoft’s artificial intelligence model Phi is small, inexpensive, and far less prone to “hallucinations.” Here’s what is being said about the new language model that is predicted to have a bright future.
GPT is absolutely remarkable, but at the same time it’s terribly expensive, and it can’t be the right fit for every use case. For these and many other reasons, Microsoft is experimenting with much smaller artificial intelligence models. It is said that Phi-3-mini could even outshine the work of OpenAI’s engineers.
Read also: What is HDR in monitors and how to properly adjust it
ChatGPT is not a panacea
ChatGPT is an invention that Microsoft has funded, supported, and helped improve. Strictly speaking, it belongs not to Microsoft but to OpenAI, in which Microsoft is the leading investor. The GPT language model has given Microsoft a huge advantage over other major technology corporations, which are now rushing to catch up. However, GPT has many problems, and many of them are currently impossible to solve.
First and foremost, it’s a very resource-intensive language model. Web-based services built on it, such as Microsoft Copilot and OpenAI’s ChatGPT, incur very high operating costs. This is characteristic not only of GPT but of all large language models. Additionally, GPT, like its competitors, has a tendency toward “hallucinations,” meaning it can generate responses that contain false or inaccurate information. The more data such a model absorbs, the more prone it is to generating such content. Hallucinations and false claims are therefore not a myth pulled out of thin air: users often note that large language models make mistakes, provide inaccurate data, and operate on non-existent facts.
Both problems are very serious, which is why OpenAI, Microsoft, Meta, Google, and others are working on developing not only Large Language Model technology but also Small Language Models, which in practice can provide much better results.
A digital assistant for an accountant doesn’t necessarily need to know a lot about quantum physics. It can be much smaller and less complex (and therefore cheaper), and by training only on the data needed for its purpose, it should, in theory, hallucinate less. That is easier said than done, however. GenAI technology still remains the “wild west” of IT, and while progress is advancing at an unprecedented pace, breakthroughs on fundamental issues remain hard to come by. But Microsoft recently announced such a breakthrough: the Microsoft Phi small language model.
Read also: Transistors of the Future: New Era of Chips Awaits Us
What is known about Microsoft Phi
First and foremost, it should be noted that the experiment was conducted without the involvement of OpenAI. In other words, this is a development by Microsoft engineers.
The Microsoft Phi models are a series of Small Language Models (SLMs) that have achieved exceptional results in various tests. The first model, Phi-1, had 1.3 billion parameters and achieved the best results in Python coding among existing SLMs.
Then, the developers focused on language understanding and reasoning, creating the Phi-1.5 model, which also had 1.3 billion parameters and demonstrated performance comparable to models five times larger in terms of parameters.
Phi-2 is a model with 2.7 billion parameters that demonstrates outstanding reasoning and language-understanding abilities, performing at the level of top baseline models with 13 billion parameters. It stands out among other models thanks to innovations in model scaling and training-data curation.
Phi-2 is available in the Azure AI Studio model catalog, which promotes research and development in the field of language models. It was released in December 2023, and its developers claim it performs on par with Mistral or Meta’s Llama 2. And Phi-3 performs even better than its predecessor.
However, the just-announced Phi-3 model is qualitatively new — at least, judging by the information Microsoft has provided. According to the company, Phi-3 outperforms every other model of similar size on all known benchmarks, including language analysis, programming, and mathematics.
Phi-3-mini, the smallest version of this model, became publicly available on April 23. It has 3.8 billion parameters and, according to Microsoft’s measurements, is twice as efficient as any other model of the same size. It can be found in the AI model catalog of the Microsoft Azure cloud service, on the Hugging Face machine-learning model platform, and in the Ollama framework for running models on a local computer.
According to Microsoft, Phi-3-mini doesn’t require powerful Nvidia chips. The model can run on regular computer chips or even fit on a phone that’s not connected to the internet.
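Running the model locally is exactly what a framework like Ollama is for. The sketch below shows one way a developer might query a local Phi-3 instance over Ollama’s HTTP API; it assumes Ollama is installed and the model has been pulled (e.g. with `ollama pull phi3`), and the endpoint and payload shape follow Ollama’s `/api/generate` convention.

```python
# Sketch: querying a locally running Phi-3-mini through Ollama's HTTP API.
# Assumes the Ollama daemon is running on its default port and the `phi3`
# model has already been pulled; no cloud service is involved.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "phi3") -> dict:
    """Build the JSON payload for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_phi3(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the daemon running, `ask_phi3("Summarize small language models in one sentence.")` would return the model’s reply entirely offline — no powerful GPU or cloud connection required.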
Lower computing requirements also mean the models won’t be as accurate. Phi-3 wouldn’t be suitable for medical professionals or tax specialists, but it could assist with simpler tasks, like targeted advertising or summarizing online reviews.
Since smaller models require less processing, they will be cheaper for private companies to use. This means Microsoft will gain more clients — companies that are interested in incorporating AI into their work but have so far found it too costly. However, it’s still unclear how much the models will cost.
It’s not yet known when the small and medium models will be available, though the medium model will certainly be more powerful and more expensive. It’s already known that Phi-3-small will have 7 billion parameters, while Phi-3-medium will have a whopping 14 billion.
Read also:
- What is Frutiger Aero and why do modern teenagers feel nostalgic about Windows Vista
- All About Starlink Direct-to-Cell Project
How do I use Phi-3-mini?
GPT-4 Turbo requires powerful AI chips, which are currently very expensive. The small language model Phi-3 can operate in standalone mode, without the cloud, even with a chip on a mobile phone.
Phi-3 is not a product for end-users but a technology that developers can use and integrate into their applications—both cloud-based, i.e., remotely hosted, and those that work locally and offline. It is intended to seamlessly operate with devices and their components, such as mobile phones, cars and their infotainment systems, or even Internet of Things sensors. In some scenarios, this technology could prove invaluable.
Microsoft even provides a specific example so we don’t have to strain our imagination. Imagine a farmer inspecting their crops and noticing signs of disease on the leaves, stems, and branches. Even when far from telecommunication towers, all they would need to do is take out their phone, snap a photo of the damage, feed it into an app using Phi-3 technology, and the model would quickly and offline analyze the photo and provide advice on how to combat this disease.
As Microsoft explains, the key to GPT’s success lay in harnessing vast amounts of data for training. With such massive datasets, high data quality is out of the question. Meanwhile, when training the Phi model, an approach entirely opposite to OpenAI’s was used. Instead of inundating the model with information, the primary focus was on gradual and meticulous training.
Instead of using raw internet data, Microsoft developers created the TinyStories dataset, generating millions of miniature “children’s” stories. These stories were used to train very small language models. Then researchers went further, creating the CodeTextbook dataset, which contained carefully curated, publicly available data filtered for educational value and content quality. These data were then filtered multiple times and fed back into the larger language model (LLM) for further synthesis.
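The core idea of this approach — keep only passages that clear a quality bar, round after round — can be sketched in a few lines. The scoring heuristic and threshold below are illustrative assumptions for the sake of the example, not Microsoft’s actual TinyStories or CodeTextbook pipeline.

```python
# Toy illustration of curation-based filtering: rather than ingesting raw web
# text, keep only passages that pass a quality heuristic. The markers and
# threshold are invented for illustration.

def educational_score(text: str) -> float:
    """Reward explanatory wording; penalize very short snippets."""
    markers = ("for example", "because", "step", "note that", "in other words")
    hits = sum(marker in text.lower() for marker in markers)
    length_bonus = 1.0 if len(text.split()) >= 20 else 0.0
    return hits + length_bonus

def curate(corpus: list[str], threshold: float = 2.0) -> list[str]:
    """Keep only passages whose score clears the threshold -- a stand-in for
    the repeated filtering rounds described in the article."""
    return [t for t in corpus if educational_score(t) >= threshold]
```

In the real pipeline the filter would itself be a trained classifier and the surviving data would be fed back to a larger model for synthesis, but the shape of the loop — score, filter, refine — is the same.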
All of this allowed for the creation of a dataset sufficient to train a more capable SLM. Additionally, during the development process of the Phi-3 model, a multi-level approach to risk management and minimization was employed, including assessment, testing, and manual correction. As a result, according to Microsoft, developers using the Phi-3 model family can leverage a set of tools available in Azure AI to create more secure and reliable applications.
Read also: Neuralink Telepathy chip: what it is and how it works
Will Microsoft Phi displace ChatGPT-type models?
Not at all. Small Language Models (SLMs), even when trained on high-quality data, have their limitations and are not designed for deep understanding of information. Large Language Models (LLMs) outperform SLMs in complex reasoning due to their size and computational power. LLMs are and will remain particularly useful in areas such as drug discovery, where searching through vast collections of scientific literature and analyzing complex patterns is necessary. On the other hand, SLMs can be used for simpler tasks such as summarizing long documents, generating content, or powering customer support chatbots.
Microsoft, according to their statements, already internally uses hybrid model sets, where LLM takes the lead role, directing specific queries that require less computational power to SLM, while it handles other, more complex queries itself. SLMs are positioned for on-device computing, without relying on the cloud. However, there will still be a gap between small language models and the level of intelligence achievable with large cloud-based models. This gap, due to the ongoing development of LLMs, is unlikely to disappear anytime soon.
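The hybrid setup described above amounts to a routing layer: a cheap gate decides whether a query can be handled by the on-device SLM or needs the cloud LLM. The gate heuristic and the two stub backends below are assumptions for illustration, not Microsoft’s internal implementation.

```python
# Sketch of hybrid LLM/SLM routing: lightweight requests go to a small
# on-device model, everything else to a large cloud model. The heuristic
# and backends are illustrative stand-ins.

def looks_simple(query: str) -> bool:
    """Toy gate: short queries asking for summaries or rewrites go to the SLM."""
    simple_verbs = ("summarize", "rewrite", "translate", "list")
    return len(query.split()) < 30 and any(v in query.lower() for v in simple_verbs)

def slm_answer(query: str) -> str:
    return f"[SLM/on-device] {query}"   # stand-in for a local Phi-3 call

def llm_answer(query: str) -> str:
    return f"[LLM/cloud] {query}"       # stand-in for a GPT-4-class call

def route(query: str) -> str:
    """Dispatch the query to whichever model tier the gate selects."""
    return slm_answer(query) if looks_simple(query) else llm_answer(query)
```

In practice the gate would more likely be a small classifier (or the LLM itself triaging requests), but the economics are the same: every query the gate diverts to the SLM is one that never touches expensive cloud hardware.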
Phi-3 still needs to be validated by independent third parties. Microsoft sometimes speaks of up to 25 times higher efficiency or energy efficiency in extreme cases compared to competitors, which sounds rather fantastical. On the other hand, Microsoft has not, over the years, accustomed us to seeing it as a clear leader in IT innovation, which may be why such claims are hard to take at face value. AI-based programs that respond instantly and run on the device rather than in the cloud? That would be a fitting climax to the current revolution. Unfortunately, there is one key catch.
Read also: How to use Copilot to control your Windows 11 PC
Microsoft’s Phi-3 only understands English
Phi-3 didn’t voraciously consume petabytes thrown its way, and the careful, meticulous training of the model comes with one notable drawback. Phi-3 was trained on English-language data and currently has no knowledge of any other language — not Ukrainian, German, Spanish, French, or Chinese. This significantly reduces its appeal to most users worldwide.
But Microsoft has assured that work on developing and enhancing the model is underway. So support for other languages will have to wait — though that fact has never deterred enthusiasts and those eager to keep up with progress.