Google has just released Gemini, its most powerful AI model to date. What is it, and what can it do? Find out everything in our article.
Google has been developing its own artificial intelligence tools for many years. After the public release of OpenAI’s ChatGPT, the company behind the world’s most popular search engine responded with a similar tool called Bard. However, Bard still lagged behind its competitors, which was unacceptable for Google. It is therefore no surprise that the world has now seen Gemini, a new language model from Google, unveiled at a recent presentation.
It is designed not only to enhance the previous Large Language Model (LLM) giant but also to offer entirely new capabilities in text, graphics, and audio processing.
So, let’s take a closer look at the new language model from Google, Gemini.
Read also: OpenAI’s Project Q: Breakthrough or Looming Threat?
Gemini is a new and powerful artificial intelligence model from Google that can understand not only text but also images, video, and audio. Described as a multimodal model, it is said to be capable of tackling complex tasks in mathematics, physics, and other fields. Additionally, it can comprehend and generate high-quality code in various programming languages.
Currently, Gemini is accessible through its integration with Google Bard and the Google Pixel 8 Pro, and it will gradually be incorporated into other Google services.
“Gemini is the result of extensive collaborative efforts across Google teams, including our colleagues from Google Research,” said Demis Hassabis, CEO and co-founder of Google DeepMind. “It was built from the ground up to be multimodal, meaning it can generalize and seamlessly comprehend, manipulate, and combine different types of information, including text, code, audio, images, and video.”
Read also: Microsoft Copilot: Game-Changer or False Path?
You may have already guessed that Gemini was created by Google and Alphabet, Google’s parent company, and it is the company’s most advanced AI model to date. Google’s DeepMind division also made a significant contribution to its development. Google has not disclosed how many people worked on the project or how much it cost, but knowing the company’s resources, we can be sure it was a very tidy sum.
Read also: Windows 12: What will be the new OS
As mentioned earlier, it is a multimodal model, meaning it can comprehend, manipulate, and combine different types of data, including text, code, audio, images, and video. This provides enhanced capabilities for understanding, reasoning, and coding compared to previous AI systems.
The main capabilities of the model include:
- multimodal understanding of text, code, audio, images, and video;
- advanced reasoning in mathematics, physics, and other fields;
- comprehending and generating high-quality code in popular programming languages.
As we can see, these capabilities go beyond what other models offer.
In language understanding, mathematical reasoning, and coding tests, Gemini Ultra outperformed models like GPT-4. Notably, it is the first model to exceed human-expert performance on the Massive Multitask Language Understanding (MMLU) benchmark, achieving a score of 90.0%.
Across 32 widely used academic benchmarks, Gemini was compared against GPT-4, and in 30 of them Google’s new language model came out on top, showcasing its strength in comprehensive language understanding.
Read also: Human Brain Project: Attempt to imitate the human brain
Google describes this AI as a flexible model capable of operating on any device, from Google’s data centers to mobile devices. To achieve this scalability, Gemini is released in three versions: Nano, Pro, and Ultra.
Let’s take a closer look at them. Nano is the most compact version, designed to run directly on devices such as smartphones. Pro is the mid-range model, suited to a wide range of tasks and now powering Bard. Ultra is the largest and most capable version, intended for highly complex tasks.
Read also: Google Bard AI: everything you need to know
Google has already made the AI available in its own products: the Nano version on the Google Pixel 8 Pro smartphone and the Pro version in the Bard chatbot. Over time, Google plans to integrate it into its search engine, advertising, the Gmail email service, the Chrome browser, and other services.
Developers and enterprise customers will be able to access Gemini Pro through the Gemini API in Google AI Studio and Google Cloud Vertex AI starting December 13, 2023. Android developers will get access to the Nano version of the model through AICore, available in an early preview.
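To give a sense of what that access looks like in practice, here is a minimal sketch of a text request to Gemini Pro through the API. It assumes the google-generativeai Python SDK and an API key obtained from Google AI Studio; the prompt itself is just our illustration:

```python
# A minimal sketch of a text request to Gemini Pro, assuming the
# google-generativeai Python SDK (pip install google-generativeai)
# and an API key obtained from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key, for illustration only

# "gemini-pro" is the text model tier exposed through the API at launch
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content(
    "Summarize in two sentences what makes a multimodal model different."
)
print(response.text)
```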
Read also: 7 coolest ways to use ChatGPT
According to the information provided during the presentation, the model will enable Google Bard to tackle far more complex problems. As stated during the event:
“We designed Gemini to be naturally multimodal and pretrained on diverse modalities from the very beginning. We then fine-tuned it by adding more multimodal data to enhance performance. Thanks to this, Gemini understands and draws conclusions much better than previous multimodal models, boasting state-of-the-art capabilities in almost all domains.
The extraordinary multimodal capabilities of Gemini assist in comprehending complex textual and visual information. They prove particularly valuable when it comes to extracting specific information from vast datasets. The remarkable ability of this model to distill the essence of information from hundreds of thousands of documents by reading, filtering, and analyzing them will undoubtedly contribute to groundbreaking discoveries across various fields, from science to finance.”
During the presentation, an example of a comprehensive study was showcased, containing over 200,000 records, some of which needed to be updated with new data. As one might imagine, performing this task manually would have been extremely time-consuming, so the researchers used Gemini to generate code that took the input data and applied the necessary updates. Another, more practical example is the use of Google’s language model to explain school tasks in mathematics or physics.
The input data in this case consisted of photos or scans of a school assignment. The system successfully processed the graphics and text, then indicated which parts of the homework were done correctly and which needed more attention. The presentation also demonstrated that users could repeatedly ask Gemini to explain the task, with each subsequent explanation given in simpler language. Gemini’s precision and accuracy will, of course, be verified by users in due time, but the ability to read and process text directly from photos is impressive. As mentioned during the presentation:
“Gemini has been trained to recognize and understand text, images, sounds, and much more simultaneously. This enables it to better grasp the nuances of information and respond to complex queries. It is particularly effective in explaining topics related to mathematics and physics, making it a valuable personal guide for homework assignments.”
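Out of curiosity, this is roughly how such a homework check might look through the same API. This is a hedged sketch, assuming the google-generativeai SDK together with the Pillow imaging library; the file name is hypothetical:

```python
# A sketch of the homework scenario described above, assuming the same
# google-generativeai SDK plus the Pillow imaging library; the file name
# "physics_homework.jpg" is hypothetical.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# "gemini-pro-vision" is the image-and-text variant of the Pro model
model = genai.GenerativeModel("gemini-pro-vision")

homework = Image.open("physics_homework.jpg")  # photo/scan of the assignment
response = model.generate_content(
    [homework, "Check this solution step by step and point out any mistakes."]
)
print(response.text)
```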
Google also boasted that Gemini was trained on its new TPU v5e chips and plans to introduce Gemini Ultra in early 2024. It will power Bard Advanced, a new version of the giant’s consumer chatbot. For now, Gemini Ultra remains in testing and is available only to select safety and security experts.
The latest important piece of news is the integration of Gemini into the Google Pixel 8 Pro smartphone. Among other things, it enables quick suggested replies in messengers through the Gboard keyboard. WhatsApp is the first to benefit, and next year such features will appear in other communication apps. This is just the beginning: Google has announced many new AI tools for its Pixel smartphones, which will later reach some other Android devices as well, although no details are available at the moment.
The new Gemini model from Google appears to be one of the largest and most sophisticated artificial intelligence models to date, although the release of the Ultra version will be the real test of that claim. Compared to other popular models behind today’s AI chatbots, Gemini stands out for being multimodal by design, whereas models like GPT-4 rely on plugins and integrations to become truly multimodal.
While Gemini represents a significant leap in artificial intelligence capabilities, it shares the drawbacks common to other large language models: the potential to generate misinformation, biases rooted in its training data, and a limited understanding of the real world. Google acknowledges that the new model may make mistakes, state facts not based on evidence, and contradict common sense.
Additional tests are needed, especially for Gemini Ultra, which has new capabilities that are not fully explored yet. Google aims to thoroughly assess Gemini to minimize potential harm.
Read also: The war for satellite internet: Europe challenges Starlink
The launch of Gemini by Google has initiated a new era in the development of AI. With its superior performance compared to previous models and human baseline benchmarks, Gemini points towards future possibilities of artificial intelligence. However, it still requires further research to address certain drawbacks.
In the future, we can expect Gemini to provide more useful and intelligent features in Google products. The company plans to continue expanding the capabilities of Gemini beyond English and build upon its foundational model methodology.
We can only watch and hope that Google knows what it’s doing.