Ukrainian LLM is coming: what stage is the development of the language model at?

© ROOT-NATION.com - Use of content is permitted with a backlink.

Last year, Ukraine set an ambitious goal: to become one of the world’s top three countries in the development of artificial intelligence. One of the key steps on this path will be the launch of a national large language model (LLM) created and trained on unique Ukrainian data.

Developing a large language model requires significant resources, and during a full-scale war public funds go primarily to defense. That is why Kyivstar became the technical partner of the initiative. The company will finance the creation of the national LLM, after which the model will be transferred to the state.

The Ukrainian LLM will be based on the Gemma family of models from Google, a strategic partner of the project. These modern models are being adapted to the Ukrainian language, Ukrainian realities, and local specifics of use. An independent group of experts from various fields will check the quality and reliability of the future language model. They will test the system for technical excellence, ethics and user safety, depth of knowledge of the Ukrainian language, and the ability to handle the national context and sensitive topics correctly.

Currently, the team is focused on the most critical stage: collecting data for training the LLM. Ordinary open sources from the Internet are not enough to achieve high quality, so government agencies, media, universities, and other institutions are involved in the process. As a result, a corpus of terabytes of unique data is being assembled, which will become the basis for training the model.

Alongside the technical work, the developers are building a legal framework that regulates responsible handling of the data transferred for training. This is necessary to protect rights, keep processes transparent, and maintain the trust of partners and society. A separate area of work is the creation of the project's own benchmarks. An expert team is already developing tests that will make it possible to objectively assess the quality, efficiency, and security of the language model and to improve it systematically.

In January this year, the team plans to have ready the first corpus of training texts, an improved tokenizer that will make Ukrainian-language processing faster and more accurate, and its own tools for assessing the quality of the model. The next step will be the launch of beta testing, scheduled for spring 2026.

The name of the Ukrainian large language model will be chosen by citizens themselves. In January, voting will start in the Diia app, allowing everyone to contribute to the creation of the national AI not only as a user but also as a co-author.
