Ukrainian LLM is coming: what stage is the development of the language model at?

Last year, Ukraine set an ambitious goal: to become one of the world’s top three leaders in artificial intelligence development. One of the key steps on this path will be the launch of a national large language model (LLM) created and trained on unique Ukrainian data.

Developing a large language model requires significant resources, and during a full-scale war, public funds go primarily to defense. That is why Kyivstar became the technical partner of the initiative. The company will finance the creation of the national LLM, after which the model will be transferred to the state.

The Ukrainian LLM will be based on the Gemma family of models from Google, a strategic partner of the project. These modern models are being adapted to the Ukrainian language, local realities, and specific use cases. The quality and reliability of the future language model will be assessed by an independent group of experts from various fields. They will test the system for technical excellence, ethics and user safety, depth of knowledge of the Ukrainian language, and the ability to handle the national context and sensitive topics correctly.

Currently, the team is focused on the most critical stage: collecting the data on which the LLM will be trained. Ordinary open sources from the Internet are not enough to achieve high quality, so government agencies, media, universities, and other institutions are involved in the process. As a result, a corpus of terabytes of unique data is being assembled, which will become the basis for training the model.

Alongside the technical work, the developers are building a legal framework to govern responsible handling of the data transferred for training. This is necessary to protect rights, keep processes transparent, and maintain the trust of partners and society. A separate area of work is the creation of the project's own benchmarks. An expert team is already developing tests that will make it possible to objectively assess the quality, efficiency, and security of the language model and to improve it systematically.

In January this year, the team plans to have three things ready: the first corpus of texts for training, an improved tokenizer that will make Ukrainian-language processing faster and more accurate, and its own tools for assessing the quality of the model. The next step will be the launch of beta testing, scheduled for spring 2026.

The name of the Ukrainian large language model will be chosen by citizens themselves. In January, voting will open in the Diia app, allowing everyone to contribute to the creation of the national AI not only as a user but also as a co-author.

Source: thedigital
