The Bulgarian Institute for Computer Science, AI and Technology (INSAIT) said on November 19 that it is releasing three AI models for the Bulgarian language, which will be available to state institutions, businesses and the general public on November 23.
“These models demonstrate unprecedented performance in Bulgarian, outpacing much larger ones such as Qwen-72B and Llama3-70B, as well as similar-sized models, while retaining robust English language capabilities,” INSAIT said in a statement.
The three models have 2.6bn, 9bn and 27bn parameters and are freely available for anyone to build AI-based assistants.
INSAIT said that its 2.6B model significantly outperforms open models of similar size in Bulgarian.
“Interestingly, beyond benchmarks, INSAIT’s 27B significantly surpasses GPT-4o-mini (free version of GPT-4) and rivals GPT-4o (paid version of GPT-4) in Bulgarian chat performance, according to GPT-4o itself, which was used as a judge across thousands of real-world conversations from around 100 different topics. The results are similar when compared to Anthropic’s Haiku and Sonnet (large) models,” the institute said.
The three models are built on top of Google's Gemma 2 family of models but include a number of improvements, among them continuous pre-training on around 100bn tokens of Bulgarian text, as well as a novel instruction fine-tuning and model-merging scheme.
“This new Branch-and-Merge scheme ensures that models improve on a target skill, such as Bulgarian understanding and generation while avoiding catastrophic forgetting of already acquired skills in the base models. The method is widely applicable and its utility is demonstrated beyond Bulgarian,” INSAIT noted.
On November 23, INSAIT will launch the first public nationwide chat system, built on its 27-billion-parameter model.
“The system goes beyond a single model and includes further advances including alignment, retrieval subsystems and other components. This is the first time globally that a system of this scale has been launched by a government institution,” INSAIT stated.
At the beginning of the year, the institute announced the launch of BgGPT, the first Bulgarian-language open-source AI model of the latest generation.
According to INSAIT, BgGPT aims to be an independent and competitive AI model that should secure technological leadership for Bulgaria. The organisation demonstrated the model's capabilities, showing its ability to answer questions, solve problems and compose essays on AI applications. INSAIT plans to keep improving the model with the aim of making it a leading Bulgarian-language AI resource.
Any institution can download BgGPT, run it on its own closed servers and use it without exporting data to external servers.