Opinion | Local AI models will live and die by the quality of their data sets
At Asean’s first-ever artificial intelligence summit in Kuala Lumpur recently, YTL AI Labs, in partnership with the University of Malaya, launched a large language model (LLM) called ILMU – the Malay word for “knowledge”, used in this instance by the developers as a Malay acronym. Billed as Malaysia’s first fully sovereign AI model, with data locally hosted, managed and governed, ILMU has been trained on local, contextualised data and outperforms global models on Malay-language benchmarks.
In fact, YTL debuted an early iteration of the model in December 2024. This latest version boasts multimodality – that is, it can process not only text but also audio and images. It is still early days, but this seems an impressive achievement given Malaysia’s ethnic and linguistic diversity.
Speech in Malaysia is not only distinctly accented by state or region but also interweaves words from Malay, English, Chinese, Tamil and other languages. Even within the country, a Kuala Lumpur native may not easily understand the Malay spoken by fellow citizens from states on the eastern coast of Peninsular Malaysia.
That ILMU has been trained on images of local delicacies and sights could also make it useful in educational or tourism settings, especially if such pictures cannot be accurately identified by other LLMs.
Today, there are increasingly sophisticated equivalents everywhere from India, Mongolia and South Korea to Thailand, Vietnam and, soon, Cambodia. By the end of this year, the Chile-led Latam-GPT will debut with languages spoken in Latin America and the Caribbean, including indigenous tongues and dialectal variants. Latam-GPT will join other regional, multilingual language models such as SEA-LION in Southeast Asia and UlizaLlama and Vulavula on the African continent.