For Dr William Tjhi of AI Singapore, one reason he believes Southeast Asia needs country- and region-centric versions of ChatGPT and Gemini comes down to an anecdote involving a mosque and its volume levels.
While browsing an online forum, he came across a post from someone seeking advice on how to handle the noise from a nearby mosque.
Curious about how a generative artificial intelligence (gen-AI) large language model (LLM) chatbot would handle such a scenario in a Muslim-majority country like Indonesia, he tested several LLM chatbots by asking questions in Indonesian to assess their cultural and religious understanding.
Gen-AI relies on complex algorithms that analyse how humans structure language on the internet, while LLMs generate responses based on context gleaned from copious amounts of digital text online.
“I asked (the gen-AI model), ‘I’m staying in a hotel, and there’s a mosque behind it that’s a little too loud. What should I do?’” he told Bernama via Zoom.
“(O)ne (AI model) answered that … ‘Maybe talk to the mosque committee and ask them to adjust the prayer schedule.’ It’s a little bit – (the model is) not very sensitive (to local norms),” he laughed ruefully.
Asking to adjust the volume or alter prayer times may seem like a logical solution in European countries or the United States. However, in Muslim-majority nations in Southeast Asia, such as Malaysia and Indonesia, suggesting adjustments to prayer times at a mosque is not only impractical but may cause offence, potentially leading to significant backlash.
Muslims do not set their own prayer times. Instead, the times are determined by the position of the sun and vary according to geographical location.
Currently, most major generative AI chatbots like ChatGPT, Gemini and Llama were developed in the United States, and lack a deep understanding of the linguistic nuances, cultural practices and religious norms in Malaysia and its neighbours. This has triggered a rush on the part of Malaysia and other ASEAN countries to develop their own LLMs that incorporate local and regional cultural and religious sensitivities.
THE SENSITIVITY GAP
Malaysia aims to be a regional hub for AI technology and applications. Consequently, it has secured billions of dollars in investment in the past year from global tech firms seeking to build critical infrastructure to cater to the growing demand for their cloud and AI services. On Dec 12, the country launched the National Artificial Intelligence Office (NAIO), which will serve as the central authority to champion Malaysia’s AI agenda.
Other countries in ASEAN have taken similar steps.
An analysis – Advantage Southeast Asia: Emerging AI Leader – by consultancy Access Partnership found the region well-placed for the AI boom. In the 2023 AI Readiness Index, Singapore was ranked as the best prepared overall for AI among 12 countries in Asia Pacific and India, while Malaysia was the next best-prepared ASEAN country at eighth place behind India, followed by Thailand. The region's advantages include a population of 213 million aged 15 to 34, 70 per cent of whom are enthusiastic about AI, a higher adoption rate of AI among businesses, and government support.
However, despite their edge, Southeast Asian nations, whose combined population reached 673.02 million in 2022, are not necessarily a top priority for gen-AI developers.
“The intensity really varied because there are so many countries and cultures in the world, and I don’t think Southeast Asia is, well, you know, rather quite high on the priority list,” said Tjhi, who is the head of Applied Research at AI Singapore, a national R&D program in AI.
This has led to an understanding gap, evident in responses such as suggesting a user speak to a mosque committee to change prayer times, or translating ‘love shack’ into the Malay language as gubuk cinta, a term that aligns more with Indonesian usage than Malaysian.
Bernama talked to experts who explained that producing a gen-AI LLM itself isn’t particularly difficult, as major tech companies like OpenAI, Google, and Meta share their Application Programming Interfaces (APIs), a set of rules and protocols that enable software applications to communicate with each other. The challenge, however, lies in the dearth of online content about ASEAN culture and issues in local languages.
Dr Saw Shier Nee, a senior lecturer at the University of Malaya (UM)’s Department of Artificial Intelligence, said all LLMs must be trained on the available online data.
“Currently, (the models are) able to read BM (Bahasa Melayu), but most of the BM data is based on Indonesian. There’s a lot of training data for Indonesian, which helps the models understand it. In Malaysia, however, we don’t have enough data to train the LLMs, which is why when you type in Malay, they often respond in Indonesian,” she said.
FILLING IN THE GAP
Another challenge contributing to the difficulty global LLMs face in relating to ASEAN is the region's diversity. ASEAN is home to around a dozen major languages and various political systems.
Some countries have monarchies, while others are republics. Several are democratic, while others are authoritarian. Many countries are a mix of leadership styles and political systems.
Religions and ethnicities differ from country to country, and sometimes even within the same country. As a result, several gen-AI initiatives have emerged to reflect this diversity.
Malaysia has so far produced at least two gen-AI LLMs, MaLLaM by Malaysian start-up Mesolitica, and Ilmu 0.1 by YTL AI Labs, which was launched Dec 12. Both use BM, understanding and responding in the language, complete with the cultural nuances and context.
MaLLaM also understands local dialects and 16 other regional languages, and is used in Amazon Web Services cloud services, according to a Bernama report on Dec 3.
Mesolitica collaborates with the team behind SEA-LION (Southeast Asian Languages in One Network), which was built by AI Singapore. SEA-LION promises to be a family of open-source LLMs that better understand Southeast Asia's (SEA) diverse contexts, languages, and cultures.
Other SEA-LION languages include Indonesian, Vietnamese, Thai and Tamil.
To evaluate the proficiency of the LLMs in a language, developers test them against a set of benchmarks, including Massive Multitask Language Understanding (MMLU).
In Singapore and for the region, there is the Southeast Asian Holistic Evaluation of Language Models (SEA-HELM). For Malaysia, there is the Malay MMLU, built in partnership between UM and YTL.
Both the Malaysian LLMs scored high under the Malay MMLU, according to reports.
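For readers curious about what such a benchmark score means in practice, MMLU-style tests are sets of multiple-choice questions, and a model's score is the share it answers correctly. The sketch below illustrates that idea only. The questions, the Question class and the always_first stand-in "model" are all invented for illustration and are not taken from the Malay MMLU or from any real system.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Question:
    prompt: str
    choices: Sequence[str]
    answer: int  # index of the correct choice


def accuracy(questions: Sequence[Question],
             pick_choice: Callable[[Question], int]) -> float:
    """Return the fraction of questions where the picked index is correct."""
    if not questions:
        raise ValueError("need at least one question")
    correct = sum(1 for q in questions if pick_choice(q) == q.answer)
    return correct / len(questions)


# Hypothetical sample items, written for this illustration only.
SAMPLE = [
    Question("Berapakah 2 + 3?", ["4", "5", "6", "7"], 1),
    Question("Ibu negara Malaysia ialah ...",
             ["Jakarta", "Bangkok", "Kuala Lumpur", "Manila"], 2),
    Question("Berapakah 10 - 4?", ["6", "5", "4", "3"], 0),
    Question("Berapakah 3 x 3?", ["6", "7", "8", "9"], 3),
]


def always_first(question: Question) -> int:
    """Stand-in for a model: always picks the first choice."""
    return 0


def oracle(question: Question) -> int:
    """Stand-in for a perfect model: always picks the correct choice."""
    return question.answer


if __name__ == "__main__":
    print(f"always_first: {accuracy(SAMPLE, always_first):.2f}")
    print(f"oracle:       {accuracy(SAMPLE, oracle):.2f}")
```

In a real evaluation, the stand-in function would be replaced by a call to the model being tested, and the question set would run to thousands of items across many subjects.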
Prof Chan Chee Seng from the Department of AI at UM, who was involved in developing the Malay MMLU, told Bernama that he hopes the Malay MMLU will encourage other LLMs to fine-tune their models for Malaysia. He described the Malay MMLU as a step towards creating a robust AI ecosystem in the country.
To capture the full nuances and cultural context of the language, he explained that the team fed the programme educational materials from primary to secondary school levels. The programme also incorporated training in humanities, social sciences, and Science, Technology, Engineering and Mathematics (STEM) subjects, among others.
“For Malaysia, you need a model that understands the country. For example, it should be able to grasp Manglish, and (when) we talk about royal families, it should know what it means when you talk about the Sultan (or when) you talk about Parameswara,” he said.
“So regardless (if it's ChatGPT) or Gemini, all these models that are built by other people or other parties, they must be able to answer these types of questions about Malaysia to a certain extent,” he added.
COMPETITION
As analysts have given the Southeast Asian region a glowing recommendation to take advantage of the tech boom, there is cooperation between nations even as they jockey for a better position.
Prof Chan said that Malaysia is well-positioned to become the regional tech hub, as it has three key resources that tech companies seek: land, energy, and water.
However, he emphasised that these natural advantages would be moot without the proper infrastructure to support AI development locally. Tech giants, including Google, Nvidia, and Microsoft, have already invested billions of dollars in building data centres and tech facilities, and in training and upgrading local talent.
“Talent is important, but this is secondary. Infrastructure is still the main element. Without infrastructure, even with talent, it’s useless. Basically, the talent will just go overseas,” he said.
Much of the infrastructure is in the process of being built.
Edited by Salbiah Said