Feature

Why Countries Are Building Their Own ChatGPTs

From South Africa to Singapore, labs are building AI to preserve their culture.

Written by Jennifer Guay | 7 min read | October 01, 2025

When Ankush Sabharwal talks about the global AI economy, he builds on the decades-old adage that “data is the new oil,” but with a crucial distinction. “Data is not the oil,” the founder of Indian AI company BharatGPT said. “It’s crude oil. Data is nothing unless you refine it.”

Countries such as India produce vast amounts of data, but most of it gets processed elsewhere — into AI systems that those same countries then purchase as finished products. The parallel to older patterns of resource extraction is unmistakable, prompting nations from Singapore to South Africa to ask: If we have the raw materials, why not build the refinery at home?

This realization is driving a significant shift in how AI gets built. Rather than simply feeding more diverse data into Western models, countries and organizations are building their own AI models from the ground up, designed to understand not just different languages, but diverse ways of thinking about the world. These homegrown alternatives aim to preserve what Western algorithms can overlook: the regional dialects, local knowledge, customs, traditions and values that distinguish one culture from another.

The sovereign AI movement reflects something deeper than technological nationalism. It’s about who controls the tools that increasingly shape how people access information, healthcare and economic opportunity — and whether the current system inadvertently recreates old hierarchies.

What Happens When AI Doesn’t Speak Your Language?

The stakes become clear in everyday interactions. When Leslie Teo, Senior Director of AI Products at AI Singapore, asked ChatGPT about the best way to eat durian — Southeast Asia’s notoriously pungent “king of fruits” — the AI confidently prescribed the Singaporean approach. Wait a few days to eat the fruit after it falls from the tree, the LLM advised Teo, to allow the flavor to develop fully. But Thais tend to prefer their durian fresh-cut from the tree, with no smell at all — a cultural distinction the AI missed.

For Teo, this wasn’t about fruit preferences. It highlighted a fundamental problem: the world’s most powerful AI systems are overwhelmingly trained on English-language data and Western perspectives.

The global AI market, valued at over $244.2 billion in 2025, remains dominated by companies from the United States and China. OpenAI alone raised $6.6 billion in October 2024 — the largest venture capital deal in history — while Chinese tech giants such as ByteDance plan to invest more than $12 billion in AI development in 2025. Meanwhile, African AI-focused startups received just $641 million in funding from 2022 to 2023. But while China is edging ever closer to the US in the global AI race, ChatGPT remains the most widely used consumer chatbot globally, with 940 million downloads compared to DeepSeek’s 127 million, according to a June 2025 Sensor Tower report.

The durian example reveals a deeper issue: Western AI systems can flatten diverse perspectives into a single “correct” approach. Recent research bears this out. AI suggestions often “homogenize writing toward Western styles and diminish cultural nuances,” several studies have concluded. In one, researchers prompted both ChatGPT-4 and an Arabic-specific model to suggest food dishes, drinks and women’s names. Even when prompted in Arabic, the models produced Western-centric responses: ravioli for food; whiskey for drinks; Rosanne for names.

In August, researchers found that ChatGPT is already changing how English speakers communicate, with words such as “delve,” “meticulous” and “intricate” showing up more often in spoken conversation. In other words: humans are already starting to mimic AI.

From Mistranslation to Cultural Misrepresentation

In South Africa, the linguistic gaps become even more pronounced. Researchers at Lelapa AI, a South African company building AI for African languages, discovered this firsthand when they tested ChatGPT on isiZulu, the country’s most common first language. A simple request to translate the phrase “it is expensive to catch a flight” produced “it is better to manage the transportation costs.” When asked to count in isiZulu, the model invented nonsensical combinations like “ku-one, ku-two,” essentially slapping English numbers onto made-up isiZulu syllables.

The technical barriers run deeper than simple vocabulary gaps. African languages have structural features that confound AI systems designed for English. Many build complex words by combining roots with multiple prefixes and suffixes. A single isiZulu “word,” for example, can encode what requires an entire phrase in English. When AI encounters these compound structures, it often fractures them incorrectly, creating meaningless fragments. Tone and diacritical marks present another challenge: a subtle accent or tonal shift can completely alter the meaning.

“The root of the problem is that many popular AI language models have limited capabilities for low-resourced languages like those in Africa,” said Jade Abbott, CTO and co-founder of Lelapa AI. “They’re often trained on whatever text is available on the internet, which for African languages tends to be dangerous, offensive and, frankly, garbage data in many cases. Because of this, the model might regurgitate mistranslations or even offensive stereotypes.”

These are more than technical glitches — they’re fundamental misalignments. If current trends continue, AI tools risk not just excluding non-Western perspectives, but actively reshaping them to fit Western paradigms.

The Economics of Digital Dependency

Sabharwal, who is also the CEO of AI company CoRover, has spent nine years building AI that navigates India’s linguistic complexity. While India recognizes 22 official languages, thousands of additional languages and dialects receive no support from Western AI systems.

Sabharwal’s models understand the creative ways in which Indians actually communicate online — the slang, colloquialisms, code-switching between languages and inventive spellings that can emerge when people who think in Hindi or Tamil type in English. Western AI systems struggle with this multilingual flexibility because they are designed to operate in one language at a time.

India produces enormous amounts of data through its 954 million internet users. Much of that data flows to foreign companies for processing in distant data centers, allowing what some see as a colonial-style trade relationship to persist. Sabharwal’s solution isn’t to reject foreign technology entirely: he emphasizes that countries should collaborate and “pay the price for the best.”

But he argues that India should build local capabilities to process data — so Indian information can benefit Indians, instead of simply enriching foreign AI companies.

Community-Centered Development

Regional AI initiatives are pursuing a fundamentally different development model: rather than scraping whatever data is available online, they’re working directly with local communities to gather, verify and annotate training data.

As part of the Singaporean government’s $54.5 million multilingual development program, SEA-LION researchers assembled nearly a trillion tokens (individual words and word fragments) of Southeast Asian text — a process that required rebuilding data collection systems from scratch. Standard filtering algorithms, designed for Western languages, routinely classified Thai, Khmer and Lao text as spam.

In Kenya, another community-centered approach has proven life-saving. Jay Patel, Technology Director at maternal health nonprofit Jacaranda Health, realized that mothers with basic phones and unreliable internet access had nowhere to turn for reliable health information. So his entirely Kenyan team of 13 built UlizaMama, a maternal care-focused AI that fields 12,000 health questions daily.

The results demonstrate the power of purpose-built AI: nearly 90% of mothers advised to seek medical care actually follow through, and Jacaranda Health has documented a 27% increase in prenatal checkup attendance.

Its success lies in understanding how people actually communicate. “Moms are not asking questions in university Swahili,” Patel explained. The system processes informal language mixed with English, local slang and abbreviations, taking into account the casual way in which people communicate via text.

UlizaMama employs a two-step safety check: one algorithm generates responses while another audits them for medical accuracy, grammar and appropriate tone. Only responses scoring 90% or higher reach users directly.
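In outline, that two-step check is a simple generate-then-audit pipeline: one model drafts a reply, a second scores it, and only replies clearing the threshold go out automatically. The sketch below is purely illustrative — the function names, the placeholder models and the fallback behavior are assumptions, not Jacaranda Health’s actual implementation — but it shows the shape of the 90% gating logic the article describes.

```python
# Illustrative sketch of a generate-then-audit safety check, in the
# spirit of UlizaMama's two-model pipeline. The generator and auditor
# here are stand-in stubs, not real models.

def generate_response(question: str) -> str:
    # Stand-in for the generator model; a real system would call an LLM.
    return ("Mild swelling in the feet is common in pregnancy; "
            "see a clinic if it is sudden or severe.")

def audit_response(question: str, response: str) -> float:
    # Stand-in for the auditor model, which would score the draft for
    # medical accuracy, grammar and tone. Here we return a fixed score.
    return 0.94

def answer(question: str, threshold: float = 0.90):
    """Return the draft reply plus a status: sent automatically if the
    auditor's score meets the threshold, otherwise held for review."""
    draft = generate_response(question)
    score = audit_response(question, draft)
    status = "sent" if score >= threshold else "held_for_review"
    return draft, status

reply, status = answer("My feet are swollen, is that normal?")
print(status)  # → sent
```

The key design choice is that the auditor is a separate check rather than part of the generator, so a low-scoring draft can be routed to a human instead of reaching a user.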

Redefining Data Ownership

Lelapa AI takes the community-driven approach even further, developing what it calls the Esethu Framework, a data licensing system that requires companies using community-collected African language data to fund additional local data collection. It’s designed to ensure that African languages aren’t just “a free resource to be mined, but a valued asset that remains under African custodianship,” said Abbott.

Lelapa AI researchers conduct extensive offline fieldwork, working directly with linguists and community members to gather, verify and annotate training data. This approach helps identify use cases that actually matter to local populations, such as better transcription for local radio content and translation tools for rural health clinics. Pelonomi Moiloa, Lelapa AI’s CEO, describes this as building AI “with the community, instead of for the community.”

Their latest LLM, InkubaLM, is designed to run on local servers or even offline without requiring the massive cloud infrastructure that powers systems such as GPT-4. This architectural choice reflects a broader philosophy: while Western AI offerings are increasingly powerful, they remain black boxes controlled by distant corporations. Lelapa AI’s models can be deployed on local servers, integrated by local developers and modified by local technologists.

The Sovereignty Dilemma

As these initiatives mature, they face complex questions about technological independence. Teo acknowledges an uncomfortable reality: SEA-LION builds on foundation models created by American companies. “I’m very careful with [the term] sovereignty because SEA-LION is built on top of [Google’s] Gemma and [Meta’s] Llama,” he said. “It would be ironic to say we want to be sovereign when 90% of what we do is building on top of a corpus that’s openly shared.”

This reflects a broader tension facing regions trying to reduce their dependence on Western AI. Teo and Patel are pursuing a pragmatic middle ground: maintaining control over the data and capabilities that matter most, while leveraging innovations from Silicon Valley giants.

Moiloa, meanwhile, frames AI sovereignty in terms of who owns and benefits from the data that feeds the systems. “Western models trained on African data are usually proprietary. They don’t always give back to the communities that generated the data, and Africans have little control over how that data — or the model — is used,” she explained.

Building sovereign AI is not just about plugging more African data into a foreign model, Moiloa said. It’s about building AI expertise and decision-making power at home. “This means we aren’t at the mercy of Big Tech’s priorities. Our languages won’t survive and thrive digitally if we depend on someone in Silicon Valley who might drop support at any moment, or do a half-hearted job,” she said.

Preserving Culture in the AI Age

The broader risk is that current AI development patterns could accelerate a kind of digital Darwinism, where only the most digitally represented traditions survive in AI-mediated spaces. As Moiloa highlighted, languages are disappearing at an unprecedented rate. More than 1,500 are at risk of extinction by the end of the century. AI systems that ignore them may be accelerating their demise.

Regional AI offers a different path — not just in terms of language preservation, but cultural representation. Ask a Western AI system what to do with a million-dollar windfall, Teo pointed out, and it will suggest investment strategies and tax considerations — standard American financial advice. But a Thai person might instead dream of pursuing enlightenment by living in a monastery, or donating prodigiously to charity.

These aren’t just different answers. They reflect distinct value systems and ways of understanding prosperity. AI that recognizes these differences validates worldviews that might otherwise be lost in a Western-dominated digital landscape.

The real test for these projects will come as the dominant AI companies inevitably improve their multilingual capabilities and cultural understanding. Can regional AI initiatives maintain their advantage in cultural nuance, even as the technical gap narrows?

Moiloa isn’t worried. In 5–10 years, she envisions, using AI in isiZulu or Swahili will be as unremarkable as using it in English. The movement she represents is about ensuring that African languages and cultures have a permanent place in advanced technology — built by locals, for local realities, rather than as an afterthought in Silicon Valley’s global expansion.

  • AI
  • Data Management
Jennifer Guay

Contributor

Jennifer Guay has been a content strategist and editor for Google, Microsoft, and Flo. She previously worked as a journalist, with bylines in The Times, The Guardian, and Foreign Policy, among others. She regularly wrote for the Financial Times Studio as their Technology & Innovation Specialist and has authored reports on tech topics for the City of London, Canadian Digital Service, and the UN. She has also worked on branded podcasts for Adobe, Red Hat, and Headspace. Jennifer has an MPA in Digital Technologies and Policy from University College London.