Amplifying Portugal’s Voice in the Age of AI: Prospects of Building PT-PT Large Language Models

Large Language Models (LLMs) are transforming how we learn, work, and communicate. Yet despite Portuguese being one of the most spoken languages worldwide, European Portuguese (PT-PT) often struggles to find its place in today’s AI systems. According to ECAIRE’s new perspective paper, PT4PT: Preserving Portugal’s Linguistic Identity in Lusophone LLMs, most existing models overwhelmingly reflect Brazilian Portuguese (PT-BR), leaving Portugal’s ten million speakers underserved.

Unlike the explicit marginalization faced by low-resource languages, Portuguese faces a different challenge: dialectal “fuzziness.” While Portuguese boasts over 250 million speakers globally, roughly 70% live in Brazil, making PT-BR dominant in online texts and training datasets. As a result, AI models trained on “Portuguese” often fail to capture the syntactic, lexical, and cultural nuances of PT-PT. This can undermine clarity and trust in sensitive contexts such as education, healthcare, and public administration.

ECAIRE recognizes several national efforts by Portugal. Most notably, Portugal has launched AMALIA, its first open-source LLM designed specifically for European Portuguese. Backed by €5.5 million in public investment, AMALIA aims to deliver accurate PT-PT outputs for key sectors, from digital government services to classrooms and clinics. The project reflects a growing national commitment to linguistic sovereignty in the age of AI.

The research also highlights the vibrant role of the open-source community. Independent initiatives like GlórIA and PTIcola have assembled large PT-PT corpora, created new benchmarks, and built models capable of generating idiomatic and culturally coherent text. Together, these efforts demonstrate that Portugal’s academic and developer ecosystems are stepping up to secure PT-PT’s digital future.

Challenges remain, particularly in scaling up datasets, refining evaluation methods, and embedding ethical safeguards tailored to dialect-specific contexts. But the opportunities are significant. With stronger collaboration between academia and industry, deeper integration into government services, and alignment with EU-wide AI initiatives, Portugal is well-positioned to lead in building linguistically inclusive and culturally grounded AI.

At its heart, the paper makes a simple but powerful point: AI is not just about technology; it is about language and culture. When LLMs treat Portuguese as a single, uniform variety, subtle but meaningful differences between Brazilian and European Portuguese risk being blurred. Clarifying these distinctions does not divide the language but enriches it. Prof. José Barata of UNINOVA expanded on the internal diversity of the Portuguese language. “The main difference we know is between Brazil and Portugal,” said Prof. Barata. “But there are also differences between the Portuguese spoken in other Portuguese speaking countries such as Cape Verde, Angola, Mozambique, East Timor, Guinea, and S. Tome. These differences can also be coped by a specific LLM for Portuguese.” By accurately representing these cultural nuances, AI systems can be more inclusive, culturally attuned, and responsive to the full diversity of Lusophone communities.

For readers interested in exploring the research in depth, the full paper is available for reading and download in English and Portuguese.

Download (EN)
Download (PT)