Internationalization: A Journey Towards Linguistic Inclusion

By Sebastian Barrenechea on Dec 27, 2023
Image generated by Firefly Image 2 with the text: award winning art, a branched tree made of light rays and energy, dramatic, impactful, colorful, high definition, 4k uhd

From the inception of this page, I chose English as the main language, assuming it was the most prevalent on the internet. However, I recently heard that Spanish is the second most used language on Meta’s networks and X. Personally, I also owed it to myself to publish my content in my mother tongue, and today that is a reality.

Over the past few days, I’ve internationalized every corner of the site: the interface strings now live in key/value objects, and the content of projects and posts is translated automatically using LLMs (Large Language Models).

Where do I start, my friend?

There are two main aspects to consider: the user interface and the content.

Astro’s documentation and source code were quite helpful for the base logic. However, I made several adjustments to adapt it to my needs and keep the code as clean as possible.

This site mainly focuses on the content of the posts, and its interface is quite simple. Most of the static text was on the homepage, so I started by extracting each string into key/value objects, which lets the view render exactly as before.

export const en = {
  'author.name': 'Sebastian Barrenechea',

  'nav.fork': 'Fork me on GitHub',
  'nav.sr.open': 'Open navigation menu',
  'nav.home': 'Home',
  'nav.projects': 'Projects',
  'nav.posts': 'Posts',
  'nav.page': 'Page',
  'nav.language.select': 'Select language',

  'hero.greeting': "Heey! I'm",
  // ...
};
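A minimal sketch of how such objects can be consumed, similar in spirit to the helper shown in Astro’s i18n recipe. The `ui` object, the Spanish strings, and the fallback to English are assumptions for illustration, not necessarily what this site does:

```typescript
// Hypothetical dictionaries; only a few keys are shown.
const ui = {
  en: {
    'nav.home': 'Home',
    'nav.projects': 'Projects',
    'nav.posts': 'Posts',
  },
  es: {
    'nav.home': 'Inicio',
    'nav.projects': 'Proyectos',
    // 'nav.posts' is intentionally missing to show the fallback.
  },
} as const;

type Lang = keyof typeof ui;
type UiKey = keyof (typeof ui)['en'];

// Assumption: English is the language used when a key is missing.
const defaultLang: Lang = 'en';

// Returns a lookup function bound to one language.
function useTranslations(lang: Lang) {
  return function t(key: UiKey): string {
    const dict: Partial<Record<UiKey, string>> = ui[lang];
    return dict[key] ?? ui[defaultLang][key];
  };
}
```

In a page, `const t = useTranslations(lang)` would then let the template call `t('nav.home')` instead of hard-coding the text.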

Route handling is crucial for identifying the pages to build. Astro facilitates this with its routing system, allowing the language to be available as a parameter:

  • Move all pages within the src/pages directory to a src/pages/[lang] directory.
  • Add something like this on each of the pages that require i18n handling:
    const { lang } = Astro.params;
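In a statically built Astro site, a dynamic route such as [lang] also needs a getStaticPaths function that lists the values to generate. A minimal sketch, assuming a hypothetical `languages` list:

```typescript
// Assumed list of supported language codes.
const languages = ['en', 'es'] as const;

// In an .astro page this function would be exported from the frontmatter.
// Each returned entry produces one page, with `lang` readable from Astro.params.
function getStaticPaths() {
  return languages.map((lang) => ({ params: { lang } }));
}
```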
    

With the English version working, I started adding Spanish support. Here I encountered a translation dilemma:

Screenshot of the homepage with Spanish texts 'Últimos Publicaciones' and 'Ver todos los publicaciones', incorrectly translated

In English, I could use the same values for both projects and posts, but not in Spanish: “proyectos” is masculine and “publicaciones” is feminine, so the words around them (“Últimos”/“Últimas”, “todos los”/“todas las”) have to agree with each noun.

In an ideal world, I would use gender-neutral wording. I could have opted for “Proyectos recientes” and “Publicaciones recientes”, which would let me reuse the same adjective. However, to maintain the visual consistency of the page, I needed each phrase to end in “proyectos” or “publicaciones”. After some adjustments, I got everything fully translated in all views (the index, navigation pages, and templates that use the content).
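One way to handle this is to give each collection its own keys, so that every language can write the full phrase with the right agreement. A minimal sketch; the key names and exact wording are assumptions:

```typescript
// Hypothetical keys: one full phrase per collection instead of a shared prefix.
const es = {
  'home.projects.latest': 'Últimos proyectos',
  'home.projects.all': 'Ver todos los proyectos',
  'home.posts.latest': 'Últimas publicaciones',
  'home.posts.all': 'Ver todas las publicaciones',
} as const;

type EsKey = keyof typeof es;
type Collection = 'projects' | 'posts';

// Maps each collection to the keys of its two homepage strings.
const headingKeys: Record<Collection, { latest: EsKey; all: EsKey }> = {
  projects: { latest: 'home.projects.latest', all: 'home.projects.all' },
  posts: { latest: 'home.posts.latest', all: 'home.posts.all' },
};

// Returns the section title and the "view all" link text for a collection.
function sectionLabels(collection: Collection): { latest: string; all: string } {
  const keys = headingKeys[collection];
  return { latest: es[keys.latest], all: es[keys.all] };
}
```

The cost is a few duplicated keys in English, where both collections could have shared one string.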

The language selection component I had been using also had a problem that bothered me a lot. It required the absolute class to position elements below it, which made it overflow the page margin when the text was very long:

Screenshot of the previous navigation bar, with the language selector in the top right corner exceeding the margin limit

Margin outlined in red to highlight the problem

I had extracted that component from Starlight because it seemed simple and minimalist, but in the end I replaced it with one from Flowbite, reimplementing its interactivity with the Web Components API. The new component requires an additional graphic for each language (the flag, using @iconify-icons/circle-flags), but it’s worth it.

Screenshot of the current navigation bar, with the language selector in the top right corner

But there’s no content to be seen, huh?

Once the site content (projects and posts) was moved to its corresponding English route, the site could not find any content in Spanish, which resulted in a page with nothing to navigate. For my first tests I translated the content manually through the OpenAI API; then I automated the process to make things easier.

You can see more details in an upcoming post about using LLMs in production.

After translating the projects and posts into Spanish, I made some adjustments to the results. I then set Spanish as the source language for the automated translation process and regenerated the English content from it.
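As a rough sketch of what one automated translation call might look like with OpenAI’s Chat Completions endpoint: the prompt wording, the function names, and the error handling are assumptions, and the model name is the one credited at the end of this post.

```typescript
interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Builds the request body. The prompt text is an assumption, not the site's actual prompt.
function buildTranslationRequest(text: string, source: string, target: string): ChatRequest {
  return {
    model: 'gpt-4-1106-preview',
    messages: [
      {
        role: 'system',
        content: `Translate the following text from ${source} to ${target}. Preserve the formatting and leave code unchanged.`,
      },
      { role: 'user', content: text },
    ],
  };
}

// Sends the request and returns the translated text from the first choice.
async function translate(text: string, source: string, target: string, apiKey: string): Promise<string> {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildTranslationRequest(text, source, target)),
  });
  if (!response.ok) {
    throw new Error(`Translation request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

With Spanish as the reference, regenerating the English version would amount to calling `translate(text, 'Spanish', 'English', apiKey)` for each project and post.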

What now?

Now it’s surprisingly easy to add support for more languages! There are still some considerations to keep in mind. For example, the site currently assumes that the language is read from left to right, which excludes languages like Arabic or Hebrew. Tailwind, the CSS framework I use on this site, supports both LTR (left to right) and RTL (right to left) layouts, but some details need to be adjusted (for example, using the rtl:space-x-reverse class where necessary).
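Tailwind’s rtl: variants take effect when an ancestor element has dir="rtl", so the layout needs to know the direction of each language. A minimal sketch, where the `rtlLanguages` set and the function name are hypothetical:

```typescript
// Assumed set of right-to-left language codes; extend it as languages are added.
const rtlLanguages = new Set(['ar', 'he']);

// Returns the value for the dir attribute of the <html> element.
function textDirection(lang: string): 'ltr' | 'rtl' {
  return rtlLanguages.has(lang) ? 'rtl' : 'ltr';
}
```

The layout could then render something like `<html lang={lang} dir={textDirection(lang)}>`.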

Since Germany ranks third among visitors to my site, I have added a German translation without difficulty, and I will soon include Italian, French, Simplified Chinese, and Icelandic (I♥️Iceland). Although automation makes the process easier, it is essential to review the translations to ensure their quality, as they are not always perfect. For English, I was able to review them myself; for languages I do not speak, like German, I have relied on tools such as Google Chrome’s automatic translation to verify consistency. My goal is to eventually collaborate with native speakers of each language to raise the standard of the translations offered.

Content translated by gpt-4-1106-preview
