Internationalization (i18n) for Zettlr

In this blog post, we explain why we switched away from Zettlr Translate, what the benefits of our new system -- gettext -- are, and how you can help Zettlr out by translating.

As we announced in our last blog post, we have now retired Zettlr Translate. This included completely removing the database behind the service. Since there is no longer any danger of exposing user data to the world, we can now also announce the real reason why we did so.

If you are looking for a brief guide on how to translate Zettlr now, click here to directly jump to the corresponding section further below.

Until yesterday, translating for Zettlr worked like this: Users on Zettlr Translate would translate strings into various languages and would vote on which translations they felt were the most appropriate for any one string. Then, every week, an automated script would download a new version of these translations into the repository of Zettlr. A few weeks ago, however, we began to receive reports that this script was failing to do so. Upon closer inspection, it turned out that there was a bug in the service itself that would throw a 404 error when the script was trying to download the Swedish translation.

Clearly, there was some translation that was causing an error in the service. However, we had neither the time nor the energy to investigate what was happening, and thus decided it was time to finally put this solution we’ve used for almost five years to rest. There is ample reason to believe that this may not have been the only problem in the service, since it was a custom-built solution.

Today, we have switched translations from the old system to gettext. More on what gettext is and why we believe it to be superior later. In this blog post, we want to shine a light onto the broader picture and answer questions, such as: why did we implement Zettlr Translate in the first place? Why did we replace it with gettext? And what are the requirements for translating software in general?

i18n for Software: Requirements

The interleaved processes of internationalization, or i18n for short (because there are 18 letters between the first and the last letter of the word), and localization (l10n, by the same logic) are defined by the English Wikipedia as follows:

Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by translating text and adding locale-specific components.

Localization (which is potentially performed multiple times, for different locales) uses the infrastructure or flexibility provided by internationalization (which is ideally performed only once before localization, or as an integral part of ongoing development).

In other words: Internationalization describes the process of ensuring that an app can be adapted to local peculiarities around the world, whereas localization then describes the actual process of adapting software to those peculiarities. Localization involves translating any strings that are visible to users, but also defining currency, number formats (e.g., 1,000.00 vs. 1.000,00 vs. 1 000,00), date formats, and so on.
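To make the number-format example a bit more concrete, here is a minimal TypeScript sketch that uses the standard Intl.NumberFormat API. The helper name formatAmount is made up for this illustration, and the outputs in the comments assume a runtime that ships full locale data (as current browsers and Node.js versions do).

```typescript
// Hypothetical helper: formats a number with exactly two decimals
// according to the conventions of the given locale.
function formatAmount (value: number, locale: string): string {
  return new Intl.NumberFormat(locale, {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2
  }).format(value)
}

const us = formatAmount(1000, 'en-US') // '1,000.00'
const de = formatAmount(1000, 'de-DE') // '1.000,00'
```

The point is that the program never hard-codes a separator. It only states which locale it wants, and the formatting rules come from the locale data.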

Both processes heavily depend on each other. The first step, i18n, has to be performed by the programmers of the software. It involves, for example, thinking about a system that allows strings to be translated. By default, any string in an application is fixed and cannot be changed. The process of i18n ensures that this is not the case so that the strings can actually be translated.

But this is, as we can see in the definition, only the preparatory step to ensure that things such as translation can be performed in the first place. The second step then is l10n, and this is where the developers of software are often at a loss. Most people speak two languages, maybe three, and so it is impossible for any single person to translate anything into more than those languages. This localization step, therefore, requires the help of other people – those that speak the language in question.

This poses an implicit, not immediately visible requirement on the process of i18n: No matter how a developer prepares software for localization, one thing they have to make sure is that other people who are not familiar with the code of the program can still translate the application. Put differently: People whose task is to translate something should only worry about the translation, not about the rest of the software.

A second implicit requirement, however, pertains to the question of who should be translating those strings in the first place, and there are really just two options: either to pay professional translators for the work, or to ask the users of the software themselves to do it. In the first option, money is a vital resource, but professional translators can be asked to do more work themselves. The second option eliminates the requirement for contractual agreements, but poses the problem that users cannot reasonably be expected to handle extremely cumbersome systems for performing the job – they need an easy way in that does not require them to do much more than the actual translating.

i18n in Practice

How does this now translate into practical applications? One de-facto standard for i18n is to define a function that returns the string corresponding to the language the user of the application has chosen, and to route every user-visible string through it. For Zettlr, this looks like this: trans('This string needs to be translated'). The function would then return, e.g., Dieser Satz muss übersetzt werden for the German language.

There is a competing set of standards for how to format the strings that need to be translated, however. One popular system is to use identifiers, rather than natural language, e.g.: trans('gui.dialog.confirm.ok_button'). This has the benefit that the identifier rarely has to change and that translations themselves can be stored efficiently. This is the system that Zettlr used until yesterday. The other standard, which I used in the paragraph above, simply uses English as the default and writes out exactly what needs to be displayed.

This has important implications for all the infrastructure that then has to be built around the translation efforts, however. If one uses identifiers, it may not be immediately clear what a translation should sound like. For example, if a translator sees the identifier gui.dialog.confirm.ok_button, should the text on the button then simply be “Ok”, or should it maybe say “Confirm” or “Yes”? Depending on the context and, more importantly, whatever text is right above that button, one translation is better than the other. Identifiers make software completely language-agnostic, but they also come with a host of problems and make the translation efforts more complicated than they need to be. For example, if you choose to reposition that “Ok” button in the application ever so slightly, this could change the meaning of the button’s text. The abstract identifier (ok_button) is still applicable, but maybe another word (“Yes” instead of “Confirm”) now is more appropriate.

Most software is written in English anyway, so the application itself will already be biased towards English. This lends itself to simply putting a correct English sentence into the translation function. This has four benefits: First, the programmers – who know the context in which said button appears better than anyone else – can already propose a “gold-standard” translation. Second, there is always a reasonably international default in case a string is not yet translated (while it may not be beautiful to see a rogue English sentence amid an otherwise German interface, it will still be legible for most people). Third, context can transparently change the actual translations: If you change trans('Confirm') to trans('Yes'), the identifier of the string, so to speak, has changed, and this already indicates clearly to translators that another word needs to be used for the button’s text. Fourth, this makes the whole application more resilient against failure: If everything goes wrong and no translation can be found (or, worse yet, the translation service has failed), the function can still return the string itself without making the software unusable.
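The fallback behavior described above can be sketched in a few lines of TypeScript. This is not Zettlr's actual implementation: the Catalog type, the german object, and its entries are assumptions made up for illustration, and only the name trans is taken from the example earlier in this post.

```typescript
// Hypothetical catalog type: maps an English source string to its translation.
type Catalog = Record<string, string>

// Illustrative data only; not taken from Zettlr's real language files.
const german: Catalog = {
  'This string needs to be translated': 'Dieser Satz muss übersetzt werden',
  'Confirm': 'Bestätigen'
}

// Returns the translation for `source`. If the catalog has no entry for it
// (or only an empty one), the English source string itself is returned.
function trans (source: string, catalog: Catalog = german): string {
  const hasEntry = Object.prototype.hasOwnProperty.call(catalog, source)
  const candidate = hasEntry ? catalog[source] : undefined
  return candidate !== undefined && candidate !== '' ? candidate : source
}

const known = trans('Confirm') // 'Bestätigen'
const missing = trans('Yes')   // 'Yes' (no entry, so English is shown)
```

Because the English sentence doubles as the lookup key, a missing or broken catalog degrades to an English interface instead of showing raw identifiers or crashing.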

The next practical decision then involves the way to enable users to translate the software. As mentioned above, if you rely on your own users to translate the software – in the case of Open Source software such as Zettlr that comes with no price tag attached, this is really the only choice – you can only expect them to do so much. They will take a few steps in order to help with the translation, but every added step makes it less likely that they actually do.

If you now have a completely custom system for storing the translatable strings, then your only option is to write a custom tool that understands your custom format. This is what Zettlr Translate was. Back in 2018, I had a lot of time on my hands, so for me it seemed like a reasonable cost, and indeed, while I was employed part-time, I had lots of time to maintain Zettlr Translate. But ever since I started my PhD, that time has been diminishing fast. A common format that many others use therefore has the benefit that there are (a) more tools to choose from and (b) many more people involved in maintaining that common format. Additionally, using a common format makes it possible to use online translation services such as Weblate or Crowdin, which many programs already rely on.

The gettext approach

Now, let us finish off with some thoughts on what the approach that Zettlr now uses brings to the localization efforts. gettext is a suite of tools, developed in the 1990s to help make applications written in the C programming language translatable. It can automatically extract translatable strings from source code and store them in a common POT (“Portable Object Template”) file, from which in turn a PO-file for each language can be derived. The online documentation for gettext explains all the formats, the intentions behind them, and how to perform translations.
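To give an impression of what a PO-file contains, here is a small TypeScript sketch that reads the simplest kind of entry: a msgid line with the source string, followed by a msgstr line with its translation. The sample content and the names PoEntry, samplePo, and parsePo are invented for illustration. Real PO-files also contain multi-line strings, plural forms, and contexts, which this sketch deliberately ignores, so please use an existing library or the gettext tools for real work.

```typescript
// One translatable string and its (possibly empty) translation.
interface PoEntry {
  msgid: string
  msgstr: string
}

// Made-up sample in PO syntax; the source reference is hypothetical.
const samplePo = [
  '#: hypothetical/source/file.ts:12',
  'msgid "Confirm"',
  'msgstr "Bestätigen"',
  '',
  'msgid "Yes"',
  'msgstr ""'
].join('\n')

// Minimal parser: handles only single-line msgid/msgstr pairs and skips
// everything else (comments, blank lines). Escape sequences are not decoded.
function parsePo (text: string): PoEntry[] {
  const entries: PoEntry[] = []
  let currentId: string | null = null
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim()
    const idMatch = /^msgid "(.*)"$/.exec(line)
    const strMatch = /^msgstr "(.*)"$/.exec(line)
    if (idMatch !== null) {
      currentId = idMatch[1] ?? ''
    } else if (strMatch !== null && currentId !== null) {
      entries.push({ msgid: currentId, msgstr: strMatch[1] ?? '' })
      currentId = null
    }
  }
  return entries
}

const entries = parsePo(samplePo)
// entries[0] is { msgid: 'Confirm', msgstr: 'Bestätigen' }
// entries[1] is { msgid: 'Yes', msgstr: '' }, i.e. not yet translated
```

An empty msgstr is how the format marks a string that still needs a translation, which is exactly what translation editors highlight for you.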

The system is well-established by now, widely used, and has had to deal with many of the intricacies of translating software already so that it is resilient to the whole process. There is a whole suite of programs that can read PO-files and make it easy for users to translate software. By switching from our custom approach to gettext we could solve a ton of problems we were experiencing earlier. First, Zettlr now follows a common approach and joins many more programs, tapping into the large crowd knowledge for translating. Second, it is now much easier for us as developers of the software to enhance it, and establish many more features more quickly than before.

How Translating Zettlr Works

Let us close with a quick introduction to how translating Zettlr works from now on. In the coming weeks, we will also update the user documentation to reflect this. If you now want to translate and you already know how to work with PO-files and know git a little bit, we don’t have to explain anything to you, but if you don’t, here’s a quick introduction.

All translations for Zettlr are now stored in the lang directory of the repository. An automated script will every night (or day, depending on where on Earth you are) extract all translatable strings from the source code and update the translation template file. Then, it will merge the updated template with the existing PO-files in the lang directory, so that new strings show up in every language file. You can, if you want to, click any of the PO-files to view what such a file looks like.
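Conceptually, the merge step works like the following TypeScript sketch, which is comparable in spirit to what gettext's msgmerge tool does. It is not the script Zettlr actually runs: the function name mergeTemplate, the Catalog type, and the sample strings are assumptions for illustration.

```typescript
// Hypothetical catalog type: source string -> translation ('' = untranslated).
type Catalog = Record<string, string>

// Builds the new catalog for one language: every string from the template is
// present, existing translations are kept, and new strings start out empty.
// Strings that no longer occur in the template are dropped in this sketch.
function mergeTemplate (templateIds: string[], existing: Catalog): Catalog {
  const merged: Catalog = {}
  for (const id of templateIds) {
    const hasEntry = Object.prototype.hasOwnProperty.call(existing, id)
    const previous = hasEntry ? existing[id] : undefined
    merged[id] = previous ?? ''
  }
  return merged
}

const updated = mergeTemplate(
  ['Confirm', 'Cancel'],
  { Confirm: 'Bestätigen', Yes: 'Ja' }
)
// updated is { Confirm: 'Bestätigen', Cancel: '' }
```

For translators, the practical consequence is that work already done is preserved, and only the new, empty entries need attention after each update.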

To begin modifying a translation file, you first need to download it. On GitHub, you can do so by first clicking the file, and then clicking the “Raw” button. This will open the file without the user interface of GitHub around it. You can then save that file to your computer (e.g., by simply pressing Cmd/Ctrl+S). Make sure that your browser does not save it with a .txt extension, but with the correct .po extension.

After clicking on the PO-file you would like to edit (here: the German PO-file), click the Raw-button to download the file.

If a corresponding file for the language you would like to translate into is missing, please get in touch with us. We can create the file for you so that you can then follow the rest of this guide.

Next, you will need a tool to translate the file. One good solution is POEdit. The program is open source and works on all platforms. It contains a paid “Pro” feature which you can ignore, as it is not required to translate files. All it does is enable you to have some online service auto-translate some messages for you.

POEdit also allows you to set your name and email address in the settings (if you want to) to indicate who last modified the translations. With this program, you can now open the PO-file. This is what it looks like if you open the German translation (filename: de-DE.po) with POEdit:

What POEdit looks like with the German translation file loaded.

Then, you can begin modifying the translations.

Once you are happy with the work you’ve done, you can add your changes to the Zettlr repository. For this step, a GitHub account is required. Creating an account is free of charge and we all use GitHub accounts to work on Zettlr. Next, open this great guide by GitHub on how to propose changes to a file. Follow all the steps, choosing the PO-file you just downloaded and modified as your target.

In step 3 (“Make any changes you need to the file.”), you need to copy and paste the complete contents of the PO-file you just edited. To do so, simply open the PO-file with a text-editor (TextEdit on macOS, Notepad on Windows, or whichever text editor ships with your distribution on Linux), select all the text and copy it. Then, in the editing window on GitHub, select all text there, remove it, and paste the contents of your file. Then, follow the rest of the steps in the guide. You do not have to get fancy with thinking about a great title and description of the changes you have made and can just leave it at “Improve German translation” or similar.

After you have opened the Pull Request, we will have a look over what you did and then merge your modifications into the repository. You do not have to worry about making any mistakes. We will spot them and we can work together to fix them!

If you have any questions during any of the steps we just described, do not hesitate to ask for help. The easiest way is to ask on our Discord server (see the link at the top of the page) but if you prefer not to use this service, you can also send us an email.

Always remember: By translating, you are basically donating your free time to our project, and we will of course help you with any problems you might encounter. We don’t want you to run into any demotivating problems while helping us out!

Conclusion

Switching from the established Zettlr Translate system to gettext will take some getting used to, but we believe it is the best choice available for ensuring that everyone can easily translate the app into any language! If you have any additional questions, please get in touch.