A colleague and I have started using POEditor as a way to organize translated strings for our apps and collaborate on getting them translated. POEditor’s web site, with its free account up to 1000 strings, looks like a useful cloud-based tool. It even bills itself as “The localization management platform that’s not a mess!”
However, that promise starts to look doubtful once you read documentation that relies on ambiguous terminology the web site never defines. The web-based tools probably work OK if you’re typing terms and translations into the browser, where you don’t have to worry much about unique identifiers or what the various fields are called. But when it comes to importing and exporting those translations, you have to map the fields correctly, and that’s hard to do when they’re not clearly defined. What exactly are term, context, and reference, and how do they differ from each other? What about translation and definition? It took me a good while, with help from the colleague who told me about POEditor, to struggle through the confusion and nail down what (I believe) the POEditor terminology means. So I’m going to write my conclusions down here, along with a bit about why I think those are the right answers.
One of the big keys to understanding POEditor is the unstated dependence on the PO format. What’s the PO format? A text file format used by gettext (and other software localization systems) to store a catalog of translated messages. If you come to POEditor from a background of knowing PO and gettext, you’ll have less trouble understanding how POEditor structures its data. If you don’t, you may get pretty confused trying to sort things out.
Primary sources about the PO format:
- The Format of PO Files, from the gettext documentation, is probably the definitive source, but it does have a fair bit of nitty-gritty detail specific to gettext.
- The PO Format, from the Pology documentation, provides some more insight into how the data is used. It’s probably a quicker introduction to the format.
term: an ID that can also be an “untranslated” message
The most important POEditor word to understand is term. (Yes, term is a term, but I’m trying not to make the confusion worse by using that term in two different ways! D-:)
The term corresponds to the msgid of PO format: an “untranslated” string that serves as the unique identifier of the “message” to be translated. Notice the double function of this string: it serves both as the ID and as the “untranslated” version of the message. So your msgid might be any of the following:
(Found here – click View below “Sample file”)
Unknown system error
Thank you for signing up !\nPlease check your email address to activate your account.
(Found here – click View below “Sample file”)
I think this was the biggest source of confusion for us: To a programmer, the idea of using a free-form string of unlimited length as a unique identifier is so counterintuitive (for good reason) as to keep sending us off in other directions trying to find a more reasonable interpretation of the sample data. It also was confusing that POEditor’s JSON sample data used terms that look like conventional identifiers, while the definition field holds user-readable English messages; whereas the CSV sample data put the user-readable English messages directly in the terms field.
Let’s suggest a best practice for any project of reasonable complexity: don’t use a natural-language string as the term; use an alphanumeric (no spaces) identifier instead. If English is the source (“untranslated”) language, then the English string can be put in the translation or definition field of its own translation file, and English can be made the default reference language. This allows a separation of concerns, so that the identities of the various messages in the application don’t get tangled up with their expressions in one particular language, with its potential ambiguities and homographs (see the following section on context).
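To make that practice concrete, here is a minimal sketch in Python that builds import records from identifier-style terms. The identifiers are invented for illustration, and the `term` and `definition` key names are my assumption about how the JSON import file is laid out, based on the field names discussed in this post; check them against POEditor’s own sample file before relying on them.

```python
import json

# Hypothetical identifier-style terms mapped to their English text.
# The identifiers are invented for this example.
ENGLISH = {
    "unknown_system_error": "Unknown system error",
    "signup_thank_you": "Thank you for signing up!",
}


def build_import_records(messages):
    """Turn {term: text} into a list of records with 'term' and 'definition' keys.

    The key names are an assumption about the JSON import layout.
    """
    return [
        {"term": term, "definition": text}
        for term, text in sorted(messages.items())
    ]


records = build_import_records(ENGLISH)
print(json.dumps(records, indent=2, ensure_ascii=False))
```

The point of the design is that the English wording can change later without changing the identifier that the application and the other translations depend on.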
One final point of confusion is that in some parts of POEditor, the term field is described as being required; but in practice, it is apparently possible to leave it blank (provided that a context is supplied, and is unique among empty terms). This is not recommended.
context: making the ID unique
While identifiers are expected to be unique within some scope, an application of any size can easily begin to have multiple IDs that look exactly the same. This can be all the more true if the source language (“untranslated”) text is used as the ID: In English, the word “File” might be used as a verb in one part of the application and a noun in another part, whereas in other languages the two messages required might be completely different. Separate translations would require distinct terms (as identifiers), which is not possible if the terms (as “untranslated” text) are identical. The solution: using contexts to distinguish the identical terms.
A context (msgctxt in PO format) is like a namespace that is combined with a term (msgid) to form a unique identifier. For example, there could be a term “File” with a context of “verb”, and another term “File” with a context of “noun”. Or the context could refer to the UI components where the messages appear.
Note that context is optional: If all terms are already unique, no context is needed. It’s apparently fine for a few terms in a project to have a context while others don’t.
In POEditor, the “primary key” of a record is the combination of its context (if any) and its term. This affects what happens during importing, synchronization and updating.
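Here is a rough sketch of what a composite key like that implies for an import or update. This is not POEditor’s actual logic, only an illustration of matching on (context, term); the record layout, the function names, and the choice to treat a missing context as an empty string are all my assumptions, and the Spanish strings are illustrative.

```python
def record_key(record):
    """Return the (context, term) pair; a missing context is treated as ''."""
    return (record.get("context", ""), record["term"])


def merge_records(existing, incoming):
    """Update or add incoming records, matching on (context, term)."""
    merged = {record_key(r): dict(r) for r in existing}
    for r in incoming:
        # Same key: fields are overwritten. New key: a new record is added.
        merged.setdefault(record_key(r), {}).update(r)
    return list(merged.values())


existing = [
    {"term": "File", "context": "verb", "definition": "Archivar"},
    {"term": "File", "context": "noun", "definition": "Fichero"},
]
incoming = [
    {"term": "File", "context": "noun", "definition": "Archivo"},
    {"term": "Open", "definition": "Abrir"},
]

merged = merge_records(existing, incoming)
for record in merged:
    print(record_key(record), record["definition"])
```

The two “File” records stay separate because their contexts differ, the incoming “noun” record replaces the existing one, and “Open” is added as a new record with no context.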
reference: where a term is used
Having read in the POEditor help about the Default Reference Language, you might be tempted to assume that the reference field for a term holds a translation in the reference language. But that’s not it. Instead, reference lets you record the place(s) in the application’s source code where the message is used. This is the equivalent of the “#:” automatic comment in PO format. Some sample references are:
I don’t know whether the exact format matters. There are probably tools that can automatically follow, and/or generate, those references. Personally, I don’t plan to spend a lot of time maintaining them. I don’t expect them to be needed often, and searching on demand is a lot easier. 🙂 The code references are not that helpful for translators, who typically won’t have access to or know where to find the code. To help a translator gain insight into how and where a message is used, the comment field (“# ” translator comment in PO format) seems more useful, possibly with a link to a screenshot or mockup.
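To show how these fields line up with PO syntax, here is a small Python sketch that renders one record as a PO entry, with the comment as a “# ” line, the references as a “#:” line, the context as msgctxt, the term as msgid, and the translation as msgstr. The function names, the file path, and the sample strings are hypothetical, and the sketch ignores plurals, flags, and long-line wrapping.

```python
def po_escape(text):
    """Escape backslashes, double quotes, and newlines for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po_entry(term, translation, context=None, references=(), comment=None):
    """Render one record as a PO entry (simplified: no plurals, no flags)."""
    lines = []
    if comment:
        # Translator comment: "# " followed by free text.
        lines.extend(f"# {line}" for line in comment.splitlines())
    if references:
        # Source references: "#: " followed by space-separated locations.
        lines.append("#: " + " ".join(references))
    if context is not None:
        lines.append(f'msgctxt "{po_escape(context)}"')
    lines.append(f'msgid "{po_escape(term)}"')
    lines.append(f'msgstr "{po_escape(translation)}"')
    return "\n".join(lines)


entry = to_po_entry(
    "File",
    "Archivo",
    context="noun",
    references=["src/menu.py:12"],  # hypothetical source location
    comment="Menu bar label",
)
print(entry)
```

Seeing the comment and the reference side by side makes the difference clear: the comment is written for the translator, while the reference only helps someone who can open the code.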
translation and definition
In PO format, msgstr gives the translated string in a target language. (“All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language.”) This seems to apply to imported/exported files in POEditor as well: you can only view or import translations in one language at a time. (What about exporting?)
In the JSON import format, the translated string is expected in the definition property. It can be either a single string, or a set of key-value pairs (JS Object) where varying forms can be selected based on cardinality. See examples here.
As far as I can tell, translation and definition are the same, aside from the fact that a definition can contain forms for various cardinalities.
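As a sketch of how an application might consume a definition that can take either shape, the Python below returns a plain string unchanged and otherwise picks a form by count. The `one` and `other` keys and the English-style rule (1 is singular, everything else is plural) are assumptions for illustration; many languages need more categories, and the real key names should be taken from POEditor’s examples.

```python
def pick_definition(definition, count):
    """Return the text for `count`, given a plain string or a dict of forms.

    Assumes English-like plural rules and hypothetical 'one'/'other' keys.
    """
    if isinstance(definition, str):
        return definition
    form = "one" if count == 1 else "other"
    # Fall back to 'other' if the requested form is missing.
    return definition.get(form, definition.get("other", ""))


simple = "Unknown system error"
plural = {"one": "%d file", "other": "%d files"}

print(pick_definition(simple, 3))
print(pick_definition(plural, 1) % 1)
print(pick_definition(plural, 3) % 3)
```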
These fields could also be discussed, if necessary: