How hard is it to translate Lix into another language?

Started by lemming_1, January 19, 2015, 11:03:21 PM

lemming_1

Lix is such an awesome game, I'd love to translate it into Hungarian :D

Edit by Simon, April 2015: This is now possible. You need version 2015-04-25 or later. Look at the file doc/transl.txt for instructions.

ccexplore

I assume Simon will answer this question accurately at some point soon. That said, even if the answer turns out to be "not easy right now", I'd like to propose getting the groundwork code in place to make translations easy, or at least possible.

Simon

Hi,

thanks for the offer. If you wish to try, translate the English strings in src/other/language.cpp; get the raw text version of the file. The strings start around line 420 and go to line 930.

Problems:

  • The language is hardcoded in the game, and a little inflexible. You can't test it until you recompile.
  • Nothing but ASCII is supported. In particular, no diacritics.
  • Once in a while, new strings might need translating for a new version. Until they're translated, the game will display them in English.
  • Levels can have titles and hints in different languages. Our guideline is to have only an English title, so people with different language settings recognize the level title in a discussion. Tutorial level titles and their hint text should be translated, because that's what new players see first.

Best regards,
Simon

ccexplore

Hmm, interesting points on lack of diacritics support (hmm, is it an Allegro issue or something else?) and the need to translate tutorial-level texts. :(

Still, it got me thinking about what it would take to recode that file (and mostly only that file) so that it's possible to at least handle translations via an external text file loaded at runtime.  Maybe I'll take a stab at that myself some day...

lemming_1

Okay, thanks. Looks like I'll have to hold off on translating Lix until it supports diacritics (or at least these characters: á é ö ü í ó ú ő ű)

ccexplore

Quote from: ccexplore on January 21, 2015, 01:51:47 PM
Hmm, interesting points on lack of diacritics support (hmm, is it an Allegro issue or something else?)

I looked through Simon's GitHub and Allegro's online manual. At least according to the manual, Allegro appears to support UTF-8 (in fact, that's its default charset) and the full Unicode range in its font handling, so in theory it should be possible to extend Lix to support diacritics and other non-ASCII characters. The work required would include:
  • extending Lix's font-loading routine so that it is no longer limited to a single bitmap image (as is currently the case), which in Allegro implicitly limits the font to the ASCII range
  • verifying that the program's string handling and file I/O are compatible with UTF-8, at least on both the Windows and Linux builds. I believe this is probably already the case, so little or no coding work may be needed there, other than testing.
  • someone will still need to draw font glyphs for the new diacritic characters outside of ASCII. I expect this work will likely fall to the external parties (i.e. people like you) who actually plan to use these characters, although one of us might conceivably do it ourselves to allow for non-ASCII German characters (German being currently one of only two supported languages in Lix).
Anyway, it will take a while to get the coding work done for this plus other necessities outlined in earlier posts, even if I ended up getting involved.  I'm afraid you may have to wait for quite some time.
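For readers wondering what "compatible with UTF-8" involves in practice: in UTF-8 a single character such as ő occupies several bytes, so any code that treats each byte of a std::string as one glyph breaks on it. Below is a minimal sketch in plain C++ (no Allegro calls; `decode_utf8` is a hypothetical name, not a function in Lix) that turns a UTF-8 string into Unicode code points, which is what a glyph lookup would need:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Throws std::runtime_error on malformed input instead of guessing.
std::u32string decode_utf8(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte tells how many bytes the character occupies.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, decode_utf8("\xC5\x91") (the two UTF-8 bytes of ő) yields a single code point, U+0151. The sketch does not reject overlong encodings or surrogate code points; a production version probably should.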

lemming_1

Don't worry. I can wait, I really appreciate all the hard work that's already gone into making Lix as good as it is :thumbsup: Just pm me if you need my help with anything!

ccexplore

Quick update on progress:

(The "translations" in this case are obviously generated programmatically and not a real language, just a quick way to verify that the font is working correctly.)

Simon

Oh là là! Never thought to see something like this. Very nice!

The strings are still kept hardcoded in language.cpp, I suppose?

-- Simon

ccexplore

Quote from: Simon on February 02, 2015, 04:14:27 PM
The strings are still kept hardcoded in language.cpp, I suppose?

Nope, with my change, translations of the strings in language.cpp can now be loaded from data/translate.txt, making it possible to create and use translations without recompiling Lix. (That was my primary goal, after all.) More precisely:
  - Presence of data/translate.txt means there's a 3rd language besides English and German, and when present, this third language (whose display name comes from the file's contents) will be shown as an option in the initial language selection menu (the one you see the very first time you run the game) and in the Options menu later where you can change the user's language.  In the user's profile it will be stored as number 3 for $LANGUAGE.
  - The English and German text currently hard-coded in language.cpp will remain there. When the user sets the language to the custom language, strings are first loaded as if it were English (i.e. from the hard-coded strings in language.cpp), then the game tries loading them from data/translate.txt. This way, any new strings introduced by a new version of Lix that are not accounted for in an old data/translate.txt file will at least be displayed in English instead of not at all.
  - There's also a way to make the program dump the current set of strings for whatever the current language is (including English and German), outputting to a text file in the same format as data/translate.txt.  This feature is meant to help translators start off with a translate.txt file that has all the entries the program knows about.  If it's a new language, they can generate the file from English and then start from there.  If it's to update the translation file due to new strings introduced in a new version of the game, they can generate the file from the custom language (which would load from the outdated data/translate.txt and fall back to English for new strings), and then diff the result against the old translate.txt to see what's new.

For reference, I've attached the data/translate.txt I used for the screenshots.  I generated it starting from the dump feature under English, and then ran a script to programmatically convert the ASCII characters into the various counterparts with diacritical marks, to create a fake language of "Èñĝĺìśĥ". ;P
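ccexplore's conversion script isn't shown in the thread. As a guess at what such a conversion might look like, here it is in C++ to match the rest of the code discussed: the function name `fake_accent` and the letter table are hypothetical (the table reproduces the "English" to "Èñĝĺìśĥ" example above and adds a few Hungarian vowels). Accented letters are spelled out as UTF-8 byte escapes so the result doesn't depend on the compiler's character-set settings.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: replace selected ASCII letters with accented
// look-alikes (UTF-8 encoded) to produce a fake "language" for checking
// that the font renders non-ASCII glyphs. Other bytes are copied unchanged.
std::string fake_accent(const std::string& text)
{
    static const std::map<char, std::string> table = {
        {'E', "\xC3\x88"}, // U+00C8
        {'n', "\xC3\xB1"}, // U+00F1
        {'g', "\xC4\x9D"}, // U+011D
        {'l', "\xC4\xBA"}, // U+013A
        {'i', "\xC3\xAC"}, // U+00EC
        {'s', "\xC5\x9B"}, // U+015B
        {'h', "\xC4\xA5"}, // U+0125
        {'a', "\xC3\xA1"}, // U+00E1
        {'e', "\xC3\xA9"}, // U+00E9
        {'o', "\xC3\xB6"}, // U+00F6
        {'u', "\xC3\xBC"}, // U+00FC
    };
    std::string out;
    for (char c : text) {
        const auto it = table.find(c);
        if (it != table.end())
            out += it->second;
        else
            out += c;
    }
    return out;
}
```

Applied only to the value part of each line of an English dump (the key must stay untouched so the game can still match it), this would produce a test translation like the attached one.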

Simon

Yes, this is a good step forward. Translators can immediately check how their wording looks in the game.

Deciding on the exact file format or filename is something for later, do what feels natural for now.

If it helps, I'm completely fine with std::map <string, string> or similar containers to implement language.cpp's work. Right now, I use lots of variable names that are statically compiled, which makes the language.cpp code faster than dynamic string lookup. But looking up a handful of strings is not time-critical at all, therefore dynamic lookup is preferred.

-- Simon

ccexplore

A little off-topic from what this thread was originally about, but ever wanted to curse (ahem, say) something in a foreign [European] language in multiplayer Lix? Well...

[note:  to successfully input such characters you'll still need to install/enable a non-English keyboard layout in your OS, which I'm not honestly too familiar with, so I can't guarantee it'll actually successfully handle all the possible key combinations/sequences in real life.  Basically it just relies on the underlying Allegro library's Unicode version of readkey (ureadkey).]
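Allegro 4's ureadkey() returns the typed character as a Unicode value (an int), so appending it to a std::string that holds UTF-8 text means encoding that code point into one to four bytes. A minimal sketch of that step, independent of Allegro; `append_utf8` is a hypothetical helper name, not something taken from Lix or Allegro:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: append one Unicode code point (e.g. the value
// returned by ureadkey) to a UTF-8 encoded std::string.
void append_utf8(std::string& s, char32_t cp)
{
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        s += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes, excluding UTF-16 surrogates
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point");
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {     // 4 bytes
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point out of Unicode range");
    }
}
```

For instance, typing ü and then ß would append the bytes C3 BC C3 9F.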

Simon

I've tested ccx's code on Linux. It seems to work flawlessly, I can enter various üäß. Many thanks for your work!

We haven't tested multiplayer yet because I'll be busy for a week; maybe we'll get to it some evening in the coming days.

We wish to enable unicode for usernames. Unicode usernames can be stored nicely in the game's global config, data/config.txt. However, the game saves further data to a file called <username>.txt. We want to save everything with ASCII filenames. I propose:

  • Use the unicode username in the global config, and use a mangled username as the user file's name. The following bullet points describe the mangling rules.
  • Save all chars [A-Z][a-z][0-9] as-is.
  • Save a space ( ) and a dash (-) as-is. This is because some users might have entered these already, and we don't want to break compatibility with the old file retrieval.
  • Save all other unicode chars as _<codepoint> with the codepoint as a 4-digit hex number in big endian. So, each such char takes up 5 ASCII chars in the filename. According to this proposal, even some ASCII symbols will be encoded like this. This allows users to have weird symbols in their name that won't be allowed in a Windows filename.
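The mangling rules above, sketched in C++ under a few assumptions: the username has already been decoded into code points (std::u32string), `mangle_username` is a hypothetical name, and the hex digits are uppercase (the proposal doesn't say; since Windows filenames are case-insensitive, fixing one case seems wise). Code points above 0xFFFF are rejected because the proposal leaves them as an open question.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Sketch of the proposed rules: [A-Za-z0-9], space and '-' are kept as-is;
// every other code point becomes '_' plus 4 uppercase hex digits (big endian).
std::string mangle_username(const std::u32string& name)
{
    std::string out;
    for (char32_t cp : name) {
        const bool keep = (cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z')
                       || (cp >= U'0' && cp <= U'9') || cp == U' ' || cp == U'-';
        if (keep) {
            out += static_cast<char>(cp);
        } else if (cp <= 0xFFFF) {
            char buf[6]; // '_' + 4 hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, "_%04X", static_cast<unsigned>(cp));
            out += buf;
        } else {
            throw std::invalid_argument("code point above 0xFFFF: not covered by this scheme");
        }
    }
    return out;
}
```

Because '_' is not in the keep-list, it is always escaped itself, so every '_' in a mangled filename starts an escape and the name can be decoded back unambiguously. For example, "Günter" would become "G_00FCnter".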
Alternatively, is there some standard for this, popular across all operating systems?

Is 4 digits of hex enough, or should we anticipate support for unicode above 0xFFFF? We could use __<codepoint> for those, with two leading underscores followed by 8 hex digits, big endian. Or we could, from the start, encode everything as _<UTF-8 code> in little endian.
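To check that the two-width variant ('_' plus 4 hex digits, '__' plus 8 hex digits) stays unambiguous, here is a hypothetical decoder sketch (`unmangle_username` is an illustrative name). It relies on two properties of the proposal: '_' is never stored as-is, and no hex digit is '_', so "__" can only mean the 8-digit form.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical decoder for the extended scheme: "_" + 4 hex digits, or
// "__" + 8 hex digits for code points above 0xFFFF. Any other character
// stands for itself.
std::u32string unmangle_username(const std::string& file)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < file.size()) {
        if (file[i] != '_') {
            out += static_cast<char32_t>(static_cast<unsigned char>(file[i]));
            ++i;
            continue;
        }
        const bool wide = i + 1 < file.size() && file[i + 1] == '_';
        const std::size_t start = i + (wide ? 2 : 1);
        const std::size_t digits = wide ? 8 : 4;
        if (start + digits > file.size())
            throw std::invalid_argument("truncated escape");
        char32_t cp = 0;
        for (std::size_t k = start; k < start + digits; ++k) {
            const char c = file[k];
            unsigned v;
            if (c >= '0' && c <= '9') v = c - '0';
            else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
            else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
            else throw std::invalid_argument("bad hex digit");
            cp = (cp << 4) | v;
        }
        out += cp;
        i = start + digits;
    }
    return out;
}
```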

We should find a solution to this before we do a release.

-- Simon

geoo

I'd propose to use _<UTF-8 code> in hex for everything except for characters in the range 0x20-0x7F. For those you could just use the normal ASCII character (which is equivalent to its UTF-8 counterpart). You can determine the number of characters that you have to read after a '_' by looking at the first byte (i.e. next two characters after it in the filename), see specification, so there's no ambiguity if someone uses [0-9A-F] in their username. The only caveat are characters like '_' (which is now reserved for your encoding), and forbidden characters like '/', which you'd probably also encode using _<UTF-8 code>. (How do you handle '/' in usernames right now anyway?)
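geoo's proposal, sketched in C++ under stated assumptions: all names are hypothetical, hex digits are uppercase, and the kept range is 0x20-0x7E (0x7F is the DEL control character, so it gets escaped too, a small deviation from the 0x20-0x7F range in the post). Besides '_', the sketch escapes the characters Windows rejects in filenames (< > : " / \ | ? *). Other Windows restrictions, such as reserved names like CON or trailing dots, are not handled here.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Length of a UTF-8 sequence, read off its first (lead) byte.
std::size_t utf8_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    throw std::invalid_argument("not a UTF-8 lead byte");
}

// Printable ASCII stays as-is, except '_' (the escape marker) and the
// characters Windows forbids in filenames.
bool keep_as_is(unsigned char c)
{
    return c >= 0x20 && c < 0x7F && c != '_'
        && std::strchr("<>:\"/\\|?*", c) == nullptr;
}

int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    throw std::invalid_argument("bad hex digit");
}

// Every other character becomes '_' followed by its UTF-8 bytes in hex.
std::string mangle_utf8(const std::string& name)
{
    std::string out;
    for (std::size_t i = 0; i < name.size();) {
        const unsigned char lead = static_cast<unsigned char>(name[i]);
        if (keep_as_is(lead)) {
            out += name[i++];
            continue;
        }
        const std::size_t len = utf8_length(lead);
        if (i + len > name.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        out += '_';
        for (std::size_t k = 0; k < len; ++k) {
            char hex[3];
            std::snprintf(hex, sizeof hex, "%02X",
                          static_cast<unsigned>(static_cast<unsigned char>(name[i + k])));
            out += hex;
        }
        i += len;
    }
    return out;
}

// Decoding: after '_', the first hex byte tells how many bytes follow,
// so trailing [0-9A-F] characters in the name are never misread.
std::string unmangle_utf8(const std::string& file)
{
    auto byte_at = [&file](std::size_t pos) {
        if (pos + 2 > file.size())
            throw std::invalid_argument("truncated escape");
        return static_cast<char>(hex_digit(file[pos]) * 16 + hex_digit(file[pos + 1]));
    };
    std::string out;
    for (std::size_t i = 0; i < file.size();) {
        if (file[i] != '_') {
            out += file[i++];
            continue;
        }
        const std::size_t len = utf8_length(static_cast<unsigned char>(byte_at(i + 1)));
        for (std::size_t k = 0; k < len; ++k)
            out += byte_at(i + 1 + 2 * k);
        i += 1 + 2 * len;
    }
    return out;
}
```

For example, a username "ü1" would become "_C3BC1": the lead byte C3 announces a 2-byte sequence, so the decoder stops after BC and reads the '1' as a literal character.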



Simon

Quote from: geoo on February 05, 2015, 08:49:58 AM
I'd propose to use _<UTF-8 code> in hex for everything except for characters in the range 0x20-0x7F.

0x20-0x7F includes some chars forbidden in Windows filenames.

Quote
(which is equivalent to its UTF-8 counterpart). You can determine the number of characters that you have to read after a '_' by looking at the first byte (i.e. next two characters after it in the filename), see specification, so there's no ambiguity if someone uses [0-9A-F] in their username.

Right, that's why I'm considering UTF-8 mangling with little endian, i.e. _123456 would mean the UTF-8 byte sequence 0x56 0x34 0x12. Converting this number to the unicode codepoint should be straightforward.

Quote
The only caveat are characters like '_' (which is now reserved for your encoding), and forbidden characters like '/', which you'd probably also encode using _<UTF-8 code>. (How do you handle '/' in usernames right now anyway?)

Currently, the game tries to save to the filename including the /, and the behavior depends on C++'s std::ofstream when it cannot create the file. Maybe it throws an exception that crashes Lix, or, more likely, the stream is left in a non-good state and I don't check for that anymore.
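On the stream question: by default the standard file streams don't throw. A failed open sets failbit and later writes are silently ignored; exceptions are only thrown if they were enabled via exceptions(). So the second guess is the likely one. A minimal sketch of checking the state explicitly; `save_user_file` is a hypothetical name, and printing to std::cerr is just a placeholder for whatever error handling Lix would actually use:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: write the user file and report failure instead of
// silently continuing when the stream cannot be opened or written.
bool save_user_file(const std::string& filename, const std::string& contents)
{
    std::ofstream file(filename);
    if (!file) { // open failed, e.g. a '/' in the name points into a missing directory
        std::cerr << "Cannot open " << filename << " for writing\n";
        return false;
    }
    file << contents;
    file.close(); // a failing close also sets failbit
    if (!file) {
        std::cerr << "Error while writing " << filename << '\n';
        return false;
    }
    return true;
}
```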

-- Simon