Weekly Penguin
September 1st, 2022
Making of TAGAP 4

Translation support

One of the most requested features over the years has been better translation support for TAGAP titles. You could create a translated version of the game, but that meant either overwriting the game files or creating a mod. That is history now.

TAGAP 4 supports multiple languages.

Let's run down what this actually means. This is a slightly more 'dry' TAGAP Thursdays update, lacking media-sexy screenshots or new music, but as this feature has been requested, I thought it was an important topic to cover.


Both the engine and the content of TAGAP 4 are built in English and the core game files themselves are 'the default English language pack'.

However, you can now also have as many language packs as you wish installed separately and select the one you want straight from the main menu.

How many translations will there be?

The game will launch in English only.

Since I'm only bilingual (Finnish and English) and can't afford to hire localisation teams to create translations, we can't really produce the translations here at Penguin DT HQ. This is something that would need to be a 'community effort', but one I'll be supporting all the way.

And by that I mean providing help in getting the translations working and, if the creators so wish and allow, making the language packs available on the TAGAP website.

How do the translations work?

For the end user: Copy the translation files to the 'tagap4/translations' folder and the translation becomes available in the game.

For the translators: Each translation consists of two things: an indexing file and a folder with all the script files needing translation. For example, if a cut-scene has dialogue, the translation folder will have a corresponding file with the translated dialogue. Simple as that.
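To illustrate the idea of translations as an extra layer on top of the default English files, here is a minimal Python sketch. It is not the actual engine code. The 'script' folder name, the per-language sub-folder and the file names are all hypothetical; only the 'translations' folder comes from the description above. Falling back to the English file when a translated one is missing is also an assumption.

```python
from pathlib import Path


def resolve_script(game_root, language, script_name):
    """Pick the file to load for one script.

    Assumed layout (hypothetical, for illustration only):
      <game_root>/script/<script_name>                    default English file
      <game_root>/translations/<language>/<script_name>   translated file
    """
    root = Path(game_root)
    if language:
        candidate = root / "translations" / language / script_name
        if candidate.is_file():
            # The selected language pack provides this script.
            return candidate
    # No language selected, or the pack lacks this file: use the default.
    return root / "script" / script_name
```

With a layout like this, a translator would only need to supply the files that actually contain text, and anything missing would simply stay in English.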

I will create a template with all the translatable text to make creating translations as easy as I can.

When are the translations available?

The aim is to have the full translation template ready for launch on November 14th. Then anyone will be able to do a translation if they so wish. There's no point in doing this earlier, as the template is just a collection of text lines – there are moments when you would need the context from the game to translate them properly.

What languages are supported?

TAGAP 4 fully supports script files encoded in ASCII and UTF-8. The supported Unicode blocks (that is, sets of characters) are:

  • Basic Latin (ASCII)
  • Latin-1 Supplement
  • Latin Extended-A
  • Latin Extended-B
  • Full standard Cyrillic

Latin-1 Supplement is the 'Extended ASCII' range (the upper half of ISO 8859-1), which is now fully supported. Likewise, for Cyrillic there's now full support, as opposed to just the Cyrillic ASCII range like before.
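For translators who want to check a text before the template is out, the block list can be turned into a quick test. This is a minimal Python sketch, not part of the game or its tools. The code point ranges are the ones the Unicode standard assigns to these blocks, and it assumes that 'full standard Cyrillic' means the Cyrillic block U+0400 to U+04FF. The function names are made up for this example.

```python
# Unicode blocks listed above, as (name, first code point, last code point).
# Assumption: "full standard Cyrillic" is the Cyrillic block U+0400..U+04FF.
SUPPORTED_BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Cyrillic", 0x0400, 0x04FF),
]


def block_of(ch):
    """Return the name of the supported block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in SUPPORTED_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def unsupported_characters(text):
    """Return characters outside the listed blocks, in order, without repeats."""
    found = []
    for ch in text:
        if block_of(ch) is None and ch not in found:
            found.append(ch)
    return found
```

Running `unsupported_characters()` over a translated line returns an empty list when every character falls inside the listed blocks.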

The reason the range is limited to these character sets at the moment isn't technical, but a matter of time management. TAGAP 4 uses two custom fonts, ones that I have to build glyph by glyph, so I had to draw the line somewhere. However, the fonts are expandable, so if the need arises, I can potentially add more character sets in the future via updates.

September 22nd, 2022

The list of supported Unicode blocks has since been expanded. For the updated list as of September 22nd, 2022, check out the relevant progress update blog post.

This has been requested for ages, why did it take this long?!

To be frank, I wasn't sure how to pull something like this off without having to restructure the entire engine.

Every time I started to mull over the possibility of doing this, it was during the final stretch, when I was figuring out quality-of-life features to polish things. At that point it is basically too late to do sweeping, foundation-level engine restructuring, and I couldn't figure out any other way to do it.

Fast forward to earlier this month. I was back at the same point – release is around the corner and again I'm thinking: 'multiple languages would be great, it's been requested for so long'. And this time I felt more stubborn and decided that, damnit, I'm going to do it! I fired up the coffee maker and started dabbling.

Behind the process

After three days of conceptualising with pseudocode, I came up with a system that would require only one larger foundational change to the engine, but would otherwise work as an additional layer of script loading. Actually, after the 'aha' moment, the external translations part was fully ready in about two days.

The UTF-8 Unicode support was a completely different case, however. To put it simply – in the past the absolute maximum number of characters in a font was 255, and I couldn't get past that without taking a sledgehammer to the oldest parts of the engine and completely re-doing how the text output is handled. So, I created my own data structure for this purpose and converted all the text output to the new format.

I could write an entire separate post full of programming musings about how bonkers creating your own font glyph index system was, but I'll save you from my techno-babble. However, I have to mention that the creation of the glyph index was the very first time in the history of TAGAP that I had to fire up LibreOffice Calc and create a set of spreadsheets to keep track of all of it. On the positive side, once the spreadsheets were made, I could just copy-paste all the tables into the code.

I do have to mention that the new system is more efficient, particularly during 'runtime'. As the text strings are built of direct references to the correct glyphs – as opposed to being converted from a C character array as before – there are fewer calculations needed on the fly. Handy!
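The general idea of text stored as direct glyph references can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine's actual data structure: the slot layout, the fallback glyph and all the names here are hypothetical, and the block ranges are simply the Unicode ranges of the blocks listed earlier (minus control characters).

```python
# Hypothetical glyph table: every supported code point gets its own slot.
# Ranges cover the printable parts of the blocks listed earlier.
GLYPH_RANGES = [
    (0x0020, 0x007E),  # Basic Latin (printable ASCII)
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x0400, 0x04FF),  # Cyrillic
]
FALLBACK_GLYPH = 0  # assumed slot for characters the font does not have


def build_glyph_index(ranges):
    """Map each code point to a glyph slot, numbering slots consecutively."""
    index = {}
    slot = 1  # slot 0 is reserved for the fallback glyph
    for first, last in ranges:
        for code_point in range(first, last + 1):
            index[code_point] = slot
            slot += 1
    return index


GLYPH_INDEX = build_glyph_index(GLYPH_RANGES)


def to_glyph_string(utf8_bytes):
    """Decode UTF-8 once and store the text as a list of glyph slots."""
    text = utf8_bytes.decode("utf-8")
    return [GLYPH_INDEX.get(ord(ch), FALLBACK_GLYPH) for ch in text]
```

In this sketch the conversion happens once, when the text is loaded, so drawing it later is just a walk over ready-made slot numbers. The table also holds several hundred glyphs, well past the old limit of 255.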

But that's enough tech talk.

Next week we'll be back with something less dry and more funky!

Until next time,

Jouni Lahtinen, the head penguin