Jump to content
The Dark Mod Forums

Geep

Contributor
  • Posts

    1185
  • Joined

  • Last visited

  • Days Won

    61

Geep last won the day on July 9

Geep had the most liked content!

Reputation

998 Legendary

5 Followers

Profile Information

  • Location
    Mid-Atlantic, US

Recent Profile Visitors

The recent visitors block is disabled and is not being shown to other users.

  1. Minor typo in catalan.lang and [Catalan] section of all.lang, see https://bugs.thedarkmod.com/view.php?id=6634 Also, I've revised my gen_lang_plus program to include catalan (instead of failing when it doesn't recognize it). Will release v1.1 Real Soon Now.
  2. Intro to Air Pocket’s Readables Air Pocket has just two readables, with shared and duplicated pages; there are really only two unique pages. One is titled “Appointment to Service”, and represents printed, rather draconian boilerplate given by the ship captain to every hired seaman. The other, titled “Further instructions”, is handwritten and specific to our hero and his lady. This appears as a readable by itself, but is also collated (as page 2) in a bundle and nailed up with two copies (pages 1 and 3) of the “Appointment to Service”, for Thief and Emily as new hires. These immobile readables appear at the outset of the FM, which makes iterative localization testing easier. They merely set the scene and have no game hints, so no worries about spoilers here. Because these are not mobile readables, there is no need for inventory names with associated #str_ . (If they were mobile, the translations of the titles could serve as inventory names.) In this post, I’ll discuss the AI translation, saving the largely (but not entirely) post-translation text-expansion issues for the next post. Prompt Engineering for Readables When feeding to AI, the two pages can reasonably share a prompt, but otherwise be treated independently; there is no compelling reason to prompt them as story flow. I am skipping asking ChatGPT to do the back-translation, because I don’t trust it to not cheat. Just put a tag at the end of each line, “//uv”, for unverified. Perhaps I’ll ask another AI to back-translate later. A draft reusable template follows. It describes the overall translation task, the desired tone, and input and output formats. Text specific to readables is shown within the next section in bold. Text that is specific to this FM, to clarify the context and the meaning of particular words, is in italics. Input to ChatGPT You are an expert translator between English and other European languages, including Russian. You wish to translate a list of pages from English to these other languages. In the list, each page is shown by two consecutive lines: the title text of a page followed by the body text. Each line of the list begins with a tab, then a word beginning with #str_ within double quotes, then another tab, then a phrase within double quotes. When you translate a line, in the output keep the tabs and the #str_ word unchanged, and only change the phrase to the other target language in UTF-8, keeping it in double quotes. Avoid generating “Unicode combining characters”. Try to keep the character count (including spaces and punctuation) in each translated phrase not too much longer than the original English, while preserving the formal meaning. Specifically, for title text, try to keep character count under 50% longer, and use a title style of word casing. For body text, try to keep the character count under 30% longer. Before, between, or after sentences, one or more “\n” may appear, indicating a line break; retain these in the translation. Avoid modern slang. Old-fashioned and historic nautical wording is fine. At the end of each line, add another tab followed by the fixed text "//uv", indicating unverified. There are two pages. They relate to a fictional voyage, several centuries ago, of a small sailing merchant ship Esmeralda. The first page is a printed boilerplate notice, given by Esmeralda’s Captain Riggs to each seaman hired for a voyage. It is effectively a contract, written with lots of words in the body capitalized to make it seem more official and legal. Parts of the text are designed to be somewhat threatening, as if this hiring was aboard a big government naval vessel, not just a merchant ship. The second page is an informal handwritten note, also by Captain Riggs, with more specific instructions to two hired crew members. On all pages, leave “Riggs” and “Esmerelda” unchanged during translation. Following the input list of pages, append output lists in these target languages: 1. German 2. French 3. Polish 4. Italian 5. Spanish 6. Portuguese 7. Russian 8. Czech 9. Hungarian 10. Dutch 11. Slovak 12. Danish 13. Swedish 14. Romanian 15. Turkish 16. Catalan. List of pages: "#str_fm_airpocket_xd_sheet_appointment_to_service_pg1__title" "Appointment to Service\n" "#str_fm_airpocket_xd_sheet_appointment_to_service_pg1_body" "\n\nHEAR YE ALL, that the Bearer is found Fit to Serve, and appointed to the Rank of Ordinary Seaman, aboard the Merchant Ship Esmerelda, by Authority of the Shipmaster, Captain Riggs.\n\nThe Appointee serves Under the Direction and at the Pleasure of the Captain and Officers. By the Law of the Sea known by all, while Aboard, Violations of Duty will be Judged & Punished by the Captain and Those carrying out his Orders. Thievery, Mutiny, Assault, and Other Gross Breeches of Order will be met by Flogging or Death. At Successful Voyage End, the Appointee upon Discharge will receive the Agreed Wage recorded in the Ship’s Log, less any Deductions.\n" "#str_fm_airpocket_xd_sheet_appointment_to_service_pg2__title" "Further Instructions\n" "#str_fm_airpocket_xd_sheet_appointment_to_service_pg2_body" "\n\n\n\nAs agreed, the 2 new crewmates will bunk in the galley, and share the sea trunk nearest the door in the mess.\n\nCaptain Riggs\n" Output ChatGPT broke up the results into frames by language again, but I just noticed that, beyond the end of the output, there is an overall “copy” button. So I don’t really have to copy each language individually. But still in either case, I’m post-copy editing to put separating headers in the proper form, e.g. [German]. The first time I did this, in late evening, it did the whole gulp very fast. The second time, with a more-complete prompt (shown above), at noon time, the system was evidently being hammered (or maybe just a policy slowdown). It typed out slowly and paused several time, asking me to hit a Continue Generating button. Frames where those pauses occurred got messed up in the overall copied output, so I had to recopy those individual frames, which were OK. Also, ChatGPT applied a different text color highlighting this time. Doesn’t matter. At first glance (and without back-translation verification yet), results looked pretty plausible. Interesting that where the text mentions "Captain Riggs", for most languages the "Captain" was left in English. Next, I’m going to add the new preliminary readables translations to all.lang and generate the *.lang, for the purpose of estimating and testing text scaling.
  3. A Plan for Readables with AI Translations I’ve created a separate forum thread, "Proposal - Fade-In Translations for Readables", for a new style of readable more adaptable to translations. This would require engine changes, but can be faked in the short term. I plan to create custom .gui to do this for Air Pocket. And if the proposal wins favor and moves to timely implementation, to revise Air Pocket to use the new system. I'ill be pausing my posts here for a moment to build out that other thread, then come back to explore the actual AI translations for Air Pocket readables.
  4. Testing Air Pocket with Just English Strings Why? So far, English is the only language with all #str_ done; the rest have only inventory items and shouldered names done. Preliminary testing of the just the English version is mainly because of all the manual renaming. If #str_ were generated automatically with better naming, this could be largely skipped. How? The [English] part of our emerging all.lang was copied out and became the heart of a replacement strings/english.lang. I didn’t have to bother with gen_lang processing. This works because English is restricted here to ASCII and that’s common to both all.lang [UTF-8} and english.lang [Windows-1252]. To speed testing, I added the Master Key at startup (in DR, entity key_master gets inv_map_start 1). The game was then played through in English. This is a short FM, with just a few readables and messages that don’t really change with difficulty level, so this is viable. The briefing has to be eyeballed once, and the objectives list a few times during play, to ensure all is well. Results? As expected, a few typos were introduced when editing #str in .map, .xd, .lang. files, so needed correcting. I have some thoughts about strategies for the upcoming testing in the other languages, as well as in larger FMs, but later for that.
  5. The Problem Readables are available in a wide range of TDM bitmap fonts. Unfortunately, the majority of these fonts lack non-ASCII glyphs for European languages, and it would be a prohibitively lengthy task to craft them. This is one of several translation hurdles. (Another is soliciting, organizing, and distributing the work of human translators; see “AI for Translations: An Exploration” for work on an alternative. This also promotes the use of meaningful alphanumeric #str_ IDs - possibly automatically generated – instead of traditional numeric.) A Proposed Solution Suppose that when a particular page of readable is shown, it is shown first is English, with the mapper-specified font (e.g., Camberic), and then, after a number of seconds, shown in the current user-selected language, with a different font (e.g., Stone), one that offers the needed diacritics? And with the translated font size scaled down to accommodate potentially more-lengthy translated strings? Both the English and translated text can be viewed in sequence. That opens the door to “quick and dirty” default translations, e.g., machine translation. In particular, the reader may sometimes be able to work-around any layout problems and sub-optimal translation by consulting the English text. (Nevertheless, default translations may sometimes miss subtle nuanced hints, so the ability to improve them with tweaked text is a necessity.) The Proposed Mechanics Recall that the game engine currently passes these values to a readable’s gui: gui::title gui::body With a multipage readables, the content of these parameters changes as pages are flipped. For clarity, it is proposed to replace them with: gui::titleEnglish gui::bodyEnglish gui::titleTranslated gui::bodyTranslated The latter 2 would be just like gui::title and gui::body, except that they would serve empty strings when the current language is English or there is no translation available in the current language (and so be used for gui code program logic, to suppress a transition). Observe that this behavior does not substitute an English string for a missing non-English string. Skip the remainder of this section if details are not of interest. Each stock readable .gui would need a one-time conversion to use them. Instead of the traditional 2 winDef overlays for text, there would be 4, corresponding to the 4 text-passing parameters just mentioned. This allows the translated text to fade in while the English text fades out, when an onTime event starts the transition (at 2 seconds in this example). Here is the fragment of .gui code that has been altered: Aspects of this Design – Transition from English The transition is timed, so no extra “Translate” button is shown, nor a hard-to-come-by hot key required. If you want to see the English again, you would briefly navigate away from the page to another, then return; or, if a single-page, close and re-open it. A simple implementation (as the code above and example below) uses a fixed, hard-coded time. Alternative Mechanism. At some cost to code clarity, it is probably possible to get by with just the 2 normal text-passing parameters (gui::title and gui::body) and their traditional 2 overlays, though additional variable(s) would be needed for tight time-synchronization between engine and gui; and overlapping fade-in/fade-out between English and translation would not be possible. Advanced Version. In the longer term, timing could be made more flexible, by passing it as parameter from the engine, e.g.: “gui::transitionTime” Where does this value come from? While it could somehow encoded into the .xd file by the mapper, I prefer a different approach. Have the engine calculate it from character or word count of the body, with user-specified globals for reading rate and min and max bounds, e.g.: sys_readablesWordsPerSecTransTime sys_readablesMinTransTime sys_readablesMaxTransTime A drawback of a timed transition is that additional reading time is needed to get to the translations, which may, with immobile readables, increase risk of discovery by guards. So having these additional user controls would let a user get to the translations faster, even skip the English entirely by setting bounds to zero. A Simulated Example – FM “readableTranslationFadeIn” In the absence of engine support for the 4 text-passing parameters, it is still possible to make an approximately-functional mockup using some hard coding. However, this prototype DOES NOT suppress the transition when the current language is English. That is, it shows (rather than prevents) an English-to-English transition with change of font & font-scale. TDM with the languages set to “Francais” (French). The first screen shot shows page 1 of a 3-page scroll, momentarily displayed in English with Camberic title and body. After a few seconds, it transitions to the second screen shot, in French in Stone font. With accents. While shown here as a scroll, this approach should be easily adaptable to books and sheets. About the Example’s Implementation The screen shots are from a prototype FM: readableTranslationFadeIn Notable files are: guis/readables/scrolls/scroll_calig_camberic.gui, a custom override of the standard Camberic scroll readable, with the translation transition mechanism from above, plus additional simulation fakery described below. strings/all.lang, a UTF-8 file containing 6 #str_ (2 per scroll page – title & body) in each language section. Only the [English] and [French] sections were implemented. The English example content was loosely derived from the St. Lucia FM. The English text (without #str_ structuring) was manually converted to UTF-8 French using Google Translate (website, not API). strings/english.lang & french.lang. These were generated from all.lang using my gen_lang_plus program to create the 8-bit “ANSI” versions as required, e.g., ISO-8859-15 encoding for French. xdata/readableTranslationFadeIn.xd, that contains the #str_IDs for the 3 scroll pages. Within scroll_calig_camberic, this simulation had this fakery: “gui::title” and “gui::body” were stand-ins for hypothetical parameters “gui::titleTranslated” and “gui::bodyTranslated”; The English text was hard-coded, and the appropriate content selected by actual parameter “gui::curPage”, to make up for missing hypothetical parameters “gui::titleEnglish” and “gui::bodyEnglish”. The READABLE_FADE_TIME is currently set to 2 seconds for testing. Probably 5-6 seconds would be better during game play. Aspects of the Design – Font Scaling As mentioned earlier, the translated font is scaled to make the text smaller than the original, to accommodate languages that need more room. A simple implementation (like in the example code) uses fixed values with “textscale”. So the textscale for the two Translated winDef overlays is smaller than for the 2 English winDef overlays. Specifically, in the example GUI code, the text scaling factors from the original Camberic readable were retained: textscale 0.4 // titleEnglish textscale 0.31 // bodyEnglish and supplemented by (with a different font, namely Stone): textscale 0.33 // titleTranslated textscale 0.24 // bodyTranslated The goal is to keep the rendered text smaller than the original English rendering for languages with more characters per sentence. These values, while hard-coded, will differ across readables (due to different starting fonts), and would need to be experimentally determined. But this treatment, with just a fixed scaling value that is independent of both text content and current language, is unlikely to be very satisfactory. Better ideas, needing additional engine modifications, will be considered in a follow-on post. Additional Considerations When Authoring the XD File. Recall that TDM is relatively inflexible when using #str_ within an .xd file. So this form will not work: "page1_body" : { "" "" "#str_fm_scroll_camberic_pg1_body_parish_inspection_excerpts" } Instead use "page1_body" : "#str_fm_scroll_camberic_pg1_body_parish_inspection_excerpts" With the 2 leading linebreaks moved into the #str content as leading \n\n. When Testing. If there is a mismatch between the TDM Language setting and the PC’s language setting (e.g., under Windows), then some characters may turn out wrong or indicated as missing (e.g., as boxes). The degree will vary by language, and is unlikely to be seen in the initial English render (because that’s almost all in ASCII, common to all the ISO encodings.) Even with such mismatches, the translation can be reviewed as to overall length and where linebreaks occur. Be aware that direct editing of *.lang files is not recommended, and could risk converting from a particular “ANSI” raw 8-bit encoding into “UTF-8”. Applying this Technique More Broadly. A few fonts have oddball glyphs for certain characters, e.g., a skull and crossbones in Treasure Map. This would require special handling during translation. For Briefings, Objectives, and Messages, similar approaches can be conceived. However, for each of these (and different from readables), only one particular font is routinely offered. And there are alternative designs to be considered. For instance, the English and Translated text could be shown simultaneously side-by-side in various ways, instead of sequentially. The Objectives have the additional complication that the font size is already user-adjustable.
      • 3
      • Like
  6. Yet More #str_ Added – Shouldered Names The I18N.pl script does not automatically localize characters’ “name”, “shouldered_name” and “shouldered_name_dead”. And that’s generally the right call. You don’t necessarily want to translate character names. For instance, I prefer my character “Emily” be left that way, i.e., not rendered as for instance “Émilie” in French. (And I’m unclear whether the “name” spawnarg should ever get the #str treatment. So really just looking at “shouldered_name” and “shouldered_name_dead”.) But I do have 3 characters that have not just name, but also title/rank, and it might be worthwhile to translate the title/rank, along the lines: "#str_fm_map_shouldered_name_capt_riggs" "Capt Riggs" "#str_fm_map_shouldered_name_first_mate_logah" "First Mate Logah" "#str_fm_map_shouldered_name_second_mate_chaf" "Second Mate Chaf" By story design, only Logah is actually shoulderable, but for completeness I’ll do all three. I’m using the same #str_ for both “shouldered_name” and “shouldered_name_dead”. BTW, an argument could be made that the character #str_ renames should include the class, e.g., ..._atdm_ai_townsfolk_wench_..., ..._atdm_env_ragdoll_guard_thug_, etc. However, a class name, while explaining the visual appearance, can be somewhat remote from the precise role that the AI plays in the FM. Prompt to ChatGPT With Air Pocket specifics in italics. You are an expert translator between English and other European languages, including Russian. You wish to translate a list of crew members on a small historic sailing ship, from English to these other languages. Each line of the list begins with a tab, then a word beginning with #str_ within double quotes, then another tab, then a phrase within double quotes. When you translate a line, in the output keep the tabs and the #str_ word unchanged, and only change the phrase to the other target language in UTF-8, keeping it in double quotes. Make the translated phrase reasonably short while preserving the formal meaning. Avoid modern slang. Old-fashioned wording is fine. At the end of each line, add another tab, the fixed text "//bt: ", and then a back-translation of the previously translated phrase into English again. When back-translating, ignore the original English phrase. The crew members names are Riggs, Logah, and Chaf, all males. Keep the name-portion unchanged, but translate the title-portion. Some centuries ago, they were on a small coastal sailing ship, trading in goods along the coast. In that time frame, take account of different countries having different merchant marine titles aka ranks. "Capt" here is an informal title for "Captain". These titles may be less formal than those of military officers aboard navy ships. Following the input list of crew members, append output lists in these target languages: 1. German 2. French 3. Polish 4. Italian 5. Spanish 6. Portuguese 7. Russian 8. Czech 9. Hungarian 10. Dutch 11. Slovak 12. Danish 13. Swedish 14. Romanian 15. Turkish 16. Catalan. List of crew members: "#str_fm_map_shouldered_name_capt_riggs" "Capt Riggs" "#str_fm_map_shouldered_name_first_mate_logah" "First Mate Logah" "#str_fm_map_shouldered_name_second_mate_chaf" "Second Mate Chaf" Results This time, ChatGPT did not place results in separate frames by language; because of fewer input strings? Also, between the title and name in every case was some odd “hollow box” character (possibly null character 0x00); I did a replace-all of those with <space>. What follows is the start of ChatGPT’s response (with my hollow-box correction). Here are the translations for each language. I’ve preserved tabs and identifiers exactly, translated the title portion into the historical/formal equivalent, and added brief back-translations (“//bt: …”) into English. 1. German "#str_fm_map_shouldered_name_capt_riggs" "Kpt Riggs" //bt: Capt Riggs "#str_fm_map_shouldered_name_first_mate_logah" "Erster Maat Logah" //bt: First Mate Logah "#str_fm_map_shouldered_name_second_mate_chaf" "Zweiter Maat Chaf" //bt: Second Mate Chaf 2. French "#str_fm_map_shouldered_name_capt_riggs" "Capt Riggs" //bt: Capt Riggs "#str_fm_map_shouldered_name_first_mate_logah" "Second Capitaine Logah" //bt: First Mate Logah "#str_fm_map_shouldered_name_second_mate_chaf" "Tiers Capitaine Chaf" //bt: Second Mate Chaf 3. Polish "#str_fm_map_shouldered_name_capt_riggs" "Kapn Riggs" //bt: Capt Riggs "#str_fm_map_shouldered_name_first_mate_logah" "Pierwszy oficer Logah" //bt: First Mate Logah "#str_fm_map_shouldered_name_second_mate_chaf" "Drugi oficer Chaf" //bt: Second Mate Chaf 4. Italian "#str_fm_map_shouldered_name_capt_riggs" "Capitano Riggs" //bt: Capt Riggs "#str_fm_map_shouldered_name_first_mate_logah" "Primo ufficiale Logah" //bt: First Mate Logah "#str_fm_map_shouldered_name_second_mate_chaf" "Secondo ufficiale Chaf" //bt: Second Mate Chaf ... etc. Other Observations As you can see above, ChatGPT neglected to provide a leading <tab> to each line, so I had to. I had forgotten to think about asking for “title case” in the prompt. Maybe just as well, since the 2 meanings of “title” (rank and capitalization) might have confused the AI. In the results, the person’s name was always left capitalized, and the first word. In a few languages, the second word was also capitalized; in most, it was not. I wasn’t sure if that’s how ranks are usually treated in various countries. I decided (unlike the inventory case) not to adjust the AI-provided casing. Finally, ChatGPT back-translated “Capt” always to “Capt”, never to “Captain”, even if the foreign language gave the full word, e.g., Capitaine. Probably evidence of its cheating here. I’m getting less confident in back-translation as a way to insure quality... at least when done by the same AI that does forward translation, and at the same time. Or maybe it’s just a ChatGPT shortcoming, and other AIs would do better.
  7. Inventory Item Results The AI results, generated in a flash, were successful overall. The desired output format was achieved, and the back translations detected very few problems. An example result (before alphabetic ordering), from [Italian]: "#str_fm_map_inv_key_sea_trunk" "Baule da mare" Sea chest "#str_fm_map_inv_key_galley" "Cucina di bordo" Ship’s galley "#str_fm_map_inv_key_mess" "Refettorio" Mess "#str_fm_map_inv_key_master_key" "Chìave maestra" Master key ... Caveats: A judgement of “overall success” is without ground-truthing by human translators. The back-translation uses the same AI as the forward-translation... loosely analogous to “grading its own paper”. Though told not to look at the original English when back-translating, is it cheating and lying? Revisions and Refinements Title Casing for Inventory Items. I should have explicitly prompted to generate this in the results. I manually corrected these (leaving separators like di, de, do, del in lower case), but a few cases required ChatGPT consultation to get the proper upper-case Unicode letter. Mess. Two translation needed revision, because they seemed to suggest an eating area of a much larger or luxurious ship: [German] "#str_fm_map_inv_key_mess" "Offiziersmesse" Officers’ mess [French] "#str_fm_map_inv_key_mess" "Salle à manger" Dining room Rather than rerun the whole translation, I just asked ChatGPT about these cases: On a small ship, what is the mess or eating area called in French? On a small ship, the eating area is typically called la cambuse or le carré, depending on the context: Le carré – This is the standard French nautical term for the crew's or officers’ mess (eating/living area) on a ship. It’s equivalent to the English term mess. Carré des officiers – officers' mess Carré de l’équipage – crew mess La cambuse – Often refers to the ship’s food storage or sometimes the galley (kitchen), but in some informal or small-vessel contexts, it can be used more broadly, though this is less precise for "mess." So for a small ship’s mess or dining area, the best and most nautical French term is le carré. Consequently, I just replaced "Salle à manger" with “Carré”. Likewise, in German, simply “Messe”. Master Key. This was back-translated as “Main key” in 6 of the languages, which I guess is OK. In Air Pocket, a Master Key (entity name key_master) was left in the blue room, if needed for a debugging build; inaccessible otherwise. Arguably, it should not be translated at all. So I’ll comment those non-English lines out. (“Master key” is not part of TDM’s base strings?) Captain’s – Odd Punctuation and Perhaps Compound Unicode. The German translation of “Captain’s Cabin” had odd punctuation: "Kapitan­s¬kajüte" (and some evidence of a Unicode “combining character”). I did a separate follow up to ChatGPT, and revised to drop the punctuation and add an umlaut over the 'a': Kapitänskajüte. Another German use of “Captain’s” was similarly revised. German reportedly never uses apostrophe for possessive form. Captain’s – Title versus Name. There was one case in [Danish] where the word “Captain’s” was not translated, as if it was a person’s name. (Also, reportedly, Danish does not generally use apostrophes for possessives; there are exceptions, but doesn’t seem to apply here.) State of All.Lang So Far Starting from a temporary file into which I pasted the raw AI results (with [<language>] headers added), I fabricated all.lang by: Making sure it had Unix line ending, not CRLF. (In Notepad++, Edit/EOL Conversion/Unix). Begin it with a first-draft preamble comment, heavily adapted from TDM’s all.lang preamble. Following that, a line with just an opening bracket. And a closing bracket line at end of file. Making the handful of translation corrections mentioned above. Change the casing to Title Case. (I didn’t bother changing the back-translation’s case.) Tagging the back-translations with “//bt:”, so they are denoted and if need be can be quickly stripped out with an editor. (If subsequent revision is manually applied, the delimiter will also be altered; preamble will provide guidance.) Lessons Learned So Far Improvements to Prompting... Specify that the FM’s ship is small. Specify that “Captain” is a title, not a person’s name. (Hmm, there’s some shouldered names, not touched by I18N.pl, that maybe should be partially-translated too, with titles like “First Mate Logan”.) For inventory items (and likely readables titles), ask the AI to make the output in Title Case. Tell the AI not to generate Unicode combining characters. Ask the AI to add a special delimiter “ //bt: “ before the back-translation. To the extent possible, convert any directional punctuation (apostrophes, single quotes, double quotes) to non-directional, to comply with TDM font limitations. Since it seems to give better results if you ask about one specific item (like “mess” in French), maybe it’s optimizing for speed instead of accuracy. Ask it to take more time? ChatGPT translation seems to have problems with possessive forms... or at least those problems are more-easily spotted during review. Speculation: maybe one cause of this is that I didn’t specify which country or regional dialect of a language to use. Perhaps a prompt to “prefer the form of language spoken in a language’s originating country, within or adjoining Europe.” Concerns about Translation Length... The results are generally short, but in-game will some of them prove to be too long? Traditionally, inventory names are limited to 2 lines, with “\n” needing to be inserted. This will need to be tested eventually.
  8. Desired Results File with All Languages What is want to end up with from our AI (with any iterative fixup and manual integration) is an FM-only version of TDM’s UTF-8 all.lang. That file has 17 language sections in a particular order, which we will adopt too (in the prompt further below), although, other than English being first, it doesn’t really matter. Once we have our FM all.lang, we can easily generate all the required ISO-encoded *.lang files, e.g., french.lang. Strategy of Feeding the AI One approach would be to just feed the entire #str list in one gulp, with prompt engineering that covers all aspects. This would minimize the post-translate integration time. But the concern is that prompt engineering becomes more difficult. The AI might get confused about what restrictions and hints apply to which strings. While sometimes shared context across strings can be helpful, too much shared context could lead to overly-creative translations (e.g., hallucinations). [BTW, if accessing the AI through an API, there’s often a "temperature" value you can specify, from 0.0 to 1.0, from most-predictable to most-creative. We can use a few words in the prompt to approximately achieve similar ends.] So, to maintain more control, I’m going to batch-feed. The assumption is that translating #str_ in batches of related input groups will allow more focused guidance from prompt engineering, leading to better results. I’ll start with inventory items, that have the shortest strings and most dictionary-like lookup. An alternative/additional batching (particularly needed with large FMs) would be by "scene". In the case of Air Pocket, it could be thought of broadly as 4 scenes, based on timeline and location. The story, as driven by objectives, is fairly linear; larger FMs would typically have some randomization in scene order. Would batching by scene be useful (i.e., give better AI results) for some of Air Pocket’s #str_ s? Thinking this over. But for now, treat inventory items independent of scene. Prompt Engineering for Inventory Items A stab at a reusable template follows in blue. It describes the overall translation task, the desired tone, and input and output formats. Text specific to inventory items is shown in bold. Text that is specific to this FM, to clarify the context and the meaning of particular words, is in italics, with spoilers hidden. You are an expert translator between English and other European languages, including Russian. You wish to translate a list of inventory items, all inanimate objects, from English to these other languages. Each line of the list begins with a tab, then a word beginning with #str_ within double quotes, then another tab, then a phrase within double quotes. When you translate a line, in the output keep the tabs and the #str_ word unchanged, and only change the phrase to the other target language in UTF-8, keeping it in double quotes. Make the translated phrase reasonably short while preserving the formal meaning. Avoid modern slang. Old-fashioned wording is fine. At the end of each line, add another tab, and then add a back-translation of the previous phrase into English again. When back-translating, ignore the original English phrase. Most of the inventory items are keys, and the associated phrases describe locked doors to particular locations aboard a ship, or locked trunks or safes on a ship. The "Master Key" opens all locks. Following the input list of inventory items, append output lists in these target languages: 1. German 2. French 3. Polish 4. Italian 5. Spanish 6. Portuguese 7. Russian 8. Czech 9. Hungarian 10. Dutch 11. Slovak 12. Danish 13. Swedish 14. Romanian 15. Turkish 16. Catalan. List of inventory items: [... skipping 1 potential spoiler] "#str_fm_map_inv_key_captains_cabin" "Captain's Cabin" "#str_fm_map_inv_key_captains_safe" "Captain's Safe" "#str_fm_map_inv_key_galley" "Galley" "#str_fm_map_inv_key_master_key" "Master Key" "#str_fm_map_inv_key_mess" "Mess" "#str_fm_map_inv_key_sea_trunk" "Sea Trunk" Using ChatGPT As discussed at the outset, you can use this without signing in (it will nag you). Also, if you’d like it not to retain your input for training purposes, click on the circled question mark and change it under "Settings". As of this post, of you ask ChatGPT what model it’s using, it responds "You're currently chatting with GPT-4o, the latest model from OpenAI as of 2025. The "o" stands for "omni" — it's designed to handle text, images, and more, all in one model." Following up by enquiring about usage limitations, it says "Free users can access GPT‑4o, but with strict usage caps, which vary based on demand and time of day". More specifically, "usage falls in the range of 5–16 messages per 3–5 hours, after which you'll be limited or switched to GPT‑4o‑mini." The latter is a faster but lower-accuracy model. "We’ll notify you once you’ve reached the limit and invite you to continue your conversation using GPT-4o mini or to upgrade to [paid] ChatGPT Plus." Because I’m doing this at a leisurely rate (and reporting it to you in posts), the usage restrictions should not bite. About Input and Output Formats As you can see above, the input is the AI prompt, appended with content from english.lang, namely, the lines between the "{" and "}" brackets. For those lines, no change to tab-separation is done. The output is the same format, but with an added English back-translation added to each line. When ChatGPT generates the response, each requested language is enclosed in its own HTML response frame, with a separate "copy" link. So you have to copy each link separately, pasting them successively into your FM-specific all.lang file while adding headers, e.g., [French]. Also, the frame margin contains the word "vbnet". When I asked, ChatGPT indicates that’s the style of syntax highlighting applied to the results, based on source material, but it may be inappropriate and so ignorable. Which explains why the word "key" was always colored green. In the next post, I’ll discuss the specific results.
  9. Could be good, could be a cockup. Something to explore for the future. For now, moving on to the translations.
  10. Renaming the Strings What I describe here is a bit of a fib, reflecting where I ended up, not the iterative process. And some renaming was refined after the first trial AI run. Before doing the translation, I’m going to rename all the FM-specific strings. Why? It’s fair to say the #str_ system was never a hit with DR developers, so features that would support it are scant. A way to make it less painful for mappers to inspect a post-conversion FM in DR is to change the #str_<5-numbers> to #str_<meaningful string>. By convention, <meaningful string> is limited to ASCII alphanumeric characters plus underscore, with no spaces. This generally does not include a version of the full English string (and there are length limitations), but something that clues the mapper. And groups things helpfully. More specifically, I’m going to rename all the FM-specific strings from #str_2xxxx to: #str_fm_<file_source><grouping_and_ordering>_<unique hint> The <grouping_and_ordering> substrings are chosen with an eye to both viewing in DR and group translation. (Later, I’ll talk about batch-processing strategies. The group-naming here does not use a “by scene” strategy.) So this is done first in english.lang and then the altered assets (in .map, .xd). As I search for “#str_” in assets, I am aware that some strings are TDM-level defined, so I should not rename them. These have stringID numbers under 20000, or in theory beginning with #str_main_menu. There were 2 of these found in airpocket: items with inv_name or inv_category given by #str_10052 and #str_02381. Specific Renamings and Examples I’ll be truncating example strings here for brevity and to reduce spoilers. The categories (and an example or two of each) follow. Darkmod.txt and Readme.txt. As mentioned, TDM doesn’t really handle translations of these. Just for completeness... "#str_20000" ==> "#str_fm_darkmod_txt__title" "Away 1: Air Pocket" "#str_20001" ==> "#str_fm_darkmod_txt_desc" "On the run from her husband, me and my girl. With a bribe to a ship's captain, we're away. What could go wrong now? Oh, dammit." Notice that here (and elsewhere) I’ve added an extra “_” before “title”. This is so, when sorted alphabetically, the title comes first before the description (in this case) or body. Mission Briefing. This tells a story, so be sure that its titles (if any), bodies, and pages are well-order to present to the AI. Since the briefing text is long and complex, I’ve opted not to include a #str <hint> for it. "#str_20026" ==> "#str_fm_mission_briefing_xd_pg1_body" "Life is sweet! It's been 3 days since Emily ran away with me, leaving behind her bastard husband. He's miles away now, as our tradeship Esmeralda plies herself down the coast.\n [...] \nAnyway, Emily's set up a cozy bunk for us in the bow galley.\n" Readables. Like the briefing, a readable often tells a story and should be well-ordered to present to the AI. I’ve chosen to skip hints for these. "#str_20024" ==> "#str_fm_airpocket_xd_sheet_appointment_to_service_pg2__title" "Further Instructions\n" "#str_20025" ==> "#str_fm_airpocket_xd_sheet_appointment_to_service_pg2_body" "\n\n\n\nAs agreed, the 2 new crewmates will bunk in the galley, and share the sea trunk nearest the door in the mess.\n\nCaptain Riggs\n" Inventory (mainly keys). The hints for these are usually just the full text, in lower case with punctuation dropped. They could include info about item class or difficulty, but I didn’t find it necessary here. "#str_20021" ==> "#str_fm_map_inv_key_captains_cabin" "Captain's Cabin" (I’ve went with “...fm_map...” here instead of “...fm_airpocket_map...”. I hope that works out. Too bad “map” itself has multiple TDM meanings, e.g., “hungarian.map”; in-game map. Also, since the order in which these inventory items are discovered in the game is pretty variable, I decided not to embed a numeral to force ordering.) Objectives. The objectives as coded in the .map file (and so generated english.lang #str_ numbers) are generally not listed in objective number order. It’s helpful to so reorder the rows. Also, because there are more than 9 objectives, I added a leading “0” to those with a single digit, for lexical ordering. The “obj<num>” is the DR/.map-internal objective number, which loosely corresponds to presentation to the game player, although objectives come and go. "#str_20008" ==> "#str_fm_map_obj01_no_hurt_or_loot_crew" "Rile neither the captain nor crew. No assaults by me or gratuitous thieving... not that there's much to steal." Thought messages. The I18N.pl script did not catch these, so they are manually added (and there’s no #str_2xxxx to begin with). I included an ordering digit after “msg”, roughly reflecting game occurrence (though difficulty and player actions will affect which of these appear and when): "fm_map_thought_msg1_sword_stolen" "My sword's been pilfered... I'm not surprised. It's likely still on-board somewhere." Ordering the Strings The #str_ lines in “english.lang” are extracted and put into ASCII alphanumeric order. I used Notepad++ with Edit/Line Operations/Sort Lines Lexigraphically Ascending. Any full-line grouping comments (beginning with //) will need to be manually repositioned after this. Programmatic Ideas to Better Support Renaming This would be helpful: A program that took the original 5-digit #str form of english.lang, and the renamed-#str english.lang form, and did the surgery on <fm>.map and *.xd . Or alternatively you could make a version of english.lang with an extra tab-separated field; this is what I created manually as a reference for myself, with lines like: "#str_20003" "#str_fm_map_inv_key_sea_trunk" "Sea Trunk" Then have a program that took that and did the surgery, using the first 2 fields. Or most ambitious of all, a version of I18N.pl that created alphanumeric #str_ in the first place, instead of 5-digit ones. Using something like the naming scheme above, this is mostly straightforward. However, for strings that are long sentences, if you still wanted to automatically summarize them into a few words for the #str <hint>, you’d probably need to call upon AI. Next Up I’ll make a version of this data available a little later on. Next post: prompt engineering!
  11. I know sometimes in the past I've misconfigured DR's File/"Game/Project Setting..."/ and gotten mystery problems similar to what you describe.
  12. Starting the Conversion with I18N.pl Folks have found that it is best to do the conversion only when an FM is complete, which is certainly the case here. As a prerequisite, I installed and tested Strawberry Perl on my Win11 dev box. It worked this time, unlike in the distant past on a different, Win10 machine, where I tried and failed for 3 days to get it to work. Then, as specified in our wiki’s I18N page, I set up a directory with the airpocket.pk4, and a copy of TDM 2.13 strings/english.lang. (The wiki specification missed that last step, which I corrected.) Running the I18N.pl script generated the expected results in its “output” folder: an altered airpocket.pk4 and new airpocket_l10n.pk4. Within both, a /strings/ folder was now present with just an english.lang file, with 28 #str_ entries, covering: maps/airpocket.map xdata/airpocket.xd xdata/mission_briefing.xd darkmod.txt readme.txt However, I believe #str_ support for the last 2 was never implemented (and the output version of those files don’t include #str_ s). So this will be lowest priority for me. Also, I don’t see any auto-generated #str_ for the 5 player “thought messages”. Probably need to add these. English text for these is in the .map file (found by searching for tdm_message_no_art).
  13. Scope of Work TDM currently supports a partial localization with a moderate amount of work. With a heroic effort, and a separate build on a separate site like DarkFate.org for Russian, one particular language can be fully supported. I'm not striving for that. Instead, I'm looking at AI-enabled moderate-effort partial localization across ALL TDM-supported languages for a particular FM. Choosing a Test FM As a test case, I'm going to report on internationalizing my small FM "Away 1: Air Pocket" (install name is airpocket). This FM is an "internationalization virgin". Furthermore, here's what it DOESN'T have: Custom video or speech audio Custom inventory items or weapons Custom images with embedded language text Signage Mission title & credits on splash screen (as either text overlay or embedded) Instead, these are my work targets, to translate to ALL languages: Text briefing Standard inventory items, mainly keys, but with custom names. Objectives A few sheet readables Some "thought messages" from the player (in the form of white text in midscreen, done with atdm:gui_message with atdm_gui_message_no_art.gui)
  14. With prompt engineering. Be starting to detail that soon. For now, here's my next post...
×
×
  • Create New...