The Dark Mod Forums

Analysis of 2.12 TDM Fonts


Geep


22 hours ago, Geep said:

But, in any event, they do need to be backed up somewhere official.

I did back up these files to the official HiRes textures SVN, so we are good to go on that front!


Stone 24pt Font Upgrade, May 9 2024 Interim Release

This first release just clears away some of the low-hanging fruit, while buffing the new datBounds program.
In summary -

  • stray mark next to W removed
  • poor spacing of J and AE improved
  • 6 characters got accents added.

The download is made up of:

  • .dat and 2 .dds files, for distribution/game inclusion
  • .ref [with changes annotated] and .xcf [master source GIMP project, including datBounds layers]

2024 May 9 Stone 24 Interim font update.zip

The changes were also double-checked using the testSubtitlesANSI FM.

As this work continues, with an interim release approximately monthly, characters to be fixed will be chosen seemingly "at random", but actually due to layout considerations... you don't want to know. They'll all be done eventually.

Details of Improvements:

Spoiler


Tweaks to DAT Alone
===================
•  87 (0x57) // W    Fixed stray mark to left. +1 for s_coord, s2_coord
• 173 (0xad) // Soft hyphen (SHY). Was shown as hollow box. Now shown as zero box.
             // (A soft hyphen is a marker within a word, to indicate where it may be split for wordwrap,
             // at which point a hyphen is added by the wrapping software.)
• 198 (0xc6) // AE Ligature, a character probably not used yet.
             // This had an overly-large bounding box that overlapped the adjacent character. Spacing was poor too.
             // -3 for s2, imageWidth, xSkip, +3 pitch.

Bitmap Shifts & DAT Tweaks
==========================
• 74 (0x4a) // J    Fixed bad spacing. The J glyph was bitmap-shifted +2. DAT: +2 for s2_coord, imageWidth

Accents Added to Bitmap*
=======================
209 (0xd1) // Ñ Add tilde.
137 (0x89) // Ŕ Add accent acute.
178 (0xb2) // Ť Add caron.
135 (0x87) // Ẑ Add circumflex.
155 (0x9b) // ă Add breve [cup]**.
151 (0x97) // ẑ    Add circumflex.

* These were prioritized because they were mostly done back in 2014. Base character and tall bounding box for accent already present.

** A rare TDM char with breve. Fabricated from bottom half of an o in % symbol.

 


Certainly I intend the Stone 24 end product to be included in 2.13. As for the Interim(s), I'm posting them as "hit by a bus" insurance. It also gives anyone interested a chance to kick their tires in testing FMs. Or include them in 2.13 betas. Or in your Unofficial Patch. Enjoy.


  • 2 weeks later...

Stone 24pt Font Upgrade, May 20, 2024 Interim Release

This second release is mainly concerned with providing accented upper-case characters, particularly in populating two rows of bitmap _1_.

In summary -

  • 31 capital letters now have proper accents.
  • 5 other characters got minor improvements.

As with the first release, this download has been double-checked with testSubtitlesANSI FM and is made up of -

  • .dat and 2 .dds files, for distribution/game inclusion
  • .ref [with changes annotated] and .xcf [master source GIMP project, including datBounds layers]

2024 May 20 Stone 24 Interim font update.zip

Details of Improvements:

Spoiler


Tweaks to DAT Alone
===================
- š (0xa8) upper bounds raised to fix caron clipping.
- Inconsistent metrics for "<" (0x3c) and ">" (0x3e) fixed.
- "T" (0x54) and Ť (0xb2) were clipped on right side. Fixed.

Bitmap Deletions
================
Newly redundant circumflex-z glyph deleted, _1_ row 3

New Accented Capital Letters Added to Bitmap _1_
=================================================
Ń (0x8c)  row 3
Ă (0x8b)  row 6
À (0xc0)  row 6
Č (0xac)  row 6
Ç (0xc7)  row 6
Ð (0xd0)  row 6
Ę (0xab)  row 6
È (0xc8)  row 6
Ê (0xca)  row 6
Ë (0xcb)  row 6
Î (0xce)  row 6
Ï (0xcf)  row 6
Ő (0xb0)  row 7
Ò (0xd2)  row 7
Ô (0xd4)  row 7
Õ (0xd5)  row 7
Ș (0x8d)  row 7
Ț (0x8e)  row 7
Ű (0xa2)  row 7
Ù (0xd9)  row 7
Û (0xdb)  row 7
Ý (0xdd)  row 7
 
Accents Added to Existing Capital Letters, Bitmap _0_ *
=======================================================
Ą (0xaa)  row 3
Ĉ (0x86)  row 4
Ì (0xcc)  row 5, leftside
Ď (0xb3)  row 5
Ô (0x88)  row 5
Ŝ (0x85)  row 5
Ǔ (0x8a)  row 5
Ě (0xa5)  row 6
Ÿ (0xbe)  row 7

* These were base capital letters, like "E", with its bounding box,
that previously had an additional overlapping tall bounding box that was
shared by multiple planned accented letters, e.g., Ě, Ę, È, Ê, Ë.
These unimplemented letters were "parked" on E as a workaround.
They are now implemented... Ę, È, Ê, Ë now have their own separate characters
in bitmap _1_, leaving only one letter, Ě, overlapping E.
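The "parked" arrangement in the footnote above comes down to bounding boxes that overlap within one bitmap. As a rough illustration only (this is not datBounds code, and the pixel values are made up), overlapping pairs could be listed like this, assuming each character's box is available as an (s, t, s2, t2) tuple in pixels:

```python
from itertools import combinations

def overlaps(a, b):
    """True if two (s, t, s2, t2) pixel rectangles share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def overlapping_pairs(boxes):
    """boxes maps codepoint -> (s, t, s2, t2); return sorted overlapping pairs."""
    items = sorted(boxes.items())
    return [(p, q) for (p, a), (q, b) in combinations(items, 2) if overlaps(a, b)]

# Hypothetical boxes, all assumed to live in the same bitmap.
boxes = {
    0x45: (10, 40, 24, 60),  # "E"
    0xA5: (10, 30, 24, 60),  # tall box still parked on E
    0x46: (26, 40, 38, 60),  # a neighbour that does not overlap
}
print([(hex(p), hex(q)) for p, q in overlapping_pairs(boxes)])  # [('0x45', '0xa5')]
```

An intentional overlap (like Ě on E) shows up in such a list too, so the output is a prompt for inspection rather than an error report.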

Foregoing In CodePoint Order
============================
Useful to have this list during testSubtitlesANSI FM inspection. New unless indicated:

< (0x3c) improved
> (0x3e) improved
T (0x54) improved
Ŝ (0x85)
Ĉ (0x86)
Ô (0x88)
Ǔ (0x8a)
Ă (0x8b)
Ń (0x8c)
Ș (0x8d)
Ț (0x8e)
Ű (0xa2)
Ě (0xa5)
š (0xa8) improved
Ą (0xaa)
Ę (0xab)
Č (0xac)
Ő (0xb0)
Ť (0xb2) improved
Ď (0xb3)
Ÿ (0xbe)
À (0xc0)
Ç (0xc7)
È (0xc8)
Ê (0xca)
Ë (0xcb)
Ì (0xcc)
Î (0xce)
Ï (0xcf)
Ð (0xd0)
Ò (0xd2)
Ô (0xd4)
Õ (0xd5)
Ù (0xd9)
Û (0xdb)
Ý (0xdd)

This is likely the last interim release. The remaining work, leading to final release, involves a dozen more difficult cases, chiefly lower case, symbols, or Icelandic. Plus general spot cleanups.

 

 


  • 2 weeks later...
Posted (edited)

I noticed something during the Stone 24 work. There is an opportunity to add 2 new characters to the custom TDM codepage. I've identified some reasonable candidates...

Paired upper/lower case:

  • Ğ/ğ, with breve (cup)
  • Ľ/ľ, with abbreviated caron shown as stroke or apostrophe
  • İ/ı, namely I with dot above / i without dot

and individual candidates:

  • ẞ, the relatively new capital form of German "sharp S"
  • ¡, the Spanish inverted exclamation mark
  • đ, the Barred d EDIT: No, this is already defined at 0x90

I'd like to hear feedback about what characters to use. @Petike the Taffer, your experience with 2019 translations is particularly germane.

Details about these candidates, and how the change might be rolled out, follow.

Spoiler

Background
The TDM char set has a redundant treatment of 2 characters:

  • Ô appears at 0x88 and 0xD4
  • ô appears at 0x98 and 0xF4

These characters are assigned the 0xD4 and 0xF4 codepoints by the ISO-8859-1 standard, as well as many of the 8859-x variants.

I was not able to find any codepage standard (e.g. ISO, Windows, etc.) that put these characters at 0x88 and 0x98. I think it is TDM only, and likely a mistake. (If you know of a reason, please chime in.)

Among TDM's language-specific codepoint maps (e.g. \tdm_base01\strings\czech.map), these transformations appear:

source    target
=====   =====
...
0x88    0xD4    // Ô
...
0x98    0xF4    // ô
...

These appear in czech.map, hungarian.map, polish.map, and slovak.map

As I understand it, the overall purpose of these 4 .map files is to take characters encoded with 8859-2 codepoints (used by these 4 languages) and redirect them to TDM codepoints. However, in the case of 0x88 and 0x98, these are NOT 8859-2 codepoints; so presumably these <lang>.map entries are just papering over the redundancy.
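To make the redirection concrete, here is a minimal sketch of what such a source-to-target table does to a byte string. It is illustrative only: the two entries are the ones quoted above, while the function name and the way the engine actually reads and applies .map files are assumptions, not TDM code.

```python
import unicodedata

# The two redundant entries quoted above (source -> target).
REMAP = {0x88: 0xD4, 0x98: 0xF4}

def apply_map(data, remap):
    """Return data with every byte found in remap replaced by its target."""
    table = bytes(remap.get(b, b) for b in range(256))
    return data.translate(table)

sample = bytes([0x88, 0x20, 0x98])
print(apply_map(sample, REMAP).hex(" "))  # d4 20 f4

# In ISO 8859-2 itself, 0x88 is not a letter at all: Python's codec decodes
# it to a C1 control character (Unicode category "Cc").
print(unicodedata.category(bytes([0x88]).decode("iso8859_2")))  # Cc
```

The second print supports the point that 0x88 and 0x98 are not printable 8859-2 codepoints, so nothing typed in that encoding would normally produce them.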

Possible Replacement Characters
It would be good to reassign 0x88 and 0x98 to two characters, not yet present in TDM's codepage, that might actually be useful. (Note - as with all custom mappings involving 8859-x, translators will have to enter & see weird characters to get what they need.)

TDM's codepage has European accented capital letters in the 0x8- range and the corresponding lower case letters in the 0x9- range. Keeping with that theme for the moment, some candidate pairs are:

  • G/g with breve (cup): Turkish/Tatar/Azerbaijani (0xD0, 0xF0 in 8859-9; or 0xAB, 0xBB in 8859-3)
  • L/l with caron: The caron in these cases appears as a vertical stroke, not a v. Used in Slovak and some forms of Ukrainian. In 8859-2, Ľ is 0xA5, ľ is 0xB5.
  • I with dot above / i without dot: Turkish. (0xDD, 0xFD in 8859-9; 0xA9, 0xB9 in 8859-3)

Alternatively, individual character candidates include:

  • ẞ, the capital form of German "Eszett" aka "sharp S" (U+1E9E). Relatively new, so not in 8859-x. Lower case is already present in TDM. German accepts "SS" as an alternative for this.
  • ¡, the Spanish inverted exclamation mark (0xA1 in 8859-1)
  • đ, the Barred d, used in Serbian and/or Croatian, Macedonian. In 8859-2, this is codepoint 0xF0. [TDM has Icelandic ð, lower case eth, which can somewhat substitute]. EDIT: No, this is already defined at 0x90.
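For anyone wanting to double-check the ISO 8859-x codepoints quoted in the two lists above, Python's built-in codecs can do it. This is just a sanity-check sketch using the standard library; it says nothing about TDM's own custom codepage.

```python
# Each entry: (Python codec name, byte value, expected Unicode character).
CLAIMS = [
    ("iso8859_9", 0xD0, "\u011e"),  # G with breve
    ("iso8859_9", 0xF0, "\u011f"),  # g with breve
    ("iso8859_3", 0xAB, "\u011e"),
    ("iso8859_3", 0xBB, "\u011f"),
    ("iso8859_2", 0xA5, "\u013d"),  # L with caron
    ("iso8859_2", 0xB5, "\u013e"),  # l with caron
    ("iso8859_9", 0xDD, "\u0130"),  # I with dot above
    ("iso8859_9", 0xFD, "\u0131"),  # dotless i
    ("iso8859_3", 0xA9, "\u0130"),
    ("iso8859_3", 0xB9, "\u0131"),
    ("iso8859_1", 0xA1, "\u00a1"),  # inverted exclamation mark
    ("iso8859_2", 0xF0, "\u0111"),  # d with stroke (barred d)
]

def mismatches(claims):
    """Return the claims whose byte does not decode to the expected character."""
    return [(codec, byte) for codec, byte, expected in claims
            if bytes([byte]).decode(codec) != expected]

print(mismatches(CLAIMS))  # []

# Capital sharp S (U+1E9E) has no slot in a single-byte standard like 8859-1.
try:
    "\u1e9e".encode("iso8859_1")
except UnicodeEncodeError:
    print("capital sharp S is not encodable in ISO 8859-1")
```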

Ideally, the choice of characters could be informed by those most useful to TDM, with priority to Main Menu system, standard Inventory & Weapon names, and translations within the bundled FMs "Training Mission", "A New Job" and "Tears of St. Lucia". Easier said than done.

Possible Rollouts for 2.13+
Whatever the replacements, I can easily add them to the wiki char map page, and to the Stone 24pt bitmap and DAT currently being extended. But all fonts need to be addressed.

For ease of discussion, let's assume that G/g with breve are the replacements.

Then, one interim shortcut would be to change czech.map, hungarian.map, polish.map, and slovak.map to point from the 8859-2 codepage to the base letters WITHOUT the breve:

source    target
=====   =====
...
0x88    0xD4    // Ğ --> G
...
0x98    0xF4    // ğ --> g
...

However, it would be preferable to remove the .map-level items entirely, and instead edit the .DAT files of all of the fonts, so that each new character is "parked" on a "temporary" substitute character:

DAT block    glyph target
=======    ============
...
0x88 // Ğ    G (same location as 0xD4, but taller bounds, awaiting future breve)
...
0x98 // ğ    g (same location as 0xF4, but taller bounds, awaiting future breve)
...

In any event, translation strings that might actually use the new characters (or their parked substitutes) would have to be edited to do so. So, if we added "¡", then the Spanish aspect of the .lang system should be adjusted.

And of course, in the real long term, one could dream of editing bitmaps to really support these and other missing characters.

 

 

Edited by Geep
Shouldn't have included barred d as candidate

  • 2 weeks later...

As detailed above (see hidden content), 2 characters may be chosen to replace duplicate characters in the existing TDM codepoint map. Since I didn't hear any feedback here, I have chosen:

Ğ/ğ, with breve (cup), where

  • Ğ uses TDM codepoint 0x88
  • ğ uses TDM codepoint 0x98

Why these? Because they are reasonable language-wise, and easy to bitmap-draw.

At the moment, I will be doing Stone 24pt glyphs and DAT entries for these characters. I intend to add a bugtracker entry (for assignment to me) for additional work needed to fully support this new mapping.


Posted (edited)

They are reportedly part of Turkish/Tatar (e.g., Crimean)/Azerbaijani. Turkish is one of the 17 languages seen in TDM Settings.

They are part of the ISO 8859-3 standard (at 0xAB, 0xBB) and ISO 8859-9 (at 0xD0, 0xF0).

EDIT: @wesp5, let me answer more broadly. Like most hobbyists, I enjoy a certain amount of magical thinking. In this case, I'd like to think that improvements to the translation system might grow the potential TDM player base. But as to which improvements... we have no focus groups.

Of course, this particular improvement - choosing 2 additional characters - is minuscule. (After I finish the Stone 24pt font work, I hope to formulate and float some more impactful ideas.)

EDIT2: https://bugs.thedarkmod.com/view.php?id=6543 lists additional work needed to support 2 new characters for TDM 2.13

Edited by Geep

  • 2 weeks later...
Posted (edited)

Stone 24pt Font Upgrade, June 20, 2024 Final Release

With this third and (I hope) final update, Stone 24pt becomes TDM's only font+size that fully provides all 256 characters defined in the TDM custom character map.

This last update concludes adding the more difficult characters, chiefly symbols, Icelandic, and remaining accented letters. Plus general overall tweaks to improve character rendering and spacing (as evaluated with testSubtitlesANSI), while preserving important metrics, e.g., xSkip of ASCII alphanumerics. Unwanted overlapping boundaries were eliminated, and things made more consistent.

As one example, the metrics for the lower-case "e", and accented characters based on it, were adjusted a year ago to suppress stray marks, but at the expense of clipping the glyphs, making them less stylish. With the added insight now provided by datBounds visualization, it was possible to restore the style while keeping the stray marks away.

In summary -

  • 16 new glyphs were created.
  • 64 existing characters got tweaks. These were mostly affecting the .dat, but some glyphs were moved in the bitmap (a little or a lot) to solve various problems.

Download:
2024 June 20 Stone 24 Final Font Update.zip

As with the prior interim releases, this download is made up of -

  • .dat and 2 .dds files, for distribution/game inclusion
  • .ref [with changes annotated] and .xcf [master source GIMP project, including datBounds layers]

The hand-annotations in the .ref file are neither absolutely conclusive, nor necessary to preserve for future work. You may generate a fresh .ref without them from the .dat file by using refont.

Enumeration of Changes:

Spoiler

List of changes in the June 20 release, since the mid-May interim release, in codepoint order.


NEW means new glyph or accent. ALTERED means other glyph change. Otherwise, just some change in .dat metrics. The .ref file annotations provide further info. (Beyond that, Geep has granular details in a series of Word tables. It gets ugly.)

0x32 2
0x33 3
0x35 5
0x39 9
0x3c <
0x3d =
0x43 C
0x45 E
0x47 G
0x4e N
0x51 Q
0x5b [
0x5c \
0x5f _
0x62 b
0x63 c
0x64 d
0x65 e
0x68 h
0x6b k
0x6d m
0x6e n
0x6f o
0x70 p
0x71 q
0x72 r
0x73 s
0x74 t ALTERED
0x78 x
0x7c |
0x82 Ć
0x86 Ĉ
0x88 WAS: Ô NEW: G breve
0x8d Ș
0x8e Ț
0x90 đ NEW
0x92 ć
0x93 ż
0x95 ŝ
0x96 ĉ
0x98 WAS: ô NEW: g breve
0x99 ŕ NEW
0x9a ǔ
0x9c ń
0x9d ș NEW
0x9e ț NEW
0xa2 Ű
0xa3 ě
0xa7 § NEW
0xa8 š
0xac Č
0xae č
0xb1 Ł
0xb4 Ž
0xb6 ť NEW
0xb7 ď NEW
0xb8 ž
0xbb ę
0xbc Œ NEW
0xbd œ NEW
0xbf ¿ NEW
0xc7 Ç
0xd0 Ð
0xd1 Ñ
0xd9 Ù
0xdb Û
0xdd Ý
0xde Þ NEW
0xe7 ç
0xe8 è
0xe9 é
0xea ê
0xeb ë
0xed í
0xee î
0xef ï
0xf0 ð NEW
0xf7 ř
0xfe þ NEW
0xff ÿ NEW

There will be some follow-on work, e.g., expressed earlier in this thread in a bugtracker entry.

Edited by Geep

Posted (edited)

@wesp5, please do add it to your patch in the meantime. Perhaps that would allow it to get a touch more of early testing. Anyone playing under a non-English European language may see fuller use of accents on the names of weapons and stock inventory items. But there could be existing or new problems there to be further addressed, as well as with any FM custom inventory item.

Also, any readable or sign that uses Stone font merits inspection. This is true for text presented in English, or (if the player has gone through the considerable hassle of tracking down and installing a pertinent language pack) in other Latin-based languages. Expected problems with the latter, requiring freshening of the translation to fix, are still-missing accents or (worst conceivable case) changed word-wrap leading to confusing formatting or truncated end of text.

Edited by Geep
expand info about readables

I have just released a new patch version with your latest fonts included. According to the download numbers not many use my patch, but probably more than here, so it's better than nothing to get feedback! Thanks to your effort I have already restored Snatcher's whistle tool, because I was very annoyed by the pixel error that was visible whenever it was selected...

Edited by wesp5

  • 2 weeks later...

I've noticed that DAT definitions of the TDM mason and mason_glow fonts use a special type of per-character scaling. I've described this now in the wiki: https://wiki.thedarkmod.com/index.php?title=Font_Metrics_%26_DAT_File_Format#Per-Character_Font_Scaling

In that article, I've also made some minor tweaks in the description of how xSkip is used in practice, based on more experience.


Refont v 2.1 is released, and available through Refont/Downloads.

The main changes are detailed in that wiki article's much-revised section on "Errors, Warnings, and Auto-Corrections".
Briefly, better code for these issues was first ported from datBounds, then reworked to boost coverage and consistency. Testing and incremental improvements of this were done by DAT --> REF conversions while I marched through TDM's 2.12 font corpus.

Highlights:

  • Checks that font metric values are in expected ranges.
  • Checks if character bounds (s, t, s2, t2), expressed as 0..1 floats, can be expressed exactly as 0..256 ints.
  • Distinguishes between minor & major warnings. Minor = very-slightly-off conversion from float to int.
  • Cross-checks of (s2-s) vs imageWidth, (t2-t) vs imageHeight vs height.
  • Always generates REF character bounds values (i.e., coord_s, etc.) as ints, never as decimals.
  • When generating a REF file, for certain problematic line items, optionally append a comment starting with "// WARNING:".
  • Adds options -scaling_ok, -no_warn_comments.
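For readers curious what the float-to-int and cross-checks in the list above amount to, here is a rough Python sketch. It is not refont's code: the Glyph class, function names, tolerance, and the assumption of a 256-pixel bitmap edge (taken from the "0..256 ints" wording) are all illustrative.

```python
from dataclasses import dataclass

BITMAP_SIZE = 256  # assumed bitmap edge length in pixels

@dataclass
class Glyph:
    s: float          # left edge, as a 0..1 fraction of the bitmap
    t: float          # top edge
    s2: float         # right edge
    t2: float         # bottom edge
    image_width: int
    image_height: int

def to_pixels(coord, size=BITMAP_SIZE, eps=1e-6):
    """Convert a 0..1 coordinate to pixels; return (nearest_int, is_exact)."""
    scaled = coord * size
    nearest = round(scaled)
    return nearest, abs(scaled - nearest) <= eps

def check_glyph(g):
    """Return a list of warning strings for one glyph (empty if all is well)."""
    warnings = []
    pix = {}
    for name in ("s", "t", "s2", "t2"):
        value = getattr(g, name)
        if not 0.0 <= value <= 1.0:
            warnings.append(f"{name} out of range: {value}")
        pix[name], exact = to_pixels(value)
        if not exact:
            warnings.append(f"{name} is not an exact pixel value: {value}")
    if pix["s2"] - pix["s"] != g.image_width:
        warnings.append("(s2 - s) disagrees with imageWidth")
    if pix["t2"] - pix["t"] != g.image_height:
        warnings.append("(t2 - t) disagrees with imageHeight")
    return warnings

print(check_glyph(Glyph(0.25, 0.5, 0.3125, 0.59375, 16, 24)))  # []
```

The sketch does not model the minor/major warning split, and it ignores the per-character scaling used by mason and mason_glow; the wiki article has the actual rules.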

  • 3 weeks later...
  • 2 weeks later...
  • 1 month later...

I just posted another release to https://bugs.thedarkmod.com/view.php?id=6543 for incremental improvements to 3 more fonts.

I've now touched all the ASCII-only fonts (the majority), and next will start looking at the remaining fonts for which the title of 6543 - "Change TDM codemap to add 2 new characters" - actually applies.

