The output char buffer is too small to contain the decoded characters, encoding codepage '65001' #62

dbampersand · 2024-07-26T02:32:55Z

When loading a file whose entity lump contains characters with a codepoint greater than 127, the file fails to parse, with the error: "The output char buffer is too small to contain the decoded characters, encoding codepage '65001'"

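This is because entities aren't actually compiled to UTF-8 but to a single-byte, 8-bit extended ASCII encoding. For example, if you put Ě (U+011A, UTF-8 bytes 0xC4 0x9A) in an entity key, it compiles to just E, but a character that exists in extended ASCII, like ÿ (0xFF), doesn't get stripped and is written as that single byte. A lone high byte like that is not valid UTF-8, so decoding the lump as UTF-8 (codepage 65001) fails.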
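To illustrate the mismatch, here is a minimal Python sketch (not the library's own code). The sample bytes and the `decode_entity_text` helper are hypothetical, and Latin-1 is only an assumption standing in for "extended ASCII"; the actual codepage the compiler uses isn't confirmed here.

```python
# Hypothetical entity key/value pair as raw bytes; 0xE7 is "ç" in Latin-1.
RAW_ENTITY = b'"targetname" "fa\xe7ade"'


def decode_entity_text(data: bytes) -> str:
    """Decode entity lump bytes as a single-byte encoding.

    Assumption: Latin-1 stands in for the "extended ASCII" the compiler
    writes. Every byte value 0-255 maps to one character, so this
    decode cannot fail on high bytes.
    """
    return data.decode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # 0xE7 is a UTF-8 lead byte with no continuation bytes after it,
    # so a strict UTF-8 decode rejects the lump.
    print(is_valid_utf8(RAW_ENTITY))
    print(decode_entity_text(RAW_ENTITY))
```

A single-byte decode never throws, but it will show the wrong character if the real codepage differs from the one assumed, so the choice of encoding still needs checking against the engine.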

See these three TF2 jump maps for examples of this: https://filebin.net/rbh7fcpz4zmdiqo6

For instance, on jump_4starters_b1_fix.bsp it crashes on the ç character (byte 0xE7): https://i.imgur.com/QT5Xr2R.png

tsa96 · 2024-07-26T08:27:22Z

Huh, good spot. If this encoding is used everywhere in BSP (not just the ent lump), we should perhaps make it a static property of BspFile and use it everywhere. I'll have a poke around in the engine at some point to see if there's any explicit mention of this.

dbampersand mentioned this issue on Jul 26, 2024: Entity parsing: UTF8 fix #63 (Open)
