Code page 437 does not match the wikipedia article. #4

EnderShadow · 2023-07-16T21:48:11Z

It looks like a number of symbols, especially near the beginning, do not match the wikipedia article for the code page.

https://github.com/bonega/yore/blob/master/codegen/tables/unicode.org/CP437.txt
https://en.wikipedia.org/wiki/Code_page_437

bonega · 2023-07-18T07:37:32Z

Hi,
Thanks for noticing this.
The specification I am using is from the Unicode Consortium.
The specific file seems to be provided by Microsoft.
IDK which is the correct one.
Did you notice the difference because you were seeing unexpected results when decoding/encoding?

EnderShadow · 2023-07-19T16:01:44Z

I did not seem to find anything about code page 437 on the unicode consortium website. But yes, the project I'm using it for uses the characters which the wikipedia page has for indexes 1-31. I didn't take a close look to see if there were any other major differences, but the wikipedia one seems to be more accurate?

bonega · 2023-07-20T10:03:58Z

This is where I found the original file: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
Checking in the python source, they are using the same specification: https://github.com/python/cpython/blob/main/Lib/encodings/cp437.py
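
To make the difference concrete, here is a small sketch that uses only Python's built-in `cp437` codec. It checks how bytes 0x01-0x1F and 0x7F decode; the variable names are just for illustration.

```python
# Decode bytes 0x01..0x1F plus 0x7F with Python's built-in cp437 codec.
low_bytes = bytes(range(0x01, 0x20)) + b"\x7f"
decoded = low_bytes.decode("cp437")

# Print each byte next to the code point it decodes to.
for byte, char in zip(low_bytes, decoded):
    print(f"0x{byte:02X} -> U+{ord(char):04X}")

# True when every byte maps to the code point with the same numeric value,
# i.e. a C0 control character or DELETE rather than a graphic symbol.
same_value = all(ord(char) == byte for byte, char in zip(low_bytes, decoded))
print(same_value)
```

With this mapping the low bytes come out as control characters, whereas the wikipedia table shows graphic symbols such as ☺ (U+263A) for 0x01.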

You are saying that the wikipedia seems more accurate, can you give sources for it?
I have no idea 🙂

Edit: I found a pdf that supports the wikipedia definition and it is from IBM, so you are probably right.
Still I am a bit reluctant to make this change because it seems that many of the other libraries are using the microsoft definition.

I will leave this open for a while to get comments about it.

EnderShadow · 2023-07-20T14:52:29Z

It's not too big of an issue even if this isn't modified since it's easy enough to perform a filtered-map after converting to unicode.

It looks like what I'm trying to support has the same modification for all encodings for 1-31 and 127, so it's probably safer to assume that IBM is the non-standard definition here.

I'm working on a circuit simulator for circuits made with the game Turing Complete, and it appears to use the IBM encodings, which are identical to the Microsoft ones sans characters 1-31 and 127.
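
A minimal sketch of the filtered-map workaround mentioned above, written in Python for illustration rather than with this library. It assumes the graphic characters listed in the wikipedia table for 1-31 and 127; `IBM_LOW_GRAPHICS`, `CONTROL_TO_GRAPHIC` and `decode_cp437_graphics` are hypothetical names, not part of any library.

```python
# Graphic characters the IBM/wikipedia table shows for bytes 0x01..0x1F
# (assumption: copied from the wikipedia table, in byte order).
IBM_LOW_GRAPHICS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

# Translation table for str.translate: control code point -> graphic symbol.
CONTROL_TO_GRAPHIC = {i + 1: ch for i, ch in enumerate(IBM_LOW_GRAPHICS)}
CONTROL_TO_GRAPHIC[0x7F] = "⌂"  # 0x7F is the house glyph in the IBM table


def decode_cp437_graphics(data: bytes) -> str:
    """Decode with the standard cp437 codec, then remap 0x01-0x1F and 0x7F."""
    return data.decode("cp437").translate(CONTROL_TO_GRAPHIC)


print(decode_cp437_graphics(b"\x01 Hi \x03\x7f"))  # prints "☺ Hi ♥⌂"
```

Note that this remaps every control character, including tab (0x09) and newline (0x0A), so it only suits data where those bytes are meant to be displayed as symbols, such as a character-cell text display.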

bonega · 2023-07-26T15:41:24Z

The game looks very cool, I have to test it 👍
Surprised though that it would be using 437

EnderShadow · 2023-07-26T15:58:57Z

One of the components in the game is a text display and it has 3 different encodings you can choose for it. 437 and 910 are 2 of them.
