Code page 437 does not match the wikipedia article. #4

Open
EnderShadow opened this issue Jul 16, 2023 · 6 comments

Comments

@EnderShadow
Contributor

It looks like a number of symbols, especially near the beginning, do not match the Wikipedia article for the code page.

https://github.com/bonega/yore/blob/master/codegen/tables/unicode.org/CP437.txt
https://en.wikipedia.org/wiki/Code_page_437

@bonega
Owner

bonega commented Jul 18, 2023

Hi,
Thanks for noticing this.
The specification I am using is from the Unicode Consortium.
The specific file seems to be provided by Microsoft.
IDK which is the correct one.
Did you notice the difference because you were seeing unexpected results when decoding/encoding?

@EnderShadow
Contributor Author

I couldn't find anything about code page 437 on the Unicode Consortium website. But yes, the project I'm using it for uses the characters that the Wikipedia page lists for indexes 1-31. I didn't look closely for other major differences, but the Wikipedia one seems to be more accurate?

@bonega
Owner

bonega commented Jul 20, 2023

This is where I found the original file: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
Checking the Python source, they are using the same specification: https://github.com/python/cpython/blob/main/Lib/encodings/cp437.py
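
For illustration, here is a small check of how the Python standard library codec linked above treats the disputed bytes. This is only a sketch that assumes a stock CPython install; the variable names are made up for the example. It shows that bytes 1-31 and 127 decode to the control characters with the same code points rather than to graphic symbols, while the upper half is mapped normally.

```python
# Bytes 0x01-0x1F plus 0x7F: the range where the two definitions disagree.
raw = bytes(range(0x01, 0x20)) + b"\x7f"

# Python's built-in cp437 codec follows the Microsoft/unicode.org table.
decoded = raw.decode("cp437")

# Each byte comes back as the control character with the same code point.
all_controls = all(ord(ch) == b for ch, b in zip(decoded, raw))
print(all_controls)

# Bytes from 0x80 upward are mapped to their usual CP437 characters.
print(b"\x80".decode("cp437"))
```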

You are saying that Wikipedia seems more accurate; can you give sources for it?
I have no idea 🙂

Edit: I found a PDF from IBM that supports the Wikipedia definition, so you are probably right.
Still, I am a bit reluctant to make this change because many of the other libraries seem to use the Microsoft definition.

I will leave this open for a while to get comments about it.

@EnderShadow
Contributor Author

EnderShadow commented Jul 20, 2023

It's not a big issue even if this isn't changed, since it's easy enough to apply a filtered map after converting to Unicode.
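
A minimal sketch of such a post-decode remap, written in Python rather than with this crate's API, might look like the following. The glyph list is copied from the Wikipedia chart for code points 1-31 and 127; the names `IBM_GRAPHICS`, `CONTROL_TO_GLYPH`, and `decode_cp437_graphic` are hypothetical and exist only for this example.

```python
# Graphic glyphs for bytes 0x01-0x1F, in order, per the Wikipedia CP437 chart.
IBM_GRAPHICS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

# Translation table for str.translate: control code point -> glyph.
CONTROL_TO_GLYPH = {i + 1: glyph for i, glyph in enumerate(IBM_GRAPHICS)}
CONTROL_TO_GLYPH[0x7F] = "⌂"  # byte 127 is the "house" symbol in the IBM chart


def decode_cp437_graphic(data: bytes) -> str:
    """Decode with the standard table, then swap control characters for glyphs."""
    return data.decode("cp437").translate(CONTROL_TO_GLYPH)


print(decode_cp437_graphic(b"\x01\x02 Hi \x03\x7f"))
```

Note that this remaps every control character, including tab, line feed, and carriage return, so it only suits data where those bytes are meant to be displayed as symbols. Byte 0 is left untouched.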

It looks like what I'm trying to support has the same modification for all encodings for 1-31 and 128, so it's probably safer to assume that IBM is the non-standard definition here.

I'm working on a circuit simulator for circuits made in the game Turing Complete, and it appears to use the IBM encodings, which are identical to the Microsoft ones except for characters 1-31 and 127.

@bonega
Owner

bonega commented Jul 26, 2023

The game looks very cool, I have to test it 👍
Surprised, though, that it would be using 437.

@EnderShadow
Contributor Author

One of the components in the game is a text display, and it has 3 different encodings you can choose from. 437 and 910 are 2 of them.
