Add full Unicode support #89

Merged
jtojnar merged 2 commits into nix-community:master from the unicode branch
Apr 15, 2024
@jtojnar (Contributor) commented Apr 14, 2024

`dconf dump` uses `g_variant_print`, which prints most Unicode characters verbatim. dconf2nix would read those and then use `show` to serialize the parsed strings. But `show` encodes non-ASCII characters as a backslash followed by the decimal code point (e.g. `\129315`), which means nothing to Nix.

Let’s encode strings as UTF-8 when dumping them to Nix.

Also fix the test data from e2b5065: it was copied as reported by `parserTraced`, but the actual data was mostly Unicode with few escape sequences.
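
The mismatch described above can be reproduced in a few lines of Haskell. This is only a minimal sketch: `renderNixString` is a hypothetical helper written for this illustration, not the function dconf2nix uses, and the set of escapes it handles (`"`, `\`, `${`, newline) is an assumption about what a double-quoted Nix string needs.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical helper (not taken from dconf2nix): quote a String as a
-- double-quoted Nix string literal, leaving non-ASCII characters untouched.
renderNixString :: String -> String
renderNixString s = "\"" ++ go s ++ "\""
  where
    go [] = []
    go ('"' : cs) = "\\\"" ++ go cs        -- escape double quotes
    go ('\\' : cs) = "\\\\" ++ go cs       -- escape backslashes
    go ('$' : '{' : cs) = "\\${" ++ go cs  -- avoid accidental interpolation
    go ('\n' : cs) = "\\n" ++ go cs        -- keep the literal on one line
    go (c : cs) = c : go cs                -- everything else verbatim

main :: IO ()
main = do
  -- Make sure the handle writes UTF-8 regardless of the locale.
  hSetEncoding stdout utf8
  let s = "\129315" -- U+1F923, the code point from the example above
  putStrLn (show s)            -- prints "\129315"
  putStrLn (renderNixString s) -- prints "🤣"
```

The first line of output is what `show` produces and what Nix cannot interpret; the second keeps the character itself and relies on the output handle being UTF-8.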

@jtojnar jtojnar mentioned this pull request Apr 14, 2024
`dconf dump` uses `g_variant_print`, which prints most Unicode characters verbatim.
dconf2nix would read those and then use `show` to serialize the parsed strings.
But `show` encodes non-ASCII characters as a backslash followed by the decimal code point
(e.g. `\129315`), which means nothing to Nix.

We have previously implemented special handling of strings consisting of just
a single emoji code point, to be able to import GNOME Characters history.
But an emoji glyph can consist of multiple code points, which was not handled.
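
To illustrate why a single-code-point special case falls short, here is a small sketch. `isSingleCodePoint` is a hypothetical stand-in for that kind of check and is not the code being reverted; the sample strings are chosen for this example.

```haskell
-- Hypothetical check in the spirit of a single-code-point special case;
-- not the code that was reverted.
isSingleCodePoint :: String -> Bool
isSingleCodePoint [_] = True
isSingleCodePoint _   = False

main :: IO ()
main = do
  let grinning = "\128512"        -- U+1F600, one code point
      thumbsUp = "\128077\127997" -- U+1F44D followed by U+1F3FD (skin tone modifier)
  -- A Haskell String is a list of code points, so length counts code points.
  print (map length [grinning, thumbsUp])            -- prints [1,2]
  print (map isSingleCodePoint [grinning, thumbsUp]) -- prints [True,False]
```

The second string renders as one glyph but is two code points, so a check like this would not treat it as an emoji.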

Let’s revert the emoji hack and add systematic Unicode support in the next commit.

Reverts 8a33e7c
Reverts 9b44d67
`dconf dump` uses `g_variant_print`, which prints most Unicode characters verbatim.
dconf2nix would read those and then use `show` to serialize the parsed strings.
But `show` encodes non-ASCII characters as a backslash followed by the decimal code point
(e.g. `\129315`), which means nothing to Nix.

Let’s encode strings as UTF-8 when dumping them to Nix.

Also fix the test data from e2b5065:
it was copied as reported by `parserTraced`, but the actual data
was mostly Unicode with few escape sequences.
@jtojnar jtojnar merged commit e8a5dd1 into nix-community:master Apr 15, 2024
1 check passed
@jtojnar jtojnar deleted the unicode branch April 15, 2024 22:12