Set `%option 8bit` in `KOREScanner.l` #949

Scott-Guest · 2024-01-11T21:36:09Z

Fixes #948

Flex does not directly support Unicode, but it does support `%option 8bit`, in which every 8-bit byte in the input stream is treated as a separate character. To fix #948, rather than escaping in the frontend, we can just accept any byte sequence inside comments (in practice always UTF-8), allowing the original source text to be passed through unmodified.

However, 8-bit mode also means that `.` and negated character classes such as `[^bar]` accept non-ASCII bytes, which we sometimes want to disallow (e.g., in string literals), so we need to update every such regex accordingly.
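
To illustrate the negated-class concern, here is a minimal sketch in Python rather than flex. It uses `re` over `bytes` as a stand-in for byte-wise matching; the pattern names and the `accepts` helper are hypothetical and are not taken from `KOREScanner.l`. A negated class accepts every byte it does not list, including the 0x80-0xFF bytes of multi-byte UTF-8 sequences, so an ASCII-only rule has to exclude that range explicitly.

```python
import re

# Byte-oriented patterns, standing in for scanner rules that see one byte
# per character. These are illustrative, not the actual KOREScanner.l rules.

# Negated class: any byte except '"', '\' and newline. Bytes 0x80-0xFF
# (the building blocks of multi-byte UTF-8) are accepted implicitly.
PERMISSIVE = re.compile(rb'[^"\\\n]*')

# Hypothetical ASCII-only variant: printable ASCII (0x20-0x7E)
# with '"' (0x22) and '\' (0x5C) left out.
ASCII_ONLY = re.compile(rb'[\x20\x21\x23-\x5b\x5d-\x7e]*')


def accepts(pattern, text):
    """Return True if the UTF-8 encoding of text matches pattern in full."""
    return pattern.fullmatch(text.encode("utf-8")) is not None


if __name__ == "__main__":
    for sample in ["hello", "héllo"]:
        print(sample, accepts(PERMISSIVE, sample), accepts(ASCII_ONLY, sample))
```

The permissive pattern corresponds to what comments can now accept, while the restricted one corresponds to the kind of tightening that string-literal rules would need.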

Baltoli · 2024-01-12T09:36:14Z

Confirmed that this fixes the issue; good catch @Scott-Guest!

Set KOREScanner.l to 8bit mode

1e76083

Scott-Guest self-assigned this Jan 11, 2024

Scott-Guest requested a review from Baltoli January 11, 2024 23:05

Scott-Guest marked this pull request as ready for review January 11, 2024 23:05

Baltoli added the automerge label Jan 12, 2024

Baltoli approved these changes Jan 12, 2024

rv-jenkins merged commit d7fd6d2 into master Jan 12, 2024
7 checks passed

rv-jenkins deleted the unicode-comments branch January 12, 2024 09:42