Skip to content
This repository has been archived by the owner on May 24, 2022. It is now read-only.

Add option to interpret null byte as an empty cell #24

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tomplex
Copy link

@tomplex tomplex commented Apr 12, 2022

Addresses #17

That said, I can't quite figure out how to make the test "test_null_byte_as_empty_cell" not escape the \x00.

@tomplex tomplex requested a review from emk April 12, 2022 20:12
.output_with_stdin(
br#"c1,c2,c3
1,2,3
\x00,\x00,1
Copy link
Author

@tomplex tomplex Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is failing because the \x00 is being escaped to \\x00, and I'm not quite sure how to fix it. the docs suggest that using br# should work.

@tomplex tomplex changed the title initial commit: option to interpret null byte as an empty cell Add option to interpret null byte as an empty cell Apr 12, 2022
let s = format!("^{}$", null_re_str);
let re = Regex::new(&s).context("can't compile regular expression")?;
Some(re)
let mut pattern = if opt.null_byte_as_empty == true {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's a better way to deal with this, but I'm not familiar enough to tell for sure.

Copy link
Contributor

@emk emk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to double-check whether this is actually a good use case for scrubcsv and not for a separate Unix tool that just deletes NULL bytes. We've encountered similar issues before and the best solution has often been to strip NULL bytes before invoking scrubcsv.

If that doesn't work in this case, I would like to quickly double-check our options.

Comment on lines +107 to +108
#[structopt(long = "null-byte-as-empty")]
null_byte_as_empty: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before adding an option, did we try --null '^\x00$'? Rust's regex` library allows bytes to be escaped.

If we need to add another null option besides --null, we should aim for a consistent naming convention, both with -null and the other existing options. Most of our existing options are either "--verb" or "--noun", and --null-byte-as-empty . It's definitely good to start with --null- like you do here, because that will ensure the right sort order on the display.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants