Add option to interpret null byte as an empty cell #24
.output_with_stdin(
br#"c1,c2,c3
1,2,3
\x00,\x00,1
This is failing because the \x00 is being escaped to \\x00, and I'm not quite sure how to fix it. The docs suggest that using br# should work.
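For reference, a minimal standalone sketch of how the two literal forms differ (this is not code from the PR, and the function names are hypothetical). In a raw byte string such as br#"..."#, escape sequences are not processed, so \x00 stays as the four literal bytes `\`, `x`, `0`, `0`. In a regular byte string b"...", \x00 becomes a single NUL byte, which may be what the test fixture needs.

```rust
/// Raw byte string: escapes are NOT processed, so each `\x00` below stays
/// as the four literal bytes `\`, `x`, `0`, `0`.
fn raw_fixture() -> &'static [u8] {
    br#"c1,c2,c3
1,2,3
\x00,\x00,1
"#
}

/// Regular byte string: each `\x00` is processed into one NUL byte (0x00),
/// and `\n` into a newline.
fn nul_fixture() -> &'static [u8] {
    b"c1,c2,c3\n1,2,3\n\x00,\x00,1\n"
}

/// Counts the NUL bytes actually present in a buffer.
fn count_nul(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == 0).count()
}
```

Calling count_nul on raw_fixture() finds no NUL bytes, while nul_fixture() contains two, so only the second form feeds real NUL bytes to the program under test.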
let s = format!("^{}$", null_re_str);
let re = Regex::new(&s).context("can't compile regular expression")?;
Some(re)
let mut pattern = if opt.null_byte_as_empty == true {
I'm sure there's a better way to deal with this, but I'm not familiar enough to tell for sure.
I'd like to double-check whether this is actually a good use case for scrubcsv and not for a separate Unix tool that just deletes NULL bytes. We've encountered similar issues before and the best solution has often been to strip NULL bytes before invoking scrubcsv.
If that doesn't work in this case, I would like to quickly double-check our options.
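As a rough illustration of the "strip NULL bytes first" approach, here is a small sketch of a streaming filter. The helper name strip_nul_bytes is hypothetical and not part of scrubcsv; it does roughly what `tr -d '\000'` does in a shell pipeline.

```rust
use std::io::{self, Read, Write};

/// Copies `input` to `output`, dropping every NUL (0x00) byte.
/// Hypothetical helper for illustration; not part of scrubcsv.
fn strip_nul_bytes<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut buf = [0u8; 8192];
    loop {
        // Read the next chunk; a zero-length read means end of input.
        let n = input.read(&mut buf)?;
        if n == 0 {
            break;
        }
        // Keep every byte except NUL and write the remainder out.
        let kept: Vec<u8> = buf[..n].iter().copied().filter(|&b| b != 0).collect();
        output.write_all(&kept)?;
    }
    output.flush()
}
```

With this kind of pre-filter, a cell that contains only a NUL byte reaches the CSV parser as an empty cell, so no new option would be needed for that case.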
#[structopt(long = "null-byte-as-empty")]
null_byte_as_empty: bool,
Before adding an option, did we try --null '^\x00$'? Rust's regex library allows bytes to be escaped.
If we need to add another null option besides --null, we should aim for a consistent naming convention, both with --null and the other existing options. Most of our existing options are either "--verb" or "--noun", and --null-byte-as-empty. It's definitely good to start with --null- like you do here, because that will ensure the right sort order on the display.
Addresses #17
That said, I can't quite figure out how to make the test "test_null_byte_as_empty_cell" not escape the \x00.