This repository has been archived by the owner on May 24, 2022. It is now read-only.

Add option to interpret null byte as an empty cell #24

Open
wants to merge 2 commits into
base: master
28 changes: 22 additions & 6 deletions src/main.rs
@@ -99,6 +99,13 @@ struct Opt {
    /// quoting.
    #[structopt(value_name = "CHAR", long = "quote", default_value = "\"")]
    quote: CharSpecifier,

    /// Should a null-byte (\x0) be interpreted as an empty cell
    /// this option exists because it is not always possible to pass
    /// a null byte into --null on the command line.
    /// https://github.com/faradayio/scrubcsv/issues/17
    #[structopt(long = "null-byte-as-empty")]
    null_byte_as_empty: bool,
Comment on lines +107 to +108
Contributor


Before adding an option, did we try --null '^\x00$'? Rust's `regex` library allows bytes to be escaped.

If we need to add another null option besides --null, we should aim for a consistent naming convention, both with --null and the other existing options. Most of our existing options are either "--verb" or "--noun", and --null-byte-as-empty does not follow either pattern. It's definitely good to start with --null- like you do here, because that will ensure the right sort order on the display.

}
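
A note on the review comment above: on Unix-like systems each command-line argument is handed to the program as a NUL-terminated C string, so a literal null byte cannot appear inside a `--null` value. The regex escape `\x00`, by contrast, is four ordinary printable characters and passes through the shell unchanged. The sketch below illustrates only that distinction, using the standard library's `CString`; the helper name `fits_in_argv` is hypothetical and not part of scrubcsv.

```rust
use std::ffi::CString;

/// Hypothetical helper (not part of scrubcsv): returns true if `arg` can be
/// represented as a NUL-terminated C string, which is how command-line
/// arguments are passed on Unix-like systems.
fn fits_in_argv(arg: &[u8]) -> bool {
    // `CString::new` returns an error if the bytes contain an interior NUL.
    CString::new(arg).is_ok()
}

fn main() {
    // A literal null byte cannot be carried inside a C-string argument.
    assert!(!fits_in_argv(b"\x00"));

    // The regex escape is four printable bytes: `\`, `x`, `0`, `0`.
    let escaped: &[u8] = br"\x00";
    assert_eq!(escaped.len(), 4);
    assert!(fits_in_argv(escaped));

    println!("literal NUL fits: {}", fits_in_argv(b"\x00"));
    println!("escaped form fits: {}", fits_in_argv(escaped));
}
```

Since `run()` already wraps the `--null` value in `^` and `$`, passing `--null '\x00'` may be enough for the regex engine to match cells that consist of a single null byte. That has not been verified against scrubcsv here.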

lazy_static! {
@@ -122,16 +129,25 @@ fn run() -> Result<()> {
    // Remember the time we started.
    let start_time = now();

    // Build a regex containing our `--null` value.
    let null_re = if let Some(null_re_str) = opt.null.as_ref() {
        // Always match the full CSV value.
        let s = format!("^{}$", null_re_str);
        let re = Regex::new(&s).context("can't compile regular expression")?;
        Some(re)
    let mut pattern = if opt.null_byte_as_empty == true {
Author


I'm sure there's a better way to deal with this, but I'm not familiar enough to tell for sure.

        Some(String::from(r"\x00"))
    } else {
        None
    };

    if let Some(null_re_str) = opt.null.as_ref() {
        pattern = match pattern {
            Some(p) => Some(format!(r"{}|^{}$", p, null_re_str)),
            None => Some(format!("^{}$", null_re_str)),
        }
    }

    // Build a regex containing our `--null` value(s).
    let null_re = match pattern {
        Some(p) => Some(Regex::new(&p).context("can't build regular expression")?),
        None => None,
    };

    // Fetch our input from either standard input or a file. The only tricky
    // detail here is that we use a `Box<dyn Read>` to represent "some object
    // implementing `Read`, stored on the heap." This allows us to do runtime
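
One possible tidy-up of the pattern-building logic in this hunk, offered as a sketch rather than the project's implementation. In the combined case the PR produces `\x00|^NULL$`, where the `\x00` alternative is unanchored; if the regex is applied with a search such as `is_match`, that alternative would also match cells that merely contain a null byte. The hypothetical function `build_null_pattern` below collects the alternatives first and anchors them once. It uses only the standard library and returns the pattern string, which would still need to be compiled with `Regex::new`. Wrapping the user's `--null` value in `(?:...)` is an assumption that changes behavior for values containing a top-level `|`.

```rust
/// Hypothetical helper (not part of scrubcsv): combine the null-byte flag and
/// the optional `--null` regex into one pattern anchored to the whole cell.
fn build_null_pattern(null_byte_as_empty: bool, null: Option<&str>) -> Option<String> {
    let mut alternatives: Vec<String> = Vec::new();

    if null_byte_as_empty {
        // Regex escape for a single null byte.
        alternatives.push(String::from(r"\x00"));
    }
    if let Some(user_re) = null {
        // Group the user's pattern so any `|` inside it stays contained.
        alternatives.push(format!("(?:{})", user_re));
    }

    if alternatives.is_empty() {
        None
    } else {
        // Anchor once, around all alternatives, to match the full cell value.
        Some(format!("^(?:{})$", alternatives.join("|")))
    }
}

fn main() {
    assert_eq!(build_null_pattern(false, None), None);
    assert_eq!(
        build_null_pattern(true, None),
        Some(String::from(r"^(?:\x00)$"))
    );
    assert_eq!(
        build_null_pattern(true, Some("NULL")),
        Some(String::from(r"^(?:\x00|(?:NULL))$"))
    );
    println!("{:?}", build_null_pattern(true, Some("NULL")));
}
```

Collecting alternatives in a `Vec` avoids the nested `match` on `pattern`, and adding a further null-like option later would be a single extra `push`.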
44 changes: 44 additions & 0 deletions tests/tests.rs
@@ -200,3 +200,47 @@ a,b,c
"#
    );
}

#[test]
fn test_null_byte_as_empty_cell() {
    let testdir = TestDir::new("scrubcsv", "null_byte_as_empty");
    let output = testdir
        .cmd()
        .arg("--null-byte-as-empty")
        .output_with_stdin(
            br#"c1,c2,c3
1,2,3
\x00,\x00,1
Author

@tomplex tomplex Apr 12, 2022


This is failing because the \x00 is being escaped to \\x00, and I'm not quite sure how to fix it. The docs suggest that using br# should work.

"#,
        )
        .expect("error running scrubcsv test");
    assert_eq!(
        output.stdout_str(),
        r#"c1,c2,c3
1,2,3
,,1
"#
    );
}
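
A likely cause of the failure described in the inline comment on this test: in Rust, raw string and raw byte string literals (`r#"..."#`, `br#"..."#`) do not process escape sequences, so `\x00` inside `br#"..."#` is the four bytes `\`, `x`, `0`, `0` rather than one null byte. A regular byte string literal (`b"..."`) does process `\x00` and `\n`. The sketch below compares the two forms using the test's input; the function names `raw_input` and `escaped_input` are hypothetical.

```rust
/// Raw byte string, as written in the test: escapes are not processed.
fn raw_input() -> &'static [u8] {
    br#"c1,c2,c3
1,2,3
\x00,\x00,1
"#
}

/// Regular byte string: `\x00` becomes one null byte, `\n` a newline.
fn escaped_input() -> &'static [u8] {
    b"c1,c2,c3\n1,2,3\n\x00,\x00,1\n"
}

fn main() {
    // The raw form contains backslashes but no null bytes at all.
    assert!(raw_input().contains(&b'\\'));
    assert!(!raw_input().contains(&0u8));

    // The escaped form contains exactly two null bytes and no backslash.
    let nul_count = escaped_input().iter().filter(|&&b| b == 0).count();
    assert_eq!(nul_count, 2);
    assert!(!escaped_input().contains(&b'\\'));

    println!("null bytes in escaped input: {}", nul_count);
}
```

If that diagnosis is right, passing `b"c1,c2,c3\n1,2,3\n\x00,\x00,1\n"` to `output_with_stdin` should feed real null bytes to scrubcsv. The expected-output literal can stay a raw string, since it contains no escapes.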
#[test]
fn test_null_byte_as_empty_cell_with_null_re() {
    let testdir = TestDir::new("scrubcsv", "null_byte_as_empty_with_null_re");
    let output = testdir
        .cmd()
        .arg("--null-byte-as-empty")
        .args(&["--null", "NULL"])
        .output_with_stdin(
            br#"c1,c2,c3
1,2,NULL
\x00,\x00,1
"#,
        )
        .expect("error running scrubcsv test");
    assert_eq!(
        output.stdout_str(),
        r#"c1,c2,c3
1,2,
,,1
"#
    );
}