Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File not found in write_ and read_ functions when filepath contains tilde #1085

Open
george-wood opened this issue May 7, 2024 · 9 comments · May be fixed by #1140
Open

File not found in write_ and read_ functions when filepath contains tilde #1085

george-wood opened this issue May 7, 2024 · 9 comments · May be fixed by #1140
Labels
bug Something isn't working

Comments

@george-wood
Copy link
Contributor

george-wood commented May 7, 2024

There seems to be an issue locating files when the filepath/source contains a tilde (e.g. "~/Desktop/some_file.csv"). This is a common pattern for macOS users. Any ideas what might be causing this?

library(polars)
pl$DataFrame(a = 1)$write_csv("~/Desktop/foo_r.csv")
#> Error: Execution halted with the following contexts
#>    0: In R: in $write_csv():
#>    0: During function call [base::tryCatch(base::withCallingHandlers({
#>           NULL
#>           base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-fun-f0d4b669a3b"), 
#>               base::list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, 
#>               quote = TRUE), file = "/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-res-f0d3d17be29", 
#>               compress = FALSE)
#>           base::flush(base::stdout())
#>           base::flush(base::stderr())
#>           NULL
#>           base::invisible()
#>       }, error = function(e) {
#>           {
#>               callr_data <- base::as.environment("tools:callr")$`__callr_data__`
#>               err <- callr_data$err
#>               if (FALSE) {
#>                   base::assign(".Traceback", base::.traceback(4), envir = callr_data)
#>                   utils::dump.frames("__callr_dump__")
#>                   base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`, 
#>                       envir = callr_data)
#>                   base::rm("__callr_dump__", envir = .GlobalEnv)
#>               }
#>               e <- err$process_call(e)
#>               e2 <- err$new_error("error in callr subprocess")
#>               class <- base::class
#>               class(e2) <- base::c("callr_remote_error", class(e2))
#>               e2 <- err$add_trace_back(e2)
#>               cut <- base::which(e2$trace$scope == "global")[1]
#>               if (!base::is.na(cut)) {
#>                   e2$trace <- e2$trace[-(1:cut), ]
#>               }
#>               base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-res-f0d3d17be29", 
#>                   ".error"))
#>           }
#>       }, interrupt = function(e) {
#>           {
#>               callr_data <- base::as.environment("tools:callr")$`__callr_data__`
#>               err <- callr_data$err
#>               if (FALSE) {
#>                   base::assign(".Traceback", base::.traceback(4), envir = callr_data)
#>                   utils::dump.frames("__callr_dump__")
#>                   base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`, 
#>                       envir = callr_data)
#>                   base::rm("__callr_dump__", envir = .GlobalEnv)
#>               }
#>               e <- err$process_call(e)
#>               e2 <- err$new_error("error in callr subprocess")
#>               class <- base::class
#>               class(e2) <- base::c("callr_remote_error", class(e2))
#>               e2 <- err$add_trace_back(e2)
#>               cut <- base::which(e2$trace$scope == "global")[1]
#>               if (!base::is.na(cut)) {
#>                   e2$trace <- e2$trace[-(1:cut), ]
#>               }
#>               base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-res-f0d3d17be29", 
#>                   ".error"))
#>           }
#>       }, callr_message = function(e) {
#>           base::try(base::signalCondition(e))
#>       }), error = function(e) {
#>           NULL
#>           if (FALSE) {
#>               base::try(base::stop(e))
#>           }
#>           else {
#>               base::invisible()
#>           }
#>       }, interrupt = function(e) {
#>           NULL
#>           if (FALSE) {
#>               e
#>           }
#>           else {
#>               base::invisible()
#>           }
#>       })]
#>    1: Os { code: 2, kind: NotFound, message: "No such file or directory" }

Created on 2024-05-07 with reprex v2.1.0

I checked the Python implementation and it works as expected:

import polars as pl
pl.DataFrame({"a": 1}).write_csv("~/Desktop/foo_python.csv")
#> None
pl.read_csv("~/Desktop/foo_python.csv")
#> shape: (1, 1)
#> ┌─────┐
#> │ a   │
#> │ --- │
#> │ i64 │
#> ╞═════╡
#> │ 1   │
#> └─────┘

Created at 2024-05-07 15:27:21 BST by reprexlite v0.5.0

@george-wood
Copy link
Contributor Author

Just to note: Using a tilde worked previously; I noticed this error only recently.

@george-wood george-wood changed the title Filepaths that contain tilde are not locating file or directory File not found in write_ and read_ functions when filepath contains tilde May 7, 2024
@etiennebacher etiennebacher added the bug Something isn't working label May 7, 2024
@etiennebacher
Copy link
Collaborator

Thanks for the report, do you want to make a PR for this? I think this should be solved using path.expand() but we need to ensure it works correctly when we pass a URL

@george-wood
Copy link
Contributor Author

george-wood commented May 8, 2024

@etiennebacher Will do. However, before doing so, do you think the cause is on the rust side? Perhaps with the update to 0.39.0? I cannot work out why tilde is only now causing problems.

I agree this can be solved using path.expand(), but I can't see why expansion should be necessary for write_csv() where there is no URL checking:

r-polars/R/dataframe__frame.R

Lines 1949 to 1973 in 6fdd4a9

DataFrame_write_csv = function(
file,
...,
include_bom = FALSE,
include_header = TRUE,
separator = ",",
line_terminator = "\n",
quote = '"',
batch_size = 1024,
datetime_format = NULL,
date_format = NULL,
time_format = NULL,
float_precision = NULL,
null_values = "",
quote_style = "necessary") {
.pr$DataFrame$write_csv(
self,
file, include_bom, include_header, separator, line_terminator, quote,
batch_size, datetime_format, date_format, time_format, float_precision,
null_values, quote_style
) |>
unwrap("in $write_csv():")
invisible(self)
}

If we end up using path.expand(), it will need to be implemented across all io functions.

@etiennebacher
Copy link
Collaborator

I agree this can be solved using path.expand(), but I can't see why expansion should be necessary for write_csv() where there is no URL checking:

My bad, I thought this error was for read_csv() only.

I can't see any recent changes in the R or Rust methods that would explain this failure so it's probably from rust-polars:

https://github.com/pola-rs/r-polars/blame/1de4d0dfdf83228f2d4933fc7d8239a565d907fa/R/dataframe__frame.R#L1949
https://github.com/pola-rs/r-polars/blame/1de4d0dfdf83228f2d4933fc7d8239a565d907fa/src/rust/src/rdataframe/mod.rs#L487

Do you remember what was the last package version for which it worked?

@etiennebacher
Copy link
Collaborator

They already normalize the file path in python but I can't find a recent commit that changed that, so I suppose passing a path starting with ~ to rust-polars was not really supported before:

https://github.com/pola-rs/polars/blob/d3411566a29e414cd77bfcdad365362c944d48df/py-polars/polars/dataframe/frame.py#L2299-L2305

@eitsupi
Copy link
Collaborator

eitsupi commented May 10, 2024

Might be worth checking node.js and polars-cli

@george-wood
Copy link
Contributor Author

george-wood commented May 10, 2024

Do you remember what was the last package version for which it worked?

I believe it worked before the update to rust-polars 0.39.

They already normalize the file path in python

I suppose we should implement the equivalent normalization if ~ is not supported on the rust-polars side. What do you think @etiennebacher and @eitsupi?

@eitsupi
Copy link
Collaborator

eitsupi commented May 10, 2024

I don't know if this is the change was intended upstream or not. I think it could be a bug.

@etiennebacher
Copy link
Collaborator

I suppose we should implement the equivalent normalization if ~ is not supported on the rust-polars side

I think so, especially since they handle the ~ in python

I don't know if this is the change was intended upstream or not. I think it could be a bug.

It could be but the file where write_csv() is defined in python is too big to use git blame in the github interface so I can't check if it changed recently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants