Demonstrate lambda/excel issue

Minimal repo to demonstrate covidatlas/li#564

Currently, it appears that AWS Lambda functions (or the SDK, or something else?) are mangling what should be binary data. When I download an Excel file through a Lambda (using arc) and then try to parse it, I get this error:

x Error: End of data reached (data length = 10043, asked index = 347979759). Corrupted zip ?

Others have noted the same issue.
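One plausible mechanism, assumed here rather than confirmed for this repo, is that the bytes are decoded as a UTF-8 string somewhere along the way (for example by the HTTP layer, or by code that appends response chunks to a string) and then re-encoded. Byte sequences that are not valid UTF-8 are replaced with U+FFFD, so the round trip is lossy and changes the data length. The sketch below shows this with Node's `Buffer`; `utf8RoundTrip` is a hypothetical helper, not code from this repo.

```javascript
// Minimal sketch (assumption): decoding binary data as UTF-8 and re-encoding
// it corrupts the bytes. utf8RoundTrip is a hypothetical helper name.
function utf8RoundTrip(bytes) {
  // Invalid UTF-8 sequences become U+FFFD (encoded as EF BF BD) on decode.
  return Buffer.from(bytes.toString('utf8'), 'utf8');
}

// A zip local-file-header signature ("PK\x03\x04"), as at the start of every
// .xlsx file, followed by bytes that are not valid UTF-8 on their own.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0x00, 0x80, 0xc3]);
const mangled = utf8RoundTrip(original);

console.log('original length:', original.length);
console.log('mangled length:', mangled.length);
console.log('identical:', original.equals(mangled)); // false
```

The ASCII signature survives, so the file still "looks" like a zip, but every offset stored inside the archive now points at the wrong place, which is the kind of failure a zip reader reports as corrupted data.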

To reproduce:

1. Clone this repo.
2. Run npm install to install everything.
3. Run node check-source.js. This uses the xlsx npm package to parse the .xlsx file in public.
4. Start the sandbox with npm run start.
5. Run ./do-crawl to run a crawler; it downloads the file from public using http/get-get-normal.
6. Run node check-crawled.js, and you'll see the error:

.../crawled.xlsx.gz
wrote /path/to/crawled.xlsx
/path/to/node_modules/XLSX/jszip.js:272
            throw new Error("End of data reached (data length = " + this.length + ", asked index = " + (newIndex) + "). Corrupted zip ?");
            ^

Error: End of data reached (data length = 10043, asked index = 347979759). Corrupted zip ?
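The message says the zip reader (JSZip, bundled with xlsx) was asked to seek to index 347979759 in a 10043-byte buffer, i.e. an offset read from the file points far outside it. In the zip format, readers locate the central directory through an offset stored in the end-of-central-directory (EOCD) record at the end of the file. The sketch below is a hypothetical, dependency-free diagnostic (`inspectZip` and `inspectCrawled` are not part of this repo) that gunzips a crawled file, as check-crawled.js appears to do (assumption), finds the EOCD record, and checks that the central directory it points to fits inside the buffer.

```javascript
// Hypothetical diagnostic (not part of this repo): does a buffer look like an
// intact zip archive? An .xlsx file is a zip archive.
const zlib = require('zlib');

const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as little-endian
const EOCD_MIN_SIZE = 22;          // EOCD record without its trailing comment

function inspectZip(buf) {
  if (buf.length < EOCD_MIN_SIZE) {
    return { ok: false, reason: 'too short to be a zip archive' };
  }
  // The EOCD record is at the end, followed by a comment of up to 65535 bytes,
  // so scan backwards over at most that window.
  const stop = Math.max(0, buf.length - EOCD_MIN_SIZE - 0xffff);
  for (let i = buf.length - EOCD_MIN_SIZE; i >= stop; i--) {
    if (buf.readUInt32LE(i) !== EOCD_SIGNATURE) continue;
    const cdSize = buf.readUInt32LE(i + 12);   // size of the central directory
    const cdOffset = buf.readUInt32LE(i + 16); // start of the central directory
    if (cdOffset + cdSize > i) {
      return {
        ok: false,
        reason: `central directory at ${cdOffset} (+${cdSize} bytes) lies outside ${buf.length} bytes`,
      };
    }
    return { ok: true, reason: `EOCD at ${i}, central directory at ${cdOffset}` };
  }
  return { ok: false, reason: 'no end-of-central-directory record found' };
}

// Assumed flow: the crawler stores a gzipped copy, so gunzip before checking.
function inspectCrawled(gzBuf) {
  return inspectZip(zlib.gunzipSync(gzBuf));
}
```

Running `inspectZip` on the bytes of the file in public and `inspectCrawled` on the crawled .gz file should show whether the corruption happens before the data reaches xlsx.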

The associated Lambda functions should, hopefully, be clear!

Repository: covidatlas/arc-excel-downloading-trouble