Update repositories-recommendations.md (#1429)
Remove Discord as an option for large dataset requests.
cakiki authored Sep 24, 2024
1 parent 596e9cf commit 2995cee
Showing 1 changed file with 1 addition and 1 deletion.
docs/hub/repositories-recommendations.md
@@ -31,7 +31,7 @@ Under the hood, the Hub uses Git to version the data, which has structural impli
If your repo is crossing some of the numbers mentioned in the previous section, **we strongly encourage you to check out [`git-sizer`](https://github.com/github/git-sizer)**,
which has very detailed documentation about the different factors that will impact your experience. Here is a TL;DR of factors to consider:

-- **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300 GBs (or even TBs) of data, you will need to ask us to grant more storage. Please provide details of your project. You can contact us at datasets@huggingface.co or on [our Discord](http://hf.co/join/discord).
+- **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300 GBs (or even TBs) of data, you will need to ask us to grant more storage. To do that, please send an email with details of your project to datasets@huggingface.co.
- **Number of files**:
  - For optimal experience, we recommend keeping the total number of files under 100k. Try merging the data into fewer files if you have more.
    For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format.
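The "merge json files into a single jsonl file" advice in the context lines above can be sketched with the Python standard library alone. This is a minimal illustration, not tooling from the Hub: `merge_json_to_jsonl` and the file names are hypothetical, and it assumes each `.json` file holds exactly one JSON record.

```python
import json
from pathlib import Path


def merge_json_to_jsonl(input_dir: str, output_path: str) -> int:
    """Merge every *.json file in input_dir into one JSON Lines file.

    Hypothetical helper; assumes each .json file holds a single JSON record.
    Returns the number of records written.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # sorted() keeps the record order deterministic across runs
        for path in sorted(Path(input_dir).glob("*.json")):
            with open(path, encoding="utf-8") as f:
                record = json.load(f)
            # JSON Lines: one compact JSON document per line
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    import tempfile

    # Build three tiny sample files, then merge them into one file
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "records"
        src.mkdir()
        for i in range(3):
            (src / f"{i:05d}.json").write_text(json.dumps({"id": i, "text": f"sample {i}"}))
        out_file = Path(tmp) / "train.jsonl"
        n = merge_json_to_jsonl(str(src), str(out_file))
        print(n, "records ->", out_file.name)
```

Run as a script, the demo prints `3 records -> train.jsonl`: three files become one, which is the kind of reduction in file count the recommendation is after. If your files contain lists of records rather than single objects, you would iterate over each list and write one line per element.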