Update datasets-download-stats.md #1466

lhoestq · 2024-10-22T15:42:30Z

Updated the download count method, and I kept the description of how it worked before September 2024 (since older data can be viewed from Enterprise analytics) cc @julien-c

HuggingFaceDocBuilderDev · 2024-10-22T15:44:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

docs/hub/datasets-download-stats.md

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>

lhoestq · 2024-10-23T13:29:15Z

merging this one for now, we can still add more details later if needed

julien-c · 2024-10-24T11:16:12Z

docs/hub/datasets-download-stats.md


-* The download count is the same regardless of whether the data is directly stored on the Hub repo or if the repository has a [script](/docs/datasets/dataset_script) to load the data from an external source.
-* If a user manually downloads the data using tools like `wget` or the Hub's user interface (UI), those downloads will not be included in the download count.
+## Before September 2024


julien-c · 2024-10-24T11:17:03Z

docs/hub/datasets-download-stats.md

@@ -2,7 +2,11 @@

 ## How are download stats generated for datasets?


would be clearer maybe to word it like: "How are downloads counted for datasets" (same for models)

julien-c · 2024-10-24T11:23:37Z

docs/hub/datasets-download-stats.md

@@ -2,7 +2,11 @@

 ## How are download stats generated for datasets?

-The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. No information is sent from the user, and no additional calls are made for this. The count is done server-side as we serve files for downloads. This means that:
+Counting the number of downloads for datasets is not a trivial task, as a single dataset repository might contain multiple files, from multiple subsets and splits (e.g. train/validation/test) and sometimes with many files in a single split. To solve this issue and avoid counting one person's download multiple times, we treat all files downloaded by a user within a 5-minute window as a single dataset download. This counting happens automatically on our servers when files are downloaded (through GET or HEAD requests), with no need to collect any user information or make additional calls.


by a user (based on their IP address)

maybe add this to make it clearer for us? @lhoestq

lhoestq · 2024-10-24T13:03:37Z

sounds good ! opened #1469
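
For illustration, the counting rule discussed in this thread (all files fetched by one user within a 5-minute window count as one dataset download, with the user identified by IP address) could be sketched as below. This is a minimal sketch, not the Hub's server-side implementation: the function name `count_downloads`, the `(timestamp, ip, dataset_id)` request shape, and the choice of a fixed window that starts at the first counted request are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the 5-minute window described in the docs text


def count_downloads(requests):
    """Count dataset downloads from (timestamp_seconds, ip, dataset_id) tuples.

    Hypothetical helper: file requests from the same IP for the same dataset
    that arrive within WINDOW_SECONDS of the first counted request are
    collapsed into a single download. The fixed-window behavior is an
    assumption; the docs text does not say whether the window slides.
    """
    window_start = {}          # (ip, dataset_id) -> start of the current window
    counts = defaultdict(int)  # dataset_id -> number of counted downloads
    for ts, ip, dataset_id in sorted(requests):
        key = (ip, dataset_id)
        start = window_start.get(key)
        if start is None or ts - start >= WINDOW_SECONDS:
            # First request from this IP, or the previous window has expired.
            window_start[key] = ts
            counts[dataset_id] += 1
    return dict(counts)


requests = [
    (0, "203.0.113.7", "squad"),     # first file: opens a window
    (12, "203.0.113.7", "squad"),    # second file, same window: not recounted
    (290, "203.0.113.7", "squad"),   # still inside the 5 minutes
    (40, "198.51.100.2", "squad"),   # different IP: counted separately
    (400, "203.0.113.7", "squad"),   # window expired: counted again
]
print(count_downloads(requests))  # {'squad': 3}
```

Keying on `(ip, dataset_id)` means one user fetching many split files is counted once per window, while two users behind different addresses are counted separately.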

Update datasets-download-stats.md

04e84f5

lhoestq marked this pull request as ready for review October 22, 2024 15:43

lhoestq requested a review from davanstrien October 22, 2024 15:43

davanstrien approved these changes Oct 22, 2024


lhoestq and others added 2 commits October 22, 2024 17:53

Update docs/hub/datasets-download-stats.md

edf34f1

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>

Update docs/hub/datasets-download-stats.md

572e647

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>

lhoestq merged commit 11be0d6 into main Oct 23, 2024
2 checks passed

lhoestq deleted the datasets-downloads-update branch October 23, 2024 13:29

julien-c reviewed Oct 24, 2024


lhoestq mentioned this pull request Oct 24, 2024

Update datasets-download-stats.md #1469

Merged
