https://github.com/huggingface/datasets
Score: 35.42250597558463
Last synced: about 4 hours ago
Repository metadata:
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
- Host: GitHub
- URL: https://github.com/huggingface/datasets
- Owner: huggingface
- License: apache-2.0
- Created: 2020-03-26T09:23:22.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2026-05-14T15:55:07.000Z (4 days ago)
- Last Synced: 2026-05-14T17:45:45.857Z (4 days ago)
- Topics: ai, artificial-intelligence, computer-vision, dataset-hub, datasets, deep-learning, huggingface, llm, machine-learning, natural-language-processing, nlp, numpy, pandas, pytorch, speech, tensorflow
- Language: Python
- Homepage: https://huggingface.co/docs/datasets
- Size: 86.2 MB
- Stars: 21,513
- Watchers: 278
- Forks: 3,196
- Open Issues: 1,104
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Security: SECURITY.md
- Authors: AUTHORS
- Zenodo: .zenodo.json
Owner metadata:
- Name: Hugging Face
- Login: huggingface
- Email:
- Kind: organization
- Description: The AI community building the future.
- Website: https://huggingface.co/
- Location: NYC + Paris
- Twitter: huggingface
- Company:
- Icon url: https://avatars.githubusercontent.com/u/25720743?v=4
- Repositories: 370
- Last Synced at: 2025-11-13T02:39:46.174Z
- Profile URL: https://github.com/huggingface
Committers metadata
Last synced: 1 day ago
Total Commits: 4,245
Total Committers: 657
Avg Commits per committer: 6.461
Development Distribution Score (DDS): 0.728
Commits in past year: 230
Committers in past year: 66
Avg Commits per committer in past year: 3.485
Development Distribution Score (DDS) in past year: 0.391
| Name | Email (masked) | Commits |
|---|---|---|
| Quentin Lhoest | 4****q | 1156 |
| Albert Villanova del Moral | 8****a | 701 |
| Mario Šaško | m****7@g****m | 314 |
| Patrick von Platen | p****n@g****m | 128 |
| Thomas Wolf | t****f | 87 |
| Steven Liu | 5****u | 62 |
| Yacine Jernite | y****e | 48 |
| abhishek thakur | a****r | 41 |
| Sasha Luccioni | l****s@m****c | 40 |
| lewtun | l****l@g****m | 38 |
| Bhavitvya Malik | b****k@g****m | 34 |
| Julien Chaumond | j****n@h****o | 33 |
| Mariama Drame | m****a@d****d | 32 |
| Suraj Patil | s****5@g****m | 30 |
| Polina Kazakova | p****a@h****o | 29 |
| mariamabarham | 3****m | 26 |
| emibaylor | 2****r | 22 |
| Steven | s****u@g****m | 21 |
| Gunjan Chhablani | c****n@g****m | 20 |
| Julien Plu | p****n@g****m | 20 |
| Sylvain Lesage | s****e@h****o | 18 |
| Charin | c****b@g****m | 17 |
| Joe Davison | j****n@g****m | 15 |
| Matt | R****1 | 15 |
| Simon Brandeis | 3****s | 15 |
| Teven | t****o@g****m | 15 |
| Victor SANH | v****h@g****m | 15 |
| Cahya Wirawan | c****n@g****m | 14 |
| Alvaro Bartolome | a****t@y****m | 13 |
| Jonatas Grosman | j****n@g****m | 13 |
| and 627 more... | | |
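The averages and the all-time Development Distribution Score (DDS) above can be reproduced from the listed counts. The sketch below assumes DDS is defined as one minus the top committer's share of all commits. That definition is not stated on this page, but it matches the reported 0.728 when the top row of the table (1156 commits) is used. The function names are illustrative only.

```python
def average_commits(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 minus the share of commits made by the top committer."""
    return 1 - top_committer_commits / total_commits


# Figures copied from the committers metadata and the table above.
avg_all_time = round(average_commits(4245, 657), 3)
avg_past_year = round(average_commits(230, 66), 3)
dds_all_time = round(development_distribution_score(1156, 4245), 3)

print(avg_all_time, avg_past_year, dds_all_time)
```

The past-year DDS (0.391) cannot be recomputed the same way, because the page does not list per-committer commit counts for the past year.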
Issue and Pull Request metadata
Last synced: 1 day ago
Total issues: 1,181
Total pull requests: 1,441
Average time to close issues: 3 months
Average time to close pull requests: 28 days
Total issue authors: 870
Total pull request authors: 266
Average comments per issue: 3.15
Average comments per pull request: 2.29
Merged pull requests: 941
Bot issues: 0
Bot pull requests: 0
Past year issues: 129
Past year pull requests: 247
Past year average time to close issues: 11 days
Past year average time to close pull requests: 8 days
Past year issue authors: 120
Past year pull request authors: 82
Past year average comments per issue: 2.53
Past year average comments per pull request: 1.19
Past year merged pull requests: 108
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- albertvillanova (90)
- lhoestq (23)
- severo (16)
- alex-hh (12)
- kopyl (9)
- mariosasko (8)
- jonathanasdf (7)
- npuichigo (6)
- andysingal (6)
- yuvalkirstain (6)
- sanchit-gandhi (6)
- d710055071 (6)
- ain-soph (5)
- stas00 (5)
- patrickvonplaten (4)
Top Pull Request Authors
- lhoestq (453)
- albertvillanova (271)
- mariosasko (108)
- ArjunJagdale (35)
- alex-hh (20)
- Wauplin (11)
- severo (9)
- maddiedawson (9)
- cakiki (8)
- Harry-Yang0518 (8)
- cyyever (8)
- lewtun (8)
- klamike (8)
- CloseChoice (7)
- ringohoffman (7)
Top Issue Labels
- enhancement (232)
- bug (104)
- dataset request (14)
- maintenance (13)
- good first issue (10)
- documentation (9)
- good second issue (8)
- duplicate (8)
- generic discussion (8)
- streaming (6)
- dataset bug (5)
- dataset-viewer (5)
- question (3)
- vision (2)
- speech (2)
- dataset contribution (1)
- arrow (1)
- metric bug (1)
- help wanted (1)
Top Pull Request Labels
- dataset contribution (16)
- maintenance (3)
- transfer-to-evaluate (1)
- Dataset discussion (1)
Package metadata
- Total packages: 17
Total downloads:
- pypi: 123,353,680
- conda: 16,071 total
- Total docker downloads: 39,467,733
- Total dependent packages: 951 (may contain duplicates)
- Total dependent repositories: 15,020 (may contain duplicates)
- Total versions: 178
- Total maintainers: 8
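The package totals above appear to be plain sums of the per-package figures listed further down this page, which would explain the "may contain duplicates" note: a project that depends on the library through two registries is counted twice. The sketch below re-adds the listed figures under that assumption. The variable names are illustrative and not part of any API.

```python
# (versions, dependent packages, dependent repositories) for each non-Nix package,
# copied from the per-package sections of this page.
registry_packages = {
    "pypi.org: datasets": (117, 931, 14962),
    "conda-forge.org: datasets": (34, 13, 29),
    "spack.io: py-datasets": (4, 3, 0),
    "pypi.org: fdatasets": (1, 0, 0),
    "anaconda.org: datasets": (8, 4, 29),
}

# Version counts of the twelve nixpkgs entries, in page order.
# Every nixpkgs entry reports 0 dependent packages and 0 dependent repositories.
nix_versions = [1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1]

total_packages = len(registry_packages) + len(nix_versions)
total_versions = sum(v for v, _, _ in registry_packages.values()) + sum(nix_versions)
total_dependent_packages = sum(p for _, p, _ in registry_packages.values())
total_dependent_repos = sum(r for _, _, r in registry_packages.values())

print(total_packages, total_versions, total_dependent_packages, total_dependent_repos)
```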
pypi.org: datasets
HuggingFace community-driven open-source library of datasets
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://datasets.readthedocs.io/
- Licenses: Apache 2.0
- Latest release: 4.8.5 (published 21 days ago)
- Last Synced: 2026-05-16T00:30:56.269Z (3 days ago)
- Versions: 117
- Dependent Packages: 931
- Dependent Repositories: 14,962
- Downloads: 123,353,680 (last month)
- Docker Downloads: 39,467,733
Rankings:
- Dependent packages count: 0.031%
- Dependent repos count: 0.069%
- Downloads: 0.112%
- Stargazers count: 0.117%
- Average: 0.22%
- Forks count: 0.312%
- Docker downloads count: 0.68%
- Maintainers (4)
conda-forge.org: datasets
Datasets is a lightweight library providing one-line dataloaders for many public datasets, with one-liners to download and pre-process any of the major public datasets provided on the HuggingFace Datasets Hub. Datasets are ready to use in a dataloader for training or evaluating an ML model (Numpy/Pandas/PyTorch/TensorFlow/JAX). Datasets also provides an API for simple, fast, and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text.
- Homepage: https://github.com/huggingface/datasets
- Licenses: Apache-2.0
- Latest release: 2.7.0 (published over 3 years ago)
- Last Synced: 2026-03-20T21:34:41.294Z (about 2 months ago)
- Versions: 34
- Dependent Packages: 13
- Dependent Repositories: 29
Rankings:
- Stargazers count: 2.091%
- Forks count: 2.69%
- Average: 4.113%
- Dependent packages count: 4.816%
- Dependent repos count: 6.857%
spack.io: py-datasets
Datasets is a lightweight library providing two main features: one-line dataloaders for many public datasets and efficient data pre-processing.
- Homepage: https://github.com/huggingface/datasets
- Licenses: []
- Latest release: 3.2.0 (published over 1 year ago)
- Last Synced: 2026-05-14T18:24:00.621Z (4 days ago)
- Versions: 4
- Dependent Packages: 3
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Stargazers count: 0.556%
- Forks count: 1.862%
- Average: 7.621%
- Dependent packages count: 28.067%
- Maintainers (2)
pypi.org: fdatasets
HuggingFace/Datasets is an open library of NLP datasets.
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://fdatasets.readthedocs.io/
- Licenses: Apache 2.0
- Latest release: 1.12.1 (published about 4 years ago)
- Last Synced: 2026-05-14T18:23:59.965Z (4 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 0
Rankings:
- Stargazers count: 0.148%
- Forks count: 0.378%
- Dependent packages count: 4.842%
- Dependent repos count: 6.332%
- Average: 12.62%
- Downloads: 51.399%
anaconda.org: datasets
Datasets is a lightweight library providing two main features: (1) one-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (text datasets in 467 languages and dialects, image datasets, audio datasets, etc.) provided on the HuggingFace Datasets Hub. With a simple command like `squad_dataset = load_dataset("squad")`, get any of these datasets ready to use in a dataloader for training or evaluating an ML model (Numpy/Pandas/PyTorch/TensorFlow/JAX); and (2) efficient data pre-processing: simple, fast and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text/PNG/JPEG/etc. With simple commands like `processed_dataset = dataset.map(process_example)`, efficiently prepare the dataset for inspection and for ML model evaluation and training.
- Homepage: https://github.com/huggingface/datasets
- Licenses: Apache-2.0
- Latest release: 4.8.2 (published about 2 months ago)
- Last Synced: 2026-03-20T13:04:26.374Z (about 2 months ago)
- Versions: 8
- Dependent Packages: 4
- Dependent Repositories: 29
- Downloads: 16,071 (total)
Rankings:
- Stargazers count: 6.039%
- Forks count: 7.337%
- Dependent packages count: 11.114%
- Average: 13.422%
- Dependent repos count: 29.197%
nixpkgs-unstable: python313Packages.datasets_3
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/datasets/3.nix#L75
- Licenses: Apache-2.0
- Latest release: 3.6.0 (published 23 days ago)
- Last Synced: 2026-04-25T09:04:33.717Z (23 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (2)
nixpkgs-unstable: python314Packages.datasets_3
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/datasets/3.nix#L75
- Licenses: Apache-2.0
- Latest release: 3.6.0 (published 23 days ago)
- Last Synced: 2026-04-25T09:04:42.604Z (23 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (2)
nixpkgs-23.11: python311Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.11/pkgs/development/python-modules/datasets/default.nix#L65
- Licenses: Apache-2.0
- Latest release: 2.14.5 (published 4 months ago)
- Last Synced: 2026-04-12T04:02:00.506Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
nixpkgs-unstable: python314Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/datasets/default.nix#L75
- Licenses: Apache-2.0
- Latest release: 4.5.0 (published 3 months ago)
- Last Synced: 2026-04-10T19:01:53.789Z (about 1 month ago)
- Versions: 2
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
nixpkgs-24.11: python311Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.11/pkgs/development/python-modules/datasets/default.nix#L67
- Licenses: Apache-2.0
- Latest release: 2.21.0 (published 4 months ago)
- Last Synced: 2026-03-06T01:27:19.899Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
nixpkgs-unstable: python313Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/datasets/default.nix#L75
- Licenses: Apache-2.0
- Latest release: 4.5.0 (published 3 months ago)
- Last Synced: 2026-04-07T18:10:51.355Z (about 1 month ago)
- Versions: 2
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
nixpkgs-24.11: python312Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.11/pkgs/development/python-modules/datasets/default.nix#L67
- Licenses: Apache-2.0
- Latest release: 2.21.0 (published 4 months ago)
- Last Synced: 2026-03-07T15:18:57.069Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
nixpkgs-24.05: python312Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/development/python-modules/datasets/default.nix#L68
- Licenses: Apache-2.0
- Latest release: 2.19.0 (published 4 months ago)
- Last Synced: 2026-03-07T11:03:26.610Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
nixpkgs-23.05: python310Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.05/pkgs/development/python-modules/datasets/default.nix#L70
- Licenses: Apache-2.0
- Latest release: 2.12.0 (published 4 months ago)
- Last Synced: 2026-04-14T12:05:54.606Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
nixpkgs-23.05: python311Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.05/pkgs/development/python-modules/datasets/default.nix#L70
- Licenses: Apache-2.0
- Latest release: 2.12.0 (published 4 months ago)
- Last Synced: 2026-04-10T04:01:45.043Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
nixpkgs-24.05: python311Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/development/python-modules/datasets/default.nix#L68
- Licenses: Apache-2.0
- Latest release: 2.19.0 (published 4 months ago)
- Last Synced: 2026-05-01T13:40:07.051Z (17 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
nixpkgs-23.11: python310Packages.datasets
Open-access datasets and evaluation metrics for natural language processing
- Homepage: https://github.com/huggingface/datasets
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.11/pkgs/development/python-modules/datasets/default.nix#L65
- Licenses: Apache-2.0
- Latest release: 2.14.5 (published 4 months ago)
- Last Synced: 2026-03-07T03:34:29.546Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v1 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v4 composite
- trufflesecurity/trufflehog main composite