An open API service for producing an overview of a list of open source projects.

https://github.com/huggingface/datasets

ai artificial-intelligence computer-vision dataset-hub datasets deep-learning huggingface llm machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Score: 35.42250597558463

Last synced: about 4 hours ago

Repository metadata:

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools



Committers metadata

Last synced: 1 day ago

Total Commits: 4,245
Total Committers: 657
Avg Commits per committer: 6.461
Development Distribution Score (DDS): 0.728

Commits in past year: 230
Committers in past year: 66
Avg Commits per committer in past year: 3.485
Development Distribution Score (DDS) in past year: 0.391
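
The ratios in this block can be recomputed from the raw counts. The following is a minimal Python sketch. It assumes the Development Distribution Score is 1 minus the share of all commits made by the single most active committer. That definition is an assumption, but it reproduces the all-time 0.728 from the 1,156 of 4,245 commits attributed to Quentin Lhoest in the committer list. The function names are hypothetical.

```python
def avg_commits_per_committer(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed DDS definition: 1 - (top committer's commits / total commits).

    0.0 means one committer wrote every commit; higher values mean the
    top committer accounts for a smaller share of the history.
    """
    return 1 - top_committer_commits / total_commits


# All-time figures from the listing above.
print(round(avg_commits_per_committer(4245, 657), 3))        # 6.461
print(round(development_distribution_score(1156, 4245), 3))  # 0.728
# Past-year average.
print(round(avg_commits_per_committer(230, 66), 3))          # 3.485
```

The past-year DDS (0.391) cannot be checked the same way, because per-committer counts for the past year are not listed.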

Name                        Email           Commits
Quentin Lhoest              4****q          1156
Albert Villanova del Moral  8****a          701
Mario Šaško                 m****7@g****m   314
Patrick von Platen          p****n@g****m   128
Thomas Wolf                 t****f          87
Steven Liu                  5****u          62
Yacine Jernite              y****e          48
abhishek thakur             a****r          41
Sasha Luccioni              l****s@m****c   40
lewtun                      l****l@g****m   38
Bhavitvya Malik             b****k@g****m   34
Julien Chaumond             j****n@h****o   33
Mariama Drame               m****a@d****d   32
Suraj Patil                 s****5@g****m   30
Polina Kazakova             p****a@h****o   29
mariamabarham               3****m          26
emibaylor                   2****r          22
Steven                      s****u@g****m   21
Gunjan Chhablani            c****n@g****m   20
Julien Plu                  p****n@g****m   20
Sylvain Lesage              s****e@h****o   18
Charin                      c****b@g****m   17
Joe Davison                 j****n@g****m   15
Matt                        R****1          15
Simon Brandeis              3****s          15
Teven                       t****o@g****m   15
Victor SANH                 v****h@g****m   15
Cahya Wirawan               c****n@g****m   14
Alvaro Bartolome            a****t@y****m   13
Jonatas Grosman             j****n@g****m   13
and 627 more...

Issue and Pull Request metadata

Last synced: 1 day ago

Total issues: 1,181
Total pull requests: 1,441
Average time to close issues: 3 months
Average time to close pull requests: 28 days
Total issue authors: 870
Total pull request authors: 266
Average comments per issue: 3.15
Average comments per pull request: 2.29
Merged pull requests: 941
Bot issues: 0
Bot pull requests: 0

Past year issues: 129
Past year pull requests: 247
Past year average time to close issues: 11 days
Past year average time to close pull requests: 8 days
Past year issue authors: 120
Past year pull request authors: 82
Past year average comments per issue: 2.53
Past year average comments per pull request: 1.19
Past year merged pull requests: 108
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/huggingface/datasets

Top Issue Authors

  • albertvillanova (90)
  • lhoestq (23)
  • severo (16)
  • alex-hh (12)
  • kopyl (9)
  • mariosasko (8)
  • jonathanasdf (7)
  • npuichigo (6)
  • andysingal (6)
  • yuvalkirstain (6)
  • sanchit-gandhi (6)
  • d710055071 (6)
  • ain-soph (5)
  • stas00 (5)
  • patrickvonplaten (4)

Top Pull Request Authors

  • lhoestq (453)
  • albertvillanova (271)
  • mariosasko (108)
  • ArjunJagdale (35)
  • alex-hh (20)
  • Wauplin (11)
  • severo (9)
  • maddiedawson (9)
  • cakiki (8)
  • Harry-Yang0518 (8)
  • cyyever (8)
  • lewtun (8)
  • klamike (8)
  • CloseChoice (7)
  • ringohoffman (7)

Top Issue Labels

  • enhancement (232)
  • bug (104)
  • dataset request (14)
  • maintenance (13)
  • good first issue (10)
  • documentation (9)
  • good second issue (8)
  • duplicate (8)
  • generic discussion (8)
  • streaming (6)
  • dataset bug (5)
  • dataset-viewer (5)
  • question (3)
  • vision (2)
  • speech (2)
  • dataset contribution (1)
  • arrow (1)
  • metric bug (1)
  • help wanted (1)

Top Pull Request Labels

  • dataset contribution (16)
  • maintenance (3)
  • transfer-to-evaluate (1)
  • Dataset discussion (1)

Package metadata

pypi.org: datasets

HuggingFace community-driven open-source library of datasets

  • Homepage: https://github.com/huggingface/datasets
  • Documentation: https://datasets.readthedocs.io/
  • Licenses: Apache 2.0
  • Latest release: 4.8.5 (published 21 days ago)
  • Last Synced: 2026-05-16T00:30:56.269Z (3 days ago)
  • Versions: 117
  • Dependent Packages: 931
  • Dependent Repositories: 14,962
  • Downloads: 123,353,680 Last month
  • Docker Downloads: 39,467,733
  • Rankings:
    • Dependent packages count: 0.031%
    • Dependent repos count: 0.069%
    • Downloads: 0.112%
    • Stargazers count: 0.117%
    • Average: 0.22%
    • Forks count: 0.312%
    • Docker downloads count: 0.68%
  • Maintainers (4)
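
The "Average" entry in each Rankings list appears to be the plain arithmetic mean of the other ranking percentiles for that package. This is inferred from the numbers rather than documented on this page. For pypi.org the calculation is (0.031 + 0.069 + 0.112 + 0.117 + 0.312 + 0.68) / 6 ≈ 0.22. The same rule reproduces the other listings, for example 7.621 for spack.io. Below is a minimal Python check. The dictionary keys and the helper name are hypothetical labels.

```python
from statistics import mean

# Ranking percentiles copied from the pypi.org listing above.
pypi_rankings = {
    "dependent_packages_count": 0.031,
    "dependent_repos_count": 0.069,
    "downloads": 0.112,
    "stargazers_count": 0.117,
    "forks_count": 0.312,
    "docker_downloads_count": 0.68,
}


def average_ranking(rankings: dict[str, float], ndigits: int = 3) -> float:
    """Arithmetic mean of the individual ranking percentiles, rounded for display."""
    return round(mean(rankings.values()), ndigits)


print(average_ranking(pypi_rankings))  # 0.22
```
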
conda-forge.org: datasets

Datasets is a lightweight library providing one-line dataloaders for many public datasets and one-liners to download and pre-process any of the major public datasets provided on the HuggingFace Datasets Hub. The datasets are ready to use in a dataloader for training/evaluating an ML model (NumPy/Pandas/PyTorch/TensorFlow/JAX). Datasets also provides an API for simple, fast, and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text.

  • Homepage: https://github.com/huggingface/datasets
  • Licenses: Apache-2.0
  • Latest release: 2.7.0 (published over 3 years ago)
  • Last Synced: 2026-03-20T21:34:41.294Z (about 2 months ago)
  • Versions: 34
  • Dependent Packages: 13
  • Dependent Repositories: 29
  • Rankings:
    • Stargazers count: 2.091%
    • Forks count: 2.69%
    • Average: 4.113%
    • Dependent packages count: 4.816%
    • Dependent repos count: 6.857%
spack.io: py-datasets

Datasets is a lightweight library providing two main features: one-line dataloaders for many public datasets and efficient data pre-processing.

  • Homepage: https://github.com/huggingface/datasets
  • Licenses: none listed
  • Latest release: 3.2.0 (published over 1 year ago)
  • Last Synced: 2026-05-14T18:24:00.621Z (4 days ago)
  • Versions: 4
  • Dependent Packages: 3
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 0.556%
    • Forks count: 1.862%
    • Average: 7.621%
    • Dependent packages count: 28.067%
  • Maintainers (2)
pypi.org: fdatasets

HuggingFace/Datasets is an open library of NLP datasets.

  • Homepage: https://github.com/huggingface/datasets
  • Documentation: https://fdatasets.readthedocs.io/
  • Licenses: Apache 2.0
  • Latest release: 1.12.1 (published about 4 years ago)
  • Last Synced: 2026-05-14T18:23:59.965Z (4 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 0
  • Rankings:
    • Stargazers count: 0.148%
    • Forks count: 0.378%
    • Dependent packages count: 4.842%
    • Dependent repos count: 6.332%
    • Average: 12.62%
    • Downloads: 51.399%
anaconda.org: datasets

Datasets is a lightweight library providing two main features. First, one-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (text datasets in 467 languages and dialects, image datasets, audio datasets, etc.) provided on the HuggingFace Datasets Hub. With a simple command like `squad_dataset = load_dataset("squad")`, any of these datasets is ready to use in a dataloader for training/evaluating an ML model (NumPy/Pandas/PyTorch/TensorFlow/JAX). Second, efficient data pre-processing: simple, fast, and reproducible data pre-processing for the above public datasets as well as your own local datasets in CSV/JSON/text/PNG/JPEG/etc. With simple commands like `processed_dataset = dataset.map(process_example)`, you can efficiently prepare the dataset for inspection and for ML model evaluation and training.

  • Homepage: https://github.com/huggingface/datasets
  • Licenses: Apache-2.0
  • Latest release: 4.8.2 (published about 2 months ago)
  • Last Synced: 2026-03-20T13:04:26.374Z (about 2 months ago)
  • Versions: 8
  • Dependent Packages: 4
  • Dependent Repositories: 29
  • Downloads: 16,071 Total
  • Rankings:
    • Stargazers count: 6.039%
    • Forks count: 7.337%
    • Dependent packages count: 11.114%
    • Average: 13.422%
    • Dependent repos count: 29.197%
nixpkgs-unstable: python313Packages.datasets_3

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-unstable: python314Packages.datasets_3

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-23.11: python311Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-unstable: python314Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-24.11: python311Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-unstable: python313Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-24.11: python312Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-24.05: python312Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-23.05: python310Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-23.05: python311Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-24.05: python311Packages.datasets

Open-access datasets and evaluation metrics for natural language processing

nixpkgs-23.11: python310Packages.datasets

Open-access datasets and evaluation metrics for natural language processing


Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/release-conda.yml actions
  • actions/checkout v1 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/self-assign.yaml actions
.github/workflows/trufflehog.yml actions
  • actions/checkout v4 composite
  • trufflesecurity/trufflehog main composite
.github/workflows/build_documentation.yml actions
.github/workflows/build_pr_documentation.yml actions
.github/workflows/upload_pr_documentation.yml actions
setup.py pypi
pyproject.toml pypi