An open API service for producing an overview of a list of open source projects.

https://github.com/huggingface/tokenizers

bert gpt language-model natural-language-processing natural-language-understanding nlp transformers

Score: 33.15778700855254

Last synced: about 7 hours ago

Repository metadata:

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production


Committers metadata

Last synced: 2 days ago

Total Commits: 1,875
Total Committers: 147
Avg Commits per committer: 12.755
Development Distribution Score (DDS): 0.502

Commits in past year: 130
Committers in past year: 51
Avg Commits per committer in past year: 2.549
Development Distribution Score (DDS) in past year: 0.685

Name Email Commits
Anthony MOI m****i@g****m 934
Nicolas Patry p****s@p****m 271
Pierric Cistac p****c@h****o 138
Arthur 4****r 117
dependabot[bot] 4****] 51
epwalsh e****0@g****m 48
Morgan Funtowicz m****n@h****o 35
Sebastian Pütz s****z@u****e 28
Mishig Davaadorj d****g@g****m 22
Bjarte Johansen b****n@g****m 11
thomwolf t****f@g****m 8
Sylvain Gugger s****r@g****m 7
Luc Georges M****e 7
Chris Ha h****9@g****m 6
sftse c****@f****t 5
Nathan Goldbaum n****m@g****m 5
Roy Hvaara h****a@g****m 5
Lysandre l****t@r****r 4
Clement c****e@g****m 4
Connor Boyle c****o@g****m 4
Julien Chaumond c****d@g****m 4
tinyboxvk t****k 3
大橋 玲音 1****n 3
hf-security-analysis[bot] 2****] 3
François Garillot f****s@g****t 3
Lucain l****p@g****m 3
Michael Feil 6****l 3
dctelus 9****s 3
Thomas Wang 2****1 2
SeongBeomLEE 2****r@n****m 2
and 117 more...
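
The per-committer averages and the all-time DDS above can be reproduced from the listed counts. The sketch below assumes DDS is one minus the top committer's share of all commits; that definition is not stated on this page, but it matches the all-time figure when Anthony MOI's 934 commits are used. The function names are illustrative only.

```python
def avg_commits_per_committer(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - top_committer_commits / total_commits


# Counts taken from the committers section above.
all_time_avg = round(avg_commits_per_committer(1875, 147), 3)
all_time_dds = round(development_distribution_score(934, 1875), 3)
past_year_avg = round(avg_commits_per_committer(130, 51), 3)

print(all_time_avg, all_time_dds, past_year_avg)  # 12.755 0.502 2.549
```

The past-year DDS of 0.685 cannot be checked the same way, because the page does not list per-committer commit counts for the past year.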

Issue and Pull Request metadata

Last synced: 2 days ago

Total issues: 620
Total pull requests: 530
Average time to close issues: about 1 year
Average time to close pull requests: 2 months
Total issue authors: 526
Total pull request authors: 132
Average comments per issue: 3.75
Average comments per pull request: 1.96
Merged pull requests: 276
Bot issues: 0
Bot pull requests: 42

Past year issues: 58
Past year pull requests: 112
Past year average time to close issues: 13 days
Past year average time to close pull requests: 7 days
Past year issue authors: 53
Past year pull request authors: 46
Past year average comments per issue: 1.79
Past year average comments per pull request: 1.71
Past year merged pull requests: 40
Past year bot issues: 0
Past year bot pull requests: 8

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/huggingface/tokenizers

Top Issue Authors

  • david-waterworth (15)
  • n1t0 (10)
  • Narsil (5)
  • SaulLu (4)
  • davidgilbertson (4)
  • chris-ha458 (4)
  • pietrolesci (4)
  • EricLBuehler (3)
  • talolard (3)
  • 8ria (3)
  • shivanraptor (3)
  • xenova (3)
  • DamonsJ (3)
  • DOGEwbx (3)
  • jafioti (3)

Top Pull Request Authors

  • Narsil (110)
  • ArthurZucker (108)
  • dependabot[bot] (42)
  • chris-ha458 (10)
  • sftse (10)
  • ngoldbaum (6)
  • hvaara (6)
  • McPatate (5)
  • eaplatanios (5)
  • tinyboxvk (5)
  • Wauplin (4)
  • b00f (4)
  • MeetThePatel (4)
  • 414owen (4)
  • boyleconnor (4)

Top Issue Labels

  • Stale (316)
  • Feature Request (12)
  • bug (12)
  • enhancement (10)
  • planned (4)
  • good first issue (2)
  • bytefallback (1)
  • training (1)
  • documentation (1)
  • python (1)
  • good second issue (1)

Top Pull Request Labels

  • Stale (60)
  • dependencies (42)
  • javascript (40)
  • github_actions (2)
  • Feature Request (1)

Package metadata

pypi.org: tokenizers

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://tokenizers.readthedocs.io/
  • Licenses: Apache Software License
  • Latest release: 0.22.2 (published 4 months ago)
  • Last Synced: 2026-03-18T15:03:06.427Z (about 2 months ago)
  • Versions: 120
  • Dependent Packages: 380
  • Dependent Repositories: 14,571
  • Downloads: 81,944,841 Last month
  • Docker Downloads: 42,285,347
  • Rankings:
    • Downloads: 0.057%
    • Dependent repos count: 0.068%
    • Dependent packages count: 0.086%
    • Docker downloads count: 0.599%
    • Stargazers count: 0.619%
    • Average: 0.626%
    • Forks count: 2.329%
  • Maintainers (4)
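
The "Average" entry in the rankings for pypi.org: tokenizers appears to be the arithmetic mean of the other percentile rankings in the list. This is inferred from the figures rather than stated on the page, so the sketch below is a consistency check under that assumption, and the dictionary keys are illustrative names.

```python
from statistics import mean

# Percentile rankings listed for pypi.org: tokenizers, excluding "Average".
pypi_rankings = {
    "downloads": 0.057,
    "dependent_repos_count": 0.068,
    "dependent_packages_count": 0.086,
    "docker_downloads_count": 0.599,
    "stargazers_count": 0.619,
    "forks_count": 2.329,
}

# Assumed rule: the average ranking is the plain mean of the listed rankings.
average_ranking = round(mean(pypi_rankings.values()), 3)

print(average_ranking)  # 0.626
```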
crates.io: tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://docs.rs/tokenizers/
  • Licenses: Apache-2.0
  • Latest release: 0.22.1 (published 8 months ago)
  • Last Synced: 2026-04-29T16:59:20.457Z (14 days ago)
  • Versions: 38
  • Dependent Packages: 60
  • Dependent Repositories: 281
  • Downloads: 5,528,791 Total
  • Docker Downloads: 23,287,869
  • Rankings:
    • Stargazers count: 1.247%
    • Forks count: 1.477%
    • Dependent repos count: 2.361%
    • Average: 2.519%
    • Dependent packages count: 2.982%
    • Docker downloads count: 3.074%
    • Downloads: 3.97%
  • Maintainers (4)
npmjs.org: tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
  • Licenses: Apache-2.0
  • Latest release: 0.13.3 (published almost 3 years ago)
  • Last Synced: 2026-04-06T20:46:23.157Z (about 1 month ago)
  • Versions: 38
  • Dependent Packages: 6
  • Dependent Repositories: 23
  • Downloads: 15,285 Last month
  • Docker Downloads: 130
  • Rankings:
    • Stargazers count: 1.166%
    • Docker downloads count: 1.479%
    • Forks count: 1.502%
    • Dependent repos count: 2.673%
    • Average: 3.134%
    • Dependent packages count: 4.385%
    • Downloads: 7.598%
  • Maintainers (4)
proxy.golang.org: github.com/huggingface/tokenizers

  • Homepage:
  • Documentation: https://pkg.go.dev/github.com/huggingface/tokenizers#section-documentation
  • Licenses: apache-2.0
  • Latest release: v0.22.2 (published 5 months ago)
  • Last Synced: 2026-04-17T11:02:11.193Z (26 days ago)
  • Versions: 43
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Rankings:
    • Stargazers count: 0.809%
    • Forks count: 1.187%
    • Average: 3.794%
    • Dependent repos count: 4.802%
    • Dependent packages count: 8.376%
alpine-edge: py3-tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: Apache-2.0
  • Latest release: 0.21.2-r1 (published about 1 month ago)
  • Last Synced: 2026-03-30T15:47:10.014Z (about 1 month ago)
  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 2.289%
    • Forks count: 4.25%
    • Average: 5.295%
    • Dependent packages count: 14.641%
  • Maintainers (1)
conda-forge.org: tokenizers

  • Homepage: https://pypi.org/project/tokenizers/
  • Licenses: Apache-2.0
  • Latest release: 0.13.1 (published over 3 years ago)
  • Last Synced: 2026-04-01T16:16:59.926Z (about 1 month ago)
  • Versions: 16
  • Dependent Packages: 6
  • Dependent Repositories: 35
  • Downloads: 4,311,723 Total
  • Rankings:
    • Stargazers count: 4.188%
    • Dependent repos count: 6.114%
    • Average: 6.559%
    • Forks count: 6.898%
    • Dependent packages count: 9.034%
alpine-edge: py3-tokenizers-pyc

Precompiled Python bytecode for py3-tokenizers

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: Apache-2.0
  • Latest release: 0.21.2-r1 (published about 1 month ago)
  • Last Synced: 2026-03-30T15:47:10.721Z (about 1 month ago)
  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Average: 6.693%
    • Dependent packages count: 13.386%
  • Maintainers (1)
spack.io: py-tokenizers

Fast and Customizable Tokenizers.

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: []
  • Latest release: 0.19.1 (published almost 2 years ago)
  • Last Synced: 2026-03-30T20:04:00.540Z (about 1 month ago)
  • Versions: 7
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Rankings:
    • Dependent repos count: 0.0%
    • Stargazers count: 1.653%
    • Forks count: 3.946%
    • Average: 8.417%
    • Dependent packages count: 28.067%
  • Maintainers (1)
anaconda.org: tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers
  • Licenses: Apache-2.0
  • Latest release: 0.22.2 (published 5 months ago)
  • Last Synced: 2026-03-17T12:04:19.130Z (about 2 months ago)
  • Versions: 11
  • Dependent Packages: 3
  • Dependent Repositories: 35
  • Downloads: 41,398 Total
  • Rankings:
    • Stargazers count: 9.936%
    • Forks count: 14.321%
    • Average: 23.106%
    • Dependent repos count: 27.231%
    • Dependent packages count: 40.938%
npmjs.org: @dkhokhlov/tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
  • Licenses: Apache-2.0
  • Latest release: 0.22.1 (published 6 months ago)
  • Last Synced: 2026-05-06T16:19:43.979Z (7 days ago)
  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 41 Last month
  • Rankings:
    • Dependent repos count: 23.407%
    • Average: 28.584%
    • Dependent packages count: 33.761%
  • Maintainers (1)
pypi.org: divyanx-tokenizers

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://divyanx-tokenizers.readthedocs.io/
  • Licenses: Apache Software License
  • Latest release: 0.20.0.dev0 (published over 1 year ago)
  • Last Synced: 2026-04-18T19:32:59.367Z (25 days ago)
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 7 Last month
  • Rankings:
    • Dependent packages count: 10.454%
    • Average: 34.65%
    • Dependent repos count: 58.847%
  • Maintainers (1)
pypi.org: tokenizers-gt

  • Homepage: https://github.com/huggingface/tokenizers
  • Documentation: https://tokenizers-gt.readthedocs.io/
  • Licenses: Apache Software License
  • Latest release: 0.15.2.post0 (published about 2 years ago)
  • Last Synced: 2026-04-21T03:04:31.163Z (23 days ago)
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 352 Last month
  • Rankings:
    • Dependent packages count: 10.044%
    • Average: 38.708%
    • Dependent repos count: 67.371%
  • Maintainers (1)
npmjs.org: tokenizers-node

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

  • Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
  • Licenses: Apache-2.0
  • Latest release: 0.14.2-dev0 (published about 2 years ago)
  • Last Synced: 2026-05-11T18:51:16.345Z (2 days ago)
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 20 Last month
  • Rankings:
    • Dependent repos count: 32.533%
    • Average: 39.609%
    • Dependent packages count: 46.685%
  • Maintainers (1)
nixpkgs-unstable: python314Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-24.11: python312Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-24.05: python311Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-23.11: python310Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-24.05: python312Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-23.05: python311Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-23.05: python310Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-unstable: python313Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-24.11: python311Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

nixpkgs-23.11: python311Packages.tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production


Dependencies

.github/workflows/docs-check.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact v2 composite
.github/workflows/node-release.yml actions
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-node v1 composite
  • actions/setup-python v1 composite
.github/workflows/node.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-node v1 composite
.github/workflows/python-release-conda.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/python-release-extra.yml actions
  • actions/checkout v1 composite
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
.github/workflows/python-release.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • actions/setup-python v4 composite
.github/workflows/python.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
.github/workflows/rust-release.yml actions
  • actions-rs/toolchain v1 composite
  • actions/cache v1 composite
  • actions/checkout v1 composite
.github/workflows/rust.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v1 composite
bindings/python/Cargo.toml cargo
  • pyo3 0.17.2 development
  • tempfile 3.1 development
  • env_logger 0.7.1
  • itertools 0.9
  • libc 0.2
  • ndarray 0.13
  • numpy 0.17.2
  • onig 6.0
  • pyo3 0.17.2
  • rayon 1.3
  • serde 1.0
  • serde_json 1.0
  • tokenizers *
tokenizers/Cargo.toml cargo
  • assert_approx_eq 1.1 development
  • criterion 0.4 development
  • tempfile 3.1 development
  • aho-corasick 0.7
  • cached-path 0.6
  • clap 4.0
  • derive_builder 0.12
  • dirs 3.0
  • esaxx-rs 0.1
  • fancy-regex 0.10
  • getrandom 0.2.6
  • indicatif 0.15
  • itertools 0.9
  • lazy_static 1.4
  • log 0.4
  • macro_rules_attribute 0.1.2
  • onig 6.0
  • paste 1.0.6
  • rand 0.8
  • rayon 1.3
  • rayon-cond 0.1
  • regex 1.3
  • regex-syntax 0.6
  • reqwest 0.11
  • serde 1.0
  • serde_json 1.0
  • spm_precompiled 0.1
  • thiserror 1.0.30
  • unicode-normalization-alignments 0.1
  • unicode-segmentation 1.6
  • unicode_categories 0.1
tokenizers/examples/unstable_wasm/Cargo.toml cargo
  • wasm-bindgen-test 0.3.13 development
  • console_error_panic_hook 0.1.6
  • wasm-bindgen 0.2.63
  • wee_alloc 0.4.5
bindings/node/package-lock.json npm
  • 627 dependencies
bindings/node/package.json npm
  • @types/jest ^26.0.24 development
  • @typescript-eslint/eslint-plugin ^3.10.1 development
  • @typescript-eslint/parser ^3.10.1 development
  • eslint ^7.32.0 development
  • eslint-config-prettier ^6.15.0 development
  • eslint-plugin-jest ^23.20.0 development
  • eslint-plugin-jsdoc ^30.7.13 development
  • eslint-plugin-prettier ^3.4.1 development
  • eslint-plugin-simple-import-sort ^5.0.3 development
  • jest ^26.6.3 development
  • neon-cli ^0.9.1 development
  • prettier ^2.5.1 development
  • shelljs ^0.8.3 development
  • ts-jest ^26.5.6 development
  • typescript ^3.9.10 development
  • @types/node ^13.13.52
  • node-pre-gyp ^0.14.0
tokenizers/examples/unstable_wasm/www/package-lock.json npm
  • 312 dependencies
tokenizers/examples/unstable_wasm/www/package.json npm
  • copy-webpack-plugin ^11.0.0 development
  • webpack ^5.75.0 development
  • webpack-cli ^5.0.1 development
  • webpack-dev-server ^4.10.0 development
  • unstable_wasm file:../pkg