https://github.com/huggingface/tokenizers
bert gpt language-model natural-language-processing natural-language-understanding nlp transformers
Score: 33.15778700855254
Last synced: about 7 hours ago
Repository metadata:
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
- Host: GitHub
- URL: https://github.com/huggingface/tokenizers
- Owner: huggingface
- License: apache-2.0
- Created: 2019-11-01T17:52:20.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2026-05-06T03:21:52.000Z (8 days ago)
- Last Synced: 2026-05-06T05:18:23.944Z (8 days ago)
- Topics: bert, gpt, language-model, natural-language-processing, natural-language-understanding, nlp, transformers
- Language: Rust
- Homepage: https://huggingface.co/docs/tokenizers
- Size: 16.8 MB
- Stars: 10,694
- Watchers: 120
- Forks: 1,084
- Open Issues: 166
- Metadata Files:
  - Readme: README.md
  - Contributing: CONTRIBUTING.md
  - License: LICENSE
  - Citation: CITATION.cff
Owner metadata:
- Name: Hugging Face
- Login: huggingface
- Email:
- Kind: organization
- Description: The AI community building the future.
- Website: https://huggingface.co/
- Location: NYC + Paris
- Twitter: huggingface
- Company:
- Icon url: https://avatars.githubusercontent.com/u/25720743?v=4
- Repositories: 370
- Last Synced at: 2025-11-13T02:39:46.174Z
- Profile URL: https://github.com/huggingface
Committers metadata
Last synced: 2 days ago
Total Commits: 1,875
Total Committers: 147
Avg Commits per committer: 12.755
Development Distribution Score (DDS): 0.502
Commits in past year: 130
Committers in past year: 51
Avg Commits per committer in past year: 2.549
Development Distribution Score (DDS) in past year: 0.685
| Name | Email | Commits |
|---|---|---|
| Anthony MOI | m****i@g****m | 934 |
| Nicolas Patry | p****s@p****m | 271 |
| Pierric Cistac | p****c@h****o | 138 |
| Arthur | 4****r | 117 |
| dependabot[bot] | 4****] | 51 |
| epwalsh | e****0@g****m | 48 |
| Morgan Funtowicz | m****n@h****o | 35 |
| Sebastian Pütz | s****z@u****e | 28 |
| Mishig Davaadorj | d****g@g****m | 22 |
| Bjarte Johansen | b****n@g****m | 11 |
| thomwolf | t****f@g****m | 8 |
| Sylvain Gugger | s****r@g****m | 7 |
| Luc Georges | M****e | 7 |
| Chris Ha | h****9@g****m | 6 |
| sftse | c****@f****t | 5 |
| Nathan Goldbaum | n****m@g****m | 5 |
| Roy Hvaara | h****a@g****m | 5 |
| Lysandre | l****t@r****r | 4 |
| Clement | c****e@g****m | 4 |
| Connor Boyle | c****o@g****m | 4 |
| Julien Chaumond | c****d@g****m | 4 |
| tinyboxvk | t****k | 3 |
| 大橋 玲音 | 1****n | 3 |
| hf-security-analysis[bot] | 2****] | 3 |
| François Garillot | f****s@g****t | 3 |
| Lucain | l****p@g****m | 3 |
| Michael Feil | 6****l | 3 |
| dctelus | 9****s | 3 |
| Thomas Wang | 2****1 | 2 |
| SeongBeomLEE | 2****r@n****m | 2 |
| and 117 more... | | |
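The derived committer figures above can be cross-checked from the raw counts. The sketch below assumes the Development Distribution Score is computed as one minus the top committer's share of all commits. That formula is an assumption, not something stated on this page, but it reproduces the all-time value listed here. The variable names are illustrative only.

```python
# Figures copied from the committers metadata and table above.
total_commits = 1875
total_committers = 147
top_committer_commits = 934  # first row of the committers table

# Past-year counts from the same section.
past_year_commits = 130
past_year_committers = 51

# Average commits per committer: total commits divided by committers.
avg_commits = total_commits / total_committers
avg_commits_past_year = past_year_commits / past_year_committers

# ASSUMPTION: DDS = 1 - (commits by top committer / total commits).
dds = 1 - top_committer_commits / total_commits

# Committers not shown in the table (30 rows are listed).
listed_rows = 30
unlisted = total_committers - listed_rows

print(f"{avg_commits:.3f}")            # 12.755
print(f"{avg_commits_past_year:.3f}")  # 2.549
print(f"{dds:.3f}")                    # 0.502
print(unlisted)                        # 117
```

The past-year DDS (0.685) cannot be checked the same way, because the table reports all-time commit counts per committer only.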
Issue and Pull Request metadata
Last synced: 2 days ago
Total issues: 620
Total pull requests: 530
Average time to close issues: about 1 year
Average time to close pull requests: 2 months
Total issue authors: 526
Total pull request authors: 132
Average comments per issue: 3.75
Average comments per pull request: 1.96
Merged pull request: 276
Bot issues: 0
Bot pull requests: 42
Past year issues: 58
Past year pull requests: 112
Past year average time to close issues: 13 days
Past year average time to close pull requests: 7 days
Past year issue authors: 53
Past year pull request authors: 46
Past year average comments per issue: 1.79
Past year average comments per pull request: 1.71
Past year merged pull request: 40
Past year bot issues: 0
Past year bot pull requests: 8
Top Issue Authors
- david-waterworth (15)
- n1t0 (10)
- Narsil (5)
- SaulLu (4)
- davidgilbertson (4)
- chris-ha458 (4)
- pietrolesci (4)
- EricLBuehler (3)
- talolard (3)
- 8ria (3)
- shivanraptor (3)
- xenova (3)
- DamonsJ (3)
- DOGEwbx (3)
- jafioti (3)
Top Pull Request Authors
- Narsil (110)
- ArthurZucker (108)
- dependabot[bot] (42)
- chris-ha458 (10)
- sftse (10)
- ngoldbaum (6)
- hvaara (6)
- McPatate (5)
- eaplatanios (5)
- tinyboxvk (5)
- Wauplin (4)
- b00f (4)
- MeetThePatel (4)
- 414owen (4)
- boyleconnor (4)
Top Issue Labels
- Stale (316)
- Feature Request (12)
- bug (12)
- enhancement (10)
- planned (4)
- good first issue (2)
- bytefallback (1)
- training (1)
- documentation (1)
- python (1)
- good second issue (1)
Top Pull Request Labels
- Stale (60)
- dependencies (42)
- javascript (40)
- github_actions (2)
- Feature Request (1)
Package metadata
- Total packages: 23
- Total downloads:
  - conda: 4,353,121 total
  - pypi: 81,945,200 last-month
  - npm: 15,346 last-month
  - cargo: 5,528,791 total
- Total docker downloads: 65,573,346
- Total dependent packages: 456 (may contain duplicates)
- Total dependent repositories: 14,946 (may contain duplicates)
- Total versions: 325
- Total maintainers: 19
pypi.org: tokenizers
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://tokenizers.readthedocs.io/
- Licenses: Apache Software License
- Latest release: 0.22.2 (published 4 months ago)
- Last Synced: 2026-03-18T15:03:06.427Z (about 2 months ago)
- Versions: 120
- Dependent Packages: 380
- Dependent Repositories: 14,571
- Downloads: 81,944,841 Last month
- Docker Downloads: 42,285,347
- Rankings:
  - Downloads: 0.057%
  - Dependent repos count: 0.068%
  - Dependent packages count: 0.086%
  - Docker downloads count: 0.599%
  - Stargazers count: 0.619%
  - Average: 0.626%
  - Forks count: 2.329%
- Maintainers (4)
crates.io: tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performances and versatility.
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://docs.rs/tokenizers/
- Licenses: Apache-2.0
- Latest release: 0.22.1 (published 8 months ago)
- Last Synced: 2026-04-29T16:59:20.457Z (14 days ago)
- Versions: 38
- Dependent Packages: 60
- Dependent Repositories: 281
- Downloads: 5,528,791 Total
- Docker Downloads: 23,287,869
- Rankings:
  - Stargazers count: 1.247%
  - Forks count: 1.477%
  - Dependent repos count: 2.361%
  - Average: 2.519%
  - Dependent packages count: 2.982%
  - Docker downloads count: 3.074%
  - Downloads: 3.97%
- Maintainers (4)
npmjs.org: tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performances and versatility.
- Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
- Licenses: Apache-2.0
- Latest release: 0.13.3 (published almost 3 years ago)
- Last Synced: 2026-04-06T20:46:23.157Z (about 1 month ago)
- Versions: 38
- Dependent Packages: 6
- Dependent Repositories: 23
- Downloads: 15,285 Last month
- Docker Downloads: 130
- Rankings:
  - Stargazers count: 1.166%
  - Docker downloads count: 1.479%
  - Forks count: 1.502%
  - Dependent repos count: 2.673%
  - Average: 3.134%
  - Dependent packages count: 4.385%
  - Downloads: 7.598%
- Maintainers (4)
proxy.golang.org: github.com/huggingface/tokenizers
- Homepage:
- Documentation: https://pkg.go.dev/github.com/huggingface/tokenizers#section-documentation
- Licenses: apache-2.0
- Latest release: v0.22.2 (published 5 months ago)
- Last Synced: 2026-04-17T11:02:11.193Z (26 days ago)
- Versions: 43
- Dependent Packages: 0
- Dependent Repositories: 1
- Rankings:
  - Stargazers count: 0.809%
  - Forks count: 1.187%
  - Average: 3.794%
  - Dependent repos count: 4.802%
  - Dependent packages count: 8.376%
alpine-edge: py3-tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: Apache-2.0
- Latest release: 0.21.2-r1 (published about 1 month ago)
- Last Synced: 2026-03-30T15:47:10.014Z (about 1 month ago)
- Versions: 17
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Stargazers count: 2.289%
  - Forks count: 4.25%
  - Average: 5.295%
  - Dependent packages count: 14.641%
- Maintainers (1)
conda-forge.org: tokenizers
- Homepage: https://pypi.org/project/tokenizers/
- Licenses: Apache-2.0
- Latest release: 0.13.1 (published over 3 years ago)
- Last Synced: 2026-04-01T16:16:59.926Z (about 1 month ago)
- Versions: 16
- Dependent Packages: 6
- Dependent Repositories: 35
- Downloads: 4,311,723 Total
- Rankings:
  - Stargazers count: 4.188%
  - Dependent repos count: 6.114%
  - Average: 6.559%
  - Forks count: 6.898%
  - Dependent packages count: 9.034%
alpine-edge: py3-tokenizers-pyc
Precompiled Python bytecode for py3-tokenizers
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: Apache-2.0
- Latest release: 0.21.2-r1 (published about 1 month ago)
- Last Synced: 2026-03-30T15:47:10.721Z (about 1 month ago)
- Versions: 16
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Average: 6.693%
  - Dependent packages count: 13.386%
- Maintainers (1)
spack.io: py-tokenizers
Fast and Customizable Tokenizers.
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: []
- Latest release: 0.19.1 (published almost 2 years ago)
- Last Synced: 2026-03-30T20:04:00.540Z (about 1 month ago)
- Versions: 7
- Dependent Packages: 1
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Stargazers count: 1.653%
  - Forks count: 3.946%
  - Average: 8.417%
  - Dependent packages count: 28.067%
- Maintainers (1)
anaconda.org: tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- Homepage: https://github.com/huggingface/tokenizers
- Licenses: Apache-2.0
- Latest release: 0.22.2 (published 5 months ago)
- Last Synced: 2026-03-17T12:04:19.130Z (about 2 months ago)
- Versions: 11
- Dependent Packages: 3
- Dependent Repositories: 35
- Downloads: 41,398 Total
- Rankings:
  - Stargazers count: 9.936%
  - Forks count: 14.321%
  - Average: 23.106%
  - Dependent repos count: 27.231%
  - Dependent packages count: 40.938%
npmjs.org: @dkhokhlov/tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performances and versatility.
- Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
- Licenses: Apache-2.0
- Latest release: 0.22.1 (published 6 months ago)
- Last Synced: 2026-05-06T16:19:43.979Z (7 days ago)
- Versions: 2
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 41 Last month
- Rankings:
  - Dependent repos count: 23.407%
  - Average: 28.584%
  - Dependent packages count: 33.761%
- Maintainers (1)
pypi.org: divyanx-tokenizers
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://divyanx-tokenizers.readthedocs.io/
- Licenses: Apache Software License
- Latest release: 0.20.0.dev0 (published over 1 year ago)
- Last Synced: 2026-04-18T19:32:59.367Z (25 days ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 7 Last month
- Rankings:
  - Dependent packages count: 10.454%
  - Average: 34.65%
  - Dependent repos count: 58.847%
- Maintainers (1)
pypi.org: tokenizers-gt
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://tokenizers-gt.readthedocs.io/
- Licenses: Apache Software License
- Latest release: 0.15.2.post0 (published about 2 years ago)
- Last Synced: 2026-04-21T03:04:31.163Z (23 days ago)
- Versions: 3
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 352 Last month
- Rankings:
  - Dependent packages count: 10.044%
  - Average: 38.708%
  - Dependent repos count: 67.371%
- Maintainers (1)
npmjs.org: tokenizers-node
Provides an implementation of today's most used tokenizers, with a focus on performances and versatility.
- Homepage: https://github.com/huggingface/tokenizers/tree/master/bindings/node
- Licenses: Apache-2.0
- Latest release: 0.14.2-dev0 (published about 2 years ago)
- Last Synced: 2026-05-11T18:51:16.345Z (2 days ago)
- Versions: 3
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 20 Last month
- Rankings:
  - Dependent repos count: 32.533%
  - Average: 39.609%
  - Dependent packages count: 46.685%
- Maintainers (1)
nixpkgs-unstable: python314Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/tokenizers/default.nix#L172
- Licenses: Apache-2.0
- Latest release: 0.22.2 (published 4 months ago)
- Last Synced: 2026-03-05T16:14:10.628Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Maintainers (1)
nixpkgs-24.11: python312Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.11/pkgs/development/python-modules/tokenizers/default.nix#L158
- Licenses: Apache-2.0
- Latest release: 0.20.3 (published 3 months ago)
- Last Synced: 2026-03-06T08:37:51.109Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Maintainers (1)
nixpkgs-24.05: python311Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/development/python-modules/tokenizers/default.nix#L143
- Licenses: Apache-2.0
- Latest release: 0.19.1 (published 3 months ago)
- Last Synced: 2026-05-12T14:59:00.511Z (1 day ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Dependent packages count: 0.0%
  - Average: 100%
nixpkgs-23.11: python310Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.11/pkgs/development/python-modules/tokenizers/default.nix#L140
- Licenses: Apache-2.0
- Latest release: 0.14.1 (published 3 months ago)
- Last Synced: 2026-03-07T08:38:52.218Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
nixpkgs-24.05: python312Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/development/python-modules/tokenizers/default.nix#L143
- Licenses: Apache-2.0
- Latest release: 0.19.1 (published 3 months ago)
- Last Synced: 2026-05-13T08:00:26.010Z (about 10 hours ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Dependent packages count: 0.0%
  - Average: 100%
nixpkgs-23.05: python311Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.05/pkgs/development/python-modules/tokenizers/default.nix#L142
- Licenses: Apache-2.0
- Latest release: 0.13.3 (published 4 months ago)
- Last Synced: 2026-03-05T22:33:33.640Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
nixpkgs-23.05: python310Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.05/pkgs/development/python-modules/tokenizers/default.nix#L142
- Licenses: Apache-2.0
- Latest release: 0.13.3 (published 4 months ago)
- Last Synced: 2026-04-09T22:01:19.137Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Dependent packages count: 0.0%
  - Average: 100%
nixpkgs-unstable: python313Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/tokenizers/default.nix#L172
- Licenses: Apache-2.0
- Latest release: 0.22.2 (published 4 months ago)
- Last Synced: 2026-04-10T19:01:51.945Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Dependent packages count: 0.0%
  - Average: 100%
- Maintainers (1)
nixpkgs-24.11: python311Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.11/pkgs/development/python-modules/tokenizers/default.nix#L158
- Licenses: Apache-2.0
- Latest release: 0.20.3 (published 3 months ago)
- Last Synced: 2026-03-07T10:14:55.393Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
- Maintainers (1)
nixpkgs-23.11: python311Packages.tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
- Homepage: https://github.com/huggingface/tokenizers
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.11/pkgs/development/python-modules/tokenizers/default.nix#L140
- Licenses: Apache-2.0
- Latest release: 0.14.1 (published 3 months ago)
- Last Synced: 2026-04-12T23:01:49.161Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
- Rankings:
  - Dependent repos count: 0.0%
  - Dependent packages count: 0.0%
  - Average: 100%
Dependencies
- actions-rs/toolchain v1 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- actions/upload-artifact v2 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/setup-node v1 composite
- actions/setup-python v1 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/setup-node v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- actions/setup-python v4 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/setup-python v2 composite
- actions-rs/toolchain v1 composite
- actions/cache v1 composite
- actions/checkout v1 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v1 composite
- pyo3 0.17.2 development
- tempfile 3.1 development
- env_logger 0.7.1
- itertools 0.9
- libc 0.2
- ndarray 0.13
- numpy 0.17.2
- onig 6.0
- pyo3 0.17.2
- rayon 1.3
- serde 1.0
- serde_json 1.0
- tokenizers *
- assert_approx_eq 1.1 development
- criterion 0.4 development
- tempfile 3.1 development
- aho-corasick 0.7
- cached-path 0.6
- clap 4.0
- derive_builder 0.12
- dirs 3.0
- esaxx-rs 0.1
- fancy-regex 0.10
- getrandom 0.2.6
- indicatif 0.15
- itertools 0.9
- lazy_static 1.4
- log 0.4
- macro_rules_attribute 0.1.2
- onig 6.0
- paste 1.0.6
- rand 0.8
- rayon 1.3
- rayon-cond 0.1
- regex 1.3
- regex-syntax 0.6
- reqwest 0.11
- serde 1.0
- serde_json 1.0
- spm_precompiled 0.1
- thiserror 1.0.30
- unicode-normalization-alignments 0.1
- unicode-segmentation 1.6
- unicode_categories 0.1
- wasm-bindgen-test 0.3.13 development
- console_error_panic_hook 0.1.6
- wasm-bindgen 0.2.63
- wee_alloc 0.4.5
- 627 dependencies
- @types/jest ^26.0.24 development
- @typescript-eslint/eslint-plugin ^3.10.1 development
- @typescript-eslint/parser ^3.10.1 development
- eslint ^7.32.0 development
- eslint-config-prettier ^6.15.0 development
- eslint-plugin-jest ^23.20.0 development
- eslint-plugin-jsdoc ^30.7.13 development
- eslint-plugin-prettier ^3.4.1 development
- eslint-plugin-simple-import-sort ^5.0.3 development
- jest ^26.6.3 development
- neon-cli ^0.9.1 development
- prettier ^2.5.1 development
- shelljs ^0.8.3 development
- ts-jest ^26.5.6 development
- typescript ^3.9.10 development
- @types/node ^13.13.52
- node-pre-gyp ^0.14.0
- 312 dependencies
- copy-webpack-plugin ^11.0.0 development
- webpack ^5.75.0 development
- webpack-cli ^5.0.1 development
- webpack-dev-server ^4.10.0 development
- unstable_wasm file:../pkg