awesome-llama: https://github.com/datajuicer/data-juicer
Score: 19.590012971866962
Last synced: about 9 hours ago
Repository metadata:
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
- Host: GitHub
- URL: https://github.com/datajuicer/data-juicer
- Owner: datajuicer
- License: apache-2.0
- Created: 2023-08-01T09:16:41.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2026-01-12T12:54:48.000Z (23 days ago)
- Last Synced: 2026-01-12T17:59:31.067Z (23 days ago)
- Topics: data, data-analysis, data-pipeline, data-processing, data-science, data-visualization, foundation-models, instruction-tuning, large-language-models, llm, llms, multi-modal, pre-training, synthetic-data
- Language: Python
- Homepage: https://datajuicer.github.io/data-juicer/
- Size: 783 MB
- Stars: 5,728
- Watchers: 19
- Forks: 313
- Open Issues: 61
Metadata Files:
- Readme: README.md
- License: LICENSE
Owner metadata:
- Name: DataJuicer
- Login: datajuicer
- Email: datajuicer@outlook.com
- Kind: organization
- Description: Data processing for and with large models.
- Website:
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/223222708?v=4
- Repositories: 1
- Last Synced at: 2025-11-05T09:13:59.618Z
- Profile URL: https://github.com/datajuicer
GitHub Events
| Event | Total | Last Year |
|---|---|---|
| Create event | 10 | 10 |
| Delete event | 8 | 8 |
| Fork event | 6 | 6 |
| Issue comment event | 30 | 30 |
| Issues event | 17 | 17 |
| Member event | 1 | 1 |
| Pull request event | 38 | 38 |
| Pull request review comment event | 65 | 65 |
| Pull request review event | 41 | 41 |
| Push event | 111 | 111 |
| Watch event | 78 | 78 |
| Total | 405 | 405 |
Committers metadata
Last synced: 17 days ago
Total Commits: 500
Total Committers: 36
Avg Commits per committer: 13.889
Development Distribution Score (DDS): 0.706
Commits in past year: 207
Committers in past year: 21
Avg Commits per committer in past year: 9.857
Development Distribution Score (DDS) in past year: 0.671
| Name | Email | Commits |
|---|---|---|
| Yilun Huang | l****l@a****m | 147 |
| BeachWang | 1****7@p****n | 48 |
| Daoyuan Chen | 6****c | 47 |
| Cathy0908 | 3****8 | 39 |
| Ce Ge (戈策) | g****e@f****m | 35 |
| zhijianma | z****j@a****m | 30 |
| cmgzn | 8****n | 27 |
| garyzhang99 | 4****9 | 17 |
| chenhesen | h****s@a****m | 12 |
| Xuchen Pan | 3****c | 12 |
| Cyrus Zhang | c****g@g****m | 12 |
| Yuhan Liu | 3****x | 10 |
| co63oc | c****c | 10 |
| Qirui-jiao | 1****o | 9 |
| Zhen Qin | z****n@g****m | 7 |
| chenyushuo | 2****6@q****m | 5 |
| Xinyu Zhang | 6****h | 4 |
| kyotom | 3****m | 4 |
| John Giorgi | j****i@g****m | 3 |
| lingzhq | 1****q | 3 |
| 2108038773 | 1****3 | 2 |
| JamieYu | y****a@f****m | 2 |
| Yuexiang XIE | y****x@a****m | 2 |
| weijie | 3****o | 1 |
| simplaj | 3****j | 1 |
| seanzhang-zhichen | 7****n | 1 |
| ricksun2023 | 1****3 | 1 |
| panghu | 5****i | 1 |
| jackylee | q****1@g****m | 1 |
| Yanyi Liu | w****u@1****m | 1 |
| and 6 more... | | |
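The summary figures above can be reproduced from the table. This is a minimal sketch, assuming the Development Distribution Score is defined as 1 minus the top committer's share of all commits (this definition reproduces the reported value: 1 - 147/500 = 0.706). The function names are hypothetical and are not part of any ecosyste.ms API.

```python
def dds(top_committer_commits: int, total_commits: int) -> float:
    """Assumed DDS definition: 1 - (top committer's commits / total commits)."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    return 1 - top_committer_commits / total_commits


def avg_commits_per_committer(total_commits: int, committers: int) -> float:
    """Plain mean: total commits divided by the number of committers."""
    return total_commits / committers


# All-time figures taken from the committer metadata above.
TOTAL_COMMITS = 500
TOTAL_COMMITTERS = 36
TOP_COMMITTER_COMMITS = 147  # Yilun Huang, first row of the table

print(round(dds(TOP_COMMITTER_COMMITS, TOTAL_COMMITS), 3))  # 0.706
print(round(avg_commits_per_committer(TOTAL_COMMITS, TOTAL_COMMITTERS), 3))  # 13.889
# Past-year average: 207 commits across 21 committers.
print(round(avg_commits_per_committer(207, 21), 3))  # 9.857
```

A DDS close to 0 would mean one person authored nearly every commit. The past-year DDS (0.671) cannot be recomputed here because the table does not break out past-year commits per committer.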
Issue and Pull Request metadata
Last synced: about 1 month ago
Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull request: 0
Bot issues: 0
Bot pull requests: 0
Past year issues: 0
Past year pull requests: 0
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 0
Past year pull request authors: 0
Past year average comments per issue: 0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0
Package metadata
- Total packages: 1
Total downloads:
- pypi: 1,544 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 23
- Total maintainers: 1
pypi.org: py-data-juicer
Data Processing for and with Foundation Models.
- Homepage:
- Documentation: https://py-data-juicer.readthedocs.io/
- Licenses: Apache-2.0
- Latest release: 1.4.4 (published 2 months ago)
- Last Synced: 2026-01-09T02:10:31.395Z (26 days ago)
- Versions: 23
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 1,544 last month
Rankings:
- Stargazers count: 7.067%
- Dependent packages count: 7.382%
- Forks count: 11.988%
- Downloads: 16.887%
- Average: 22.447%
- Dependent repos count: 68.91%
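The "Average" ranking appears to be the arithmetic mean of the other five rankings: (7.067 + 7.382 + 11.988 + 16.887 + 68.91) / 5 = 22.4468, which rounds to the reported 22.447%. The sketch below checks that arithmetic. It assumes a simple unweighted mean, which is inferred from the numbers rather than from ecosyste.ms documentation.

```python
from statistics import mean

# Ranking percentages copied from the list above.
rankings = {
    "Stargazers count": 7.067,
    "Dependent packages count": 7.382,
    "Forks count": 11.988,
    "Downloads": 16.887,
    "Dependent repos count": 68.91,
}

# Assumed: unweighted mean of the five individual rankings.
average = round(mean(rankings.values()), 3)
print(average)  # 22.447
```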