https://github.com/Unstructured-IO/unstructured
Score: 24.60545022578863
Last synced: about 10 hours ago
Repository metadata:
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
- Host: GitHub
- URL: https://github.com/Unstructured-IO/unstructured
- Owner: Unstructured-IO
- License: apache-2.0
- Created: 2022-09-26T21:53:41.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2026-06-09T02:04:18.000Z (7 days ago)
- Last Synced: 2026-06-09T04:07:29.406Z (7 days ago)
- Topics: data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing
- Language: HTML
- Homepage: https://www.unstructured.io/
- Size: 225 MB
- Stars: 14,859
- Watchers: 72
- Forks: 1,250
- Open Issues: 250
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
Package metadata
- Total packages: 12
Total downloads:
- pypi: 3,200,580 last month
- Total docker downloads: 7,856
- Total dependent packages: 113 (may contain duplicates)
- Total dependent repositories: 3,374 (may contain duplicates)
- Total versions: 243
- Total maintainers: 3
- Total advisories: 2
pypi.org: unstructured
A library that prepares raw documents for downstream ML tasks.
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://unstructured.readthedocs.io/
- Licenses: Apache-2.0
- Latest release: 0.18.24 (published 6 months ago)
- Last Synced: 2026-06-03T14:49:52.479Z (13 days ago)
- Versions: 197
- Dependent Packages: 113
- Dependent Repositories: 3,374
- Downloads: 3,198,760 Last month
- Docker Downloads: 7,856
Rankings:
- Dependent repos count: 0.178%
- Dependent packages count: 0.221%
- Downloads: 0.416%
- Stargazers count: 1.352%
- Average: 1.482%
- Docker downloads count: 3.322%
- Forks count: 3.403%
- Maintainers (1)
- Advisories:
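The "Average" ranking listed for pypi.org: unstructured appears to match the arithmetic mean of the six individual ranking percentiles. That is an observation from the numbers on this page, not documented behavior of the indexing service. A minimal sketch of the check, with the values copied from the listing and the function name `average_ranking` chosen only for illustration:

```python
from statistics import mean

# Ranking percentiles copied from the pypi.org: unstructured listing.
pypi_rankings = {
    "dependent_repos_count": 0.178,
    "dependent_packages_count": 0.221,
    "downloads": 0.416,
    "stargazers_count": 1.352,
    "docker_downloads_count": 3.322,
    "forks_count": 3.403,
}

# Value shown on the page as "Average".
reported_average = 1.482


def average_ranking(rankings):
    """Return the arithmetic mean of the ranking percentiles (assumed formula)."""
    return mean(rankings.values())


computed = average_ranking(pypi_rankings)
# Compare with a small tolerance because the page rounds to three decimals.
matches = abs(computed - reported_average) < 0.001
print(f"computed={computed:.3f} reported={reported_average} matches={matches}")
```

The comparison uses a tolerance rather than exact equality because the listed figures are rounded to three decimal places.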
proxy.golang.org: github.com/Unstructured-IO/unstructured
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://pkg.go.dev/github.com/Unstructured-IO/unstructured#section-documentation
- Licenses: Apache-2.0
- Latest release: v0.0.0-20260424172226-879e1269b56f (published about 2 months ago)
- Last Synced: 2026-05-25T15:54:29.926Z (22 days ago)
- Versions: 11
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Stargazers count: 0.661%
- Forks count: 1.14%
- Average: 3.722%
- Dependent packages count: 6.329%
- Dependent repos count: 6.756%
proxy.golang.org: github.com/unstructured-io/unstructured
- Homepage: https://github.com/unstructured-io/unstructured
- Documentation: https://pkg.go.dev/github.com/unstructured-io/unstructured#section-documentation
- Licenses: Apache-2.0
- Latest release: v0.0.0-20260424172226-879e1269b56f (published about 2 months ago)
- Last Synced: 2026-04-24T20:07:49.769Z (about 2 months ago)
- Versions: 12
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent packages count: 6.338%
- Average: 6.552%
- Dependent repos count: 6.766%
pypi.org: unstructured-cpu
A library that prepares raw documents for downstream ML tasks.
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://unstructured-cpu.readthedocs.io/
- Licenses: Apache-2.0
- Latest release: 0.15.1 (published almost 2 years ago)
- Last Synced: 2026-06-09T14:44:12.103Z (7 days ago)
- Versions: 13
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 1,820 Last month
Rankings:
- Dependent packages count: 10.51%
- Average: 34.844%
- Dependent repos count: 59.178%
- Maintainers (1)
nixpkgs-23.11: python310Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.11/pkgs/development/python-modules/unstructured/default.nix#L139
- Licenses: Apache-2.0
- Latest release: 0.10.30 (published 4 months ago)
- Last Synced: 2026-03-07T08:41:10.013Z (3 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Maintainers (1)
nixpkgs-unstable: python314Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/unstructured/default.nix#L280
- Licenses: Apache-2.0
- Latest release: 0.18.28 (published 3 months ago)
- Last Synced: 2026-03-07T10:20:11.407Z (3 months ago)
- Versions: 2
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Maintainers (1)
nixpkgs-24.05: python312Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/development/python-modules/unstructured/default.nix#L149
- Licenses: Apache-2.0
- Latest release: 0.13.7 (published 4 months ago)
- Last Synced: 2026-05-13T06:08:22.375Z (about 1 month ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
nixpkgs-24.11: python311Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.11/pkgs/development/python-modules/unstructured/default.nix#L149
- Licenses: Apache-2.0
- Latest release: 0.15.14 (published 4 months ago)
- Last Synced: 2026-03-07T12:16:26.865Z (3 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
nixpkgs-unstable: python313Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/unstructured/default.nix#L280
- Licenses: Apache-2.0
- Latest release: 0.18.28 (published 3 months ago)
- Last Synced: 2026-04-17T20:06:48.172Z (about 2 months ago)
- Versions: 2
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
nixpkgs-23.11: python311Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-23.11/pkgs/development/python-modules/unstructured/default.nix#L139
- Licenses: Apache-2.0
- Latest release: 0.10.30 (published 4 months ago)
- Last Synced: 2026-04-12T23:01:50.286Z (2 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)
nixpkgs-24.11: python312Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.11/pkgs/development/python-modules/unstructured/default.nix#L149
- Licenses: Apache-2.0
- Latest release: 0.15.14 (published 4 months ago)
- Last Synced: 2026-03-06T09:46:47.974Z (3 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Maintainers (1)
nixpkgs-24.05: python311Packages.unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines
- Homepage: https://github.com/Unstructured-IO/unstructured
- Documentation: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/development/python-modules/unstructured/default.nix#L149
- Licenses: Apache-2.0
- Latest release: 0.13.7 (published 4 months ago)
- Last Synced: 2026-03-09T04:00:50.329Z (3 months ago)
- Versions: 1
- Dependent Packages: 0
- Dependent Repositories: 0
Rankings:
- Dependent repos count: 0.0%
- Dependent packages count: 0.0%
- Average: 100%
- Maintainers (1)