An open API service for producing an overview of a list of open source projects.

https://github.com/Unstructured-IO/unstructured

data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing

Score: 24.60545022578863

Last synced: about 10 hours ago

Repository metadata:

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.


Package metadata:

pypi.org: unstructured

A library that prepares raw documents for downstream ML tasks.

proxy.golang.org: github.com/Unstructured-IO/unstructured

  • Homepage: https://github.com/Unstructured-IO/unstructured
  • Documentation: https://pkg.go.dev/github.com/Unstructured-IO/unstructured#section-documentation
  • Licenses: Apache-2.0
  • Latest release: v0.0.0-20260424172226-879e1269b56f (published about 2 months ago)
  • Last Synced: 2026-05-25T15:54:29.926Z (22 days ago)
  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Stargazers count: 0.661%
    • Forks count: 1.14%
    • Average: 3.722%
    • Dependent packages count: 6.329%
    • Dependent repos count: 6.756%

proxy.golang.org: github.com/unstructured-io/unstructured

  • Homepage: https://github.com/unstructured-io/unstructured
  • Documentation: https://pkg.go.dev/github.com/unstructured-io/unstructured#section-documentation
  • Licenses: Apache-2.0
  • Latest release: v0.0.0-20260424172226-879e1269b56f (published about 2 months ago)
  • Last Synced: 2026-04-24T20:07:49.769Z (about 2 months ago)
  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Rankings:
    • Dependent packages count: 6.338%
    • Average: 6.552%
    • Dependent repos count: 6.766%

pypi.org: unstructured-cpu

A library that prepares raw documents for downstream ML tasks.

  • Homepage: https://github.com/Unstructured-IO/unstructured
  • Documentation: https://unstructured-cpu.readthedocs.io/
  • Licenses: Apache-2.0
  • Latest release: 0.15.1 (published almost 2 years ago)
  • Last Synced: 2026-06-09T14:44:12.103Z (7 days ago)
  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,820 Last month
  • Rankings:
    • Dependent packages count: 10.51%
    • Average: 34.844%
    • Dependent repos count: 59.178%
  • Maintainers (1)
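
In the three Rankings lists above, the "Average" figure matches the arithmetic mean of the other percentile rankings listed for the same package. That is an inference from the numbers on this page, not a documented formula. The short Python sketch below recomputes each average for comparison. The dictionary keys and the `average_ranking` helper are hypothetical names chosen for illustration.

```python
from statistics import mean

# Percentile rankings copied from the listings above.
# Assumption: "Average" is the plain mean of the other listed rankings.
rankings = {
    "proxy.golang.org: github.com/Unstructured-IO/unstructured": {
        "stargazers_count": 0.661,
        "forks_count": 1.14,
        "dependent_packages_count": 6.329,
        "dependent_repos_count": 6.756,
    },
    "proxy.golang.org: github.com/unstructured-io/unstructured": {
        "dependent_packages_count": 6.338,
        "dependent_repos_count": 6.766,
    },
    "pypi.org: unstructured-cpu": {
        "dependent_packages_count": 10.51,
        "dependent_repos_count": 59.178,
    },
}


def average_ranking(components):
    """Return the arithmetic mean of a package's percentile rankings."""
    return mean(components.values())


if __name__ == "__main__":
    for name, components in rankings.items():
        print(f"{name}: {average_ranking(components):.4f}%")
```

The recomputed values agree with the listed averages (3.722%, 6.552%, and 34.844%) to within rounding at three decimal places.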
nixpkgs-23.11: python310Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-unstable: python314Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-24.05: python312Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-24.11: python311Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-unstable: python313Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-23.11: python311Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-24.11: python312Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines

nixpkgs-24.05: python311Packages.unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines