Trending
Content tagged with "data-engineering"
Hacker News
Top stories from the Hacker News community • Updated 1 minute ago
Top posts from tech subreddits • Updated 1 minute ago
Jet engine shortages threaten AI data center expansion as wait times stretch into 2030
Hugging Face Trending
Popular models from Hugging Face • Updated 43 minutes ago
No models found
GitHub Trending
Popular repositories from GitHub • Updated about 1 hour ago
unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
rclone
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, OneDrive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics