Trending

Content tagged with "data-engineering"

Hacker News

Top stories from the Hacker News community • Updated 1 minute ago

Reddit

Top posts from tech subreddits • Updated 1 minute ago

Hugging Face Trending

Popular models from Hugging Face • Updated 43 minutes ago

No models found

GitHub Trending

Popular repositories from GitHub • Updated about 1 hour ago

unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

faiss

A library for efficient similarity search and clustering of dense vectors.

rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, OneDrive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files

warp

A Python framework for accelerated simulation, data generation and spatial computing.

yfinance

Download market data from Yahoo! Finance's API

dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

dbt-utils

Utility functions for dbt projects.

Makefile · 1,616 · 556

pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics