Trending
Content tagged with "data-engineering"
Hacker News
Top stories from the Hacker News community• Updated 9 minutes ago
Top posts from tech subreddits• Updated 5 minutes ago
Hugging Face Trending
Popular models from Hugging Face• Updated 5 minutes ago
No models found
Try removing the tag filter or searching for different content.
GitHub Trending
Popular repositories from GitHub• Updated 20 minutes ago
simdjson
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
milvus
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming..
cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.
starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
llama_index
LlamaIndex is the leading framework for building LLM-powered agents over your data.