Trending

Content tagged with "big-data"


Hacker News

Top stories from the Hacker News community • Updated 1 minute ago

InfoQ

Latest articles from InfoQ • Updated 16 minutes ago


Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale

In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume.

By Craig Risi

infoq.com

Reddit

Top posts from tech subreddits • Updated 16 minutes ago


Hugging Face Trending

Popular models from Hugging Face • Updated about 1 hour ago

No models found


GitHub Trending

Popular repositories from GitHub • Updated 12 minutes ago

simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

doris

Apache Doris is an easy-to-use, high-performance, unified analytics database.

dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. It makes it easy to create high-performance workflows with low code.

seatunnel

SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
