Decathlon, one of the world's leading sports retailers, recently shared why it adopted the open-source library Polars to optimize its data pipelines. The Decathlon Digital team found that migrating from Apache Spark to Polars for small input datasets delivers significant speed gains and cost savings. By Renato Losio

infoq.com

Renato Losio

about 15 hours ago

Apache Spark data-engineering big-data mlops

InfoQ

Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale

In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume. By Craig Risi

infoq.com

Craig Risi

8 days ago

S3 cloud devops big-data data-engineering aws system-design

Top posts from tech subreddits • Updated about 3 hours ago

OpenAI made deals to purchase ~40% of the global raw, undiced DRAM wafer output until 2029

openai.com

rtbot2

about 6 hours ago

r/realtech ai ai-ethics big-data ai-research data-engineering generative-ai

OpenAI's Stargate project to consume up to 40% of global DRAM output — inks deal with Samsung and SK hynix to the tune of up to 900,000 wafers per month

tomshardware.com

411

Automaticalee

about 5 hours ago

r/technology ai cloud big-data

OpenAI made deals to purchase ~40% of the global raw, undiced DRAM wafer output until 2029

openai.com

3021

289

Ha8lpo321

about 7 hours ago

r/technology ai ai-ethics cloud big-data ai-research generative-ai

Would You Trust a 22-Year-Old AI Billionaire With the Global Economy?

theatlantic.com

theatlantic

about 10 hours ago

r/artificial tech-news ai technology-news ai-ethics big-data ai-research scalability

For Anyone Looking for Financial Data APIs

reddit.com

Real_Grapefruit_5570

about 11 hours ago

r/webdev api data-science data-lakes fintech big-data data-apis data-engineering

Feds pave the way for Big Tech to plug data centers right into power plants in scramble for energy

apnews.com

rtbot2

about 23 hours ago

r/realtech cloud big-data data-engineering

Data center deals hit record $61 billion in 2025 amid construction frenzy

cnbc.com

rtbot2

1 day ago

r/realtech data-center cloud big-data system-design

‘Uniquely evil’: Michigan residents fight against huge data center backed by top tycoons

theguardian.com

5021

309

zsreport

3 days ago

r/technology cloud privacy big-data data-engineering

Judge dismisses content moderation suit against Google, TikTok

courthousenews.com

rtbot2

2 days ago

r/realtech ai ai-ethics webdev content-moderation law big-data web2 legal

GitHub Trending

Popular repositories from GitHub • Updated about 1 hour ago

apache

doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

Java

14,764

3,653

databases sql apache performance java analytics data-engineering data-warehousing big-data apache-doris data-analytics database-tuning

apache

flink-cdc

Flink CDC is a streaming data integration tool

Java

6,306

2,110

big-data apache-flink apache data-engineering streaming flink

simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

C++

22,960

1,191

performance big-data c++ data-engineering json cpp databases

apache

arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++

16,257

3,938

data-engineering big-data columnar-databases databases apache apache-arrow data-lakes etl

StarRocks

starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java

11,064

2,227

big-data data-engineering java databases sql analytics system-design data-lakes

perspective-dev

perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

C++

10,002

1,263

data-visualization data-engineering big-data

lancedb

lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.

Rust

5,537

453

data-engineering rust vector-databases databases vector-search big-data