Trending
Content tagged with "big-data"
Hacker News
Top stories from the Hacker News community • Updated 5 minutes ago
650GB of Data (Delta Lake on S3). Polars vs. DuckDB vs. Daft vs. Spark
Reddit
Top posts from tech subreddits • Updated 5 minutes ago
Judge who ruled Google is a monopoly decides to do hardly anything to break it up
Leaked plan from Trump administration to make depopulated Gaza a high-tech cash cow
Hugging Face Trending
Popular models from Hugging Face • Updated about 1 hour ago
No models found
GitHub Trending
Popular repositories from GitHub • Updated 2 minutes ago
perspective
A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for building high-performance workflows with low code.
seatunnel
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.