Trending
Content tagged with "data-engineering"
Hacker News
Top stories from the Hacker News community
InfoQ
Latest articles from InfoQ
Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale
In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume. By Craig Risi
Magika 1.0: Smarter, Faster File Detection with Rust and AI
Google has just released version 1.0 of Magika, a substantial rewrite of its open-source file type detection system. The new version leverages AI to support a broader range of file types and is built in Rust for maximum speed and security. By Sergio De Simone
Breaking Silos: Netflix Introduces Upper Metamodel to Bring Consistency Across Content Engineering
Netflix has introduced the Upper metamodel within its Unified Data Architecture (UDA) to standardize domain definitions and generate consistent data container representations. UDA links conceptual models to GraphQL, Avro, SQL, and Java artifacts, supporting projections, mappings, and knowledge graph-based discovery across content, advertising, and operational systems. By Leela Kumili
Learnings from Cultivating Machine Learning Engineers as a Team Manager
As an AI team manager, Vivek Gupta stays broadly informed to guide AI experts effectively and drive the team. Engineers need feedback on both technical and interpersonal skills, Gupta mentioned at Dev Summit Boston. He stresses learning time, asking for help, and cross-team collaboration. Mentorship, data handling, and human-in-the-loop validation are key to success for machine learning engineers. By Ben Linders
Agentic Postgres: Postgres for Agentic Apps with Fast Forking and AI-Ready Features
Tiger Data, the company behind TimescaleDB, has launched Agentic Postgres, a Postgres-based database designed for both AI agents and developers. It extends Postgres with fast forking, an MCP server, native BM25 and vector search, and includes a CLI for terminal access. By Sergio De Simone
Top posts from tech subreddits
We loaded 4,027 tools into Anthropic’s new Tool Search. It got ~60% right. Here’s the full breakdown.
Hugging Face Trending
Popular models from Hugging Face
No models found
GitHub Trending
Popular repositories from GitHub
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
presidio
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows