Trending

Content tagged with "data-engineering"

Hacker News

Top stories from the Hacker News community • Updated 1 minute ago

InfoQ

Latest articles from InfoQ • Updated 16 minutes ago

Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale

In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume. By Craig Risi

Magika 1.0: Smarter, Faster File Detection with Rust and AI

Google has just released version 1.0 of Magika, a substantial rewrite of its open-source file type detection system. The new version leverages AI to support a broader range of file types and is built in Rust for maximum speed and security. By Sergio De Simone

Breaking Silos: Netflix Introduces Upper Metamodel to Bring Consistency Across Content Engineering

Netflix has introduced the Upper metamodel within its Unified Data Architecture (UDA) to standardize domain definitions and generate consistent data container representations. UDA links conceptual models to GraphQL, Avro, SQL, and Java artifacts, supporting projections, mappings, and knowledge graph-based discovery across content, advertising, and operational systems. By Leela Kumili

Learnings from Cultivating Machine Learning Engineers as a Team Manager

As an AI team manager, Vivek Gupta stays broadly informed to guide AI experts effectively and drive the team. Engineers need feedback on both technical and interpersonal skills, Gupta mentioned at Dev Summit Boston. He stresses learning time, asking for help, and cross-team collaboration. Mentorship, data handling, and human-in-the-loop validation are key to success for machine learning engineers. By Ben Linders

Agentic Postgres: Postgres for Agentic Apps with Fast Forking and AI-Ready Features

Tiger Data, the company behind TimescaleDB, has launched Agentic Postgres, a Postgres-based database designed for both AI agents and developers. It extends Postgres with fast forking, an MCP server, native BM25 and vector search, and includes a CLI for terminal access. By Sergio De Simone

Reddit

Top posts from tech subreddits • Updated 16 minutes ago

Hugging Face Trending

Popular models from Hugging Face • Updated about 1 hour ago

No models found

GitHub Trending

Popular repositories from GitHub • Updated 12 minutes ago

mindsdb

Federated query engine for AI - The only MCP Server you'll ever need

Data-Science-For-Beginners

10 Weeks, 20 Lessons, Data Science for All!

simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

chroma

Open-source search and retrieval database for AI applications.

presidio

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

superset

Apache Superset is a Data Visualization and Data Exploration Platform

datahub

The Metadata Platform for your Data and AI Stack

airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows