Jade Abbott discusses the shift from massive, resource-heavy models to "Little LMs" that prioritize efficiency and cultural sustainability. She explains how techniques like LoRA, quantization, and GRPO allow for high performance with less compute. Drawing on the "Ubuntu Punk" philosophy, she shows how to move beyond extractive data practices toward human-centric, sustainable AI systems. By Jade Abbott

infoq.com

Jade Abbott

2 days ago

Transcripts llm ai-ethics data-engineering ai mlops nlp transformers

InfoQ

Toad: A Unified CLI Tool for All Your LLMs That Promises Improved UX From Existing Ones

During his sabbatical, Will McGugan, maker of Rich and Textual (frameworks for building text user interfaces, or TUIs), put his UI skills to work to build Toad. The tool, newly released to the public, aims to provide a unified, “beautiful” GUI for multiple coding agents in your terminal, all accessible from one tool through the Agent Communication Protocol (ACP). By Olimpiu Pop

infoq.com

Olimpiu Pop

4 days ago

Agent Communication Protocol ai llm cli-tools developer-tools devops data-engineering mlops development-news

InfoQ

Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs

Decathlon, one of the world's leading sports retailers, recently shared why it adopted the open source library Polars to optimize its data pipelines. The Decathlon Digital team found that migrating from Apache Spark to Polars for small input datasets provides significant speed and cost savings. By Renato Losio

infoq.com

Renato Losio

6 days ago

Apache Spark data-engineering big-data mlops

InfoQ

Presentation: Lessons Learned From Shipping AI-Powered Healthcare Products

Clara Matos discusses the journey of shipping AI-powered healthcare products at Sword Health. She explains how to implement input/output guardrails for regulated industries and presents a framework for robust evaluations using human and LLM-based ratings. From prompt engineering to RAG and user feedback loops, she lays out a data-driven roadmap for building reliable AI care agents at scale. By Clara Matos

infoq.com

Clara Matos

7 days ago

Best Practices ai ml data-engineering nlp mlops generative-ai health healthcare llm

InfoQ

Article: Architecture in a Flow of AI-Augmented Change

While AI adoption is surging, most organizations fail to scale past pilots. The solution lies in organizational structure, not just technology. This article details how architects can enable "fast flow" by defining clear domains and guardrails. Learn how to shift from controlling outcomes to curating context, allowing AI to drive continuous, valuable business change. By Jonathan McPhail, Juan Medina, Jake DeCrane, Isuru Wijesundara

infoq.com

Jonathan McPhail, Juan Medina, Jake DeCrane, Isuru Wijesundara

9 days ago

Architecture ai architecture mlops ai-ethics data-engineering system-design

InfoQ

QCon AI New York 2025: Moving Mountains: Migrating Legacy Code in Weeks Instead of Years

David Stein, Principal AI Engineer at ServiceTitan, presented “Moving Mountains: Migrating Legacy Code in Weeks Instead of Years” at QCon AI New York 2025. Stein demonstrated that migrations don’t have to be synonymous with “moving mountains” and introduced two concepts: the Principle of Acceleration and the Assembly Line Pattern. By Michael Redlich

infoq.com

Michael Redlich

9 days ago

Legacy Code ai software-migration system-architecture data-engineering system-design software-engineering system-migration legacy-code software-architecture migration software-development architecture

InfoQ

Article: NextGen Search - Where AI Meets OpenSearch Through MCP

In this article, authors Srikanth Daggumalli and Arun Lakshmanan discuss next-generation context-aware conversational search using OpenSearch and AI agents powered by Large Language Models (LLMs) and Model Context Protocol (MCP). By Srikanth Daggumalli, Arun Lakshmanan

infoq.com

Srikanth Daggumalli, Arun Lakshmanan

10 days ago

Artificial Intelligence ai llm open-search nlp vector-search data-engineering knowledge-graphs

InfoQ

TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java

The TornadoVM project recently reached version 2.0, a major milestone for the open-source project that aims to provide a heterogeneous hardware runtime for Java. The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. This release is likely to be of particular interest to teams developing LLM solutions on the JVM. By Ben Evans

infoq.com

Ben Evans

10 days ago

GPU ai llm java mlops gpu data-engineering

InfoQ

Presentation: Powering Enterprise AI Applications with Data and Open Source Software

Francisco Javier Arceo explored Feast, the open-source feature store designed to address common data challenges in the AI/ML lifecycle, such as feature redundancy and low-latency serving at scale. By Francisco Javier Arceo

infoq.com

Francisco Javier Arceo

11 days ago

Transcripts ai mlops data-engineering feature-store open-source feature-engineering

Top posts from tech subreddits • Updated about 1 hour ago

China's MGI data mining tech touts world's fastest gene sequencer: 10 minutes to read a genome and process more than 14 terabases of genomic data a day

yahoo.com

rtbot2

4 months ago

r/realtech data-science genomics data-engineering bioinformatics

EU fines Google $3.5 billion for abusing dominance in digital ad market

techspot.com

rtbot2

4 months ago

r/realtech regulation ai ai-ethics webdev privacy big-data data-engineering

[Project] Otters 🦦 - A minimal vector search library with powerful metadata filtering

reddit.com

AtharvBhat

4 months ago

r/MachineLearning rag knowledge-graphs vector-search data-engineering embeddings

[D] How to Automate parsing of Bank Statement PDFs to extract transaction level data

reddit.com

Anmol_garwal

4 months ago

r/MachineLearning automation pdf data-engineering nlp

HF releases 3T tokens dataset sourced entirely from PDFs.

reddit.com

343

Other_Housing8453

4 months ago

r/LocalLLaMA data-science data-engineering nlp

What’s the biggest data governance challenge you face when building cross-agent pipelines?

reddit.com

rwitt101

4 months ago

r/AI_Agents data-lakes etl data-engineering

Business Rules In Database Movement

medium.com

vbilopav89

4 months ago

r/programming databases system-design etl data-engineering

[p] Why per row context understanding is important for data transformations and here's how you can use LLMs to do so

reddit.com

metalvendetta

4 months ago

r/MachineLearning data-science llm transformers data-engineering nlp

European Commission fines Google €2.95 billion over abusive practices in online advertising technology

ec.europa.eu

rtbot2

4 months ago

r/realtech ai ai-ethics webdev privacy big-data data-engineering

GitHub Trending

Popular repositories from GitHub • Updated about 1 hour ago

juicedata

juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

12,591

1,132

distributed-systems storage cloud data-engineering object-storage

pathwaycom

pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Python

51,176

1,482

python etl llm rag data-engineering

StarRocks

starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java

11,148

2,240

big-data data-engineering java databases sql analytics system-design data-lakes

ClickHouse

ClickHouse® is a real-time analytics database management system

C++

44,853

7,923

databases sql clickhouse c++ analytics cpp performance data-engineering data-analytics real-time system-design

rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files

54,618

4,828

cloud data-engineering self-hosting storage data-lakes rclone

ranaroussi

yfinance

Download market data from Yahoo! Finance's API

Python

20,318

2,954

python api data-science data-engineering financial-data webdev

PostHog

posthog

🦔 PostHog is an all-in-one developer platform for building successful products. We offer product analytics, web analytics, session replay, error tracking, feature flags, experimentation, surveys, data warehouse, a CDP, and an AI product assistant to help debug your code, ship features faster, and keep all your usage and customer data in one stack.

Python

30,534

2,142

open-source self-hosting analytics python webdev web-analytics self-hosted product-analytics data-engineering developer-tools