At QCon AI NYC 2025, Will Hang from OpenAI presented Agent RFT, a reinforcement fine-tuning approach for tool-using agents. Hang recommended optimizing prompts and task design before adjusting the model itself. He shared strategies for improving agents' decision-making and efficiency, and stressed the importance of a balanced grading system. The session pointed toward agents that reach better outcomes with lower latency. By Andrew Hoblitzell
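A small sketch can make the idea of a balanced grader concrete. The code below is not OpenAI's Agent RFT grader format. The `Trajectory` type, the `grade` function, and its thresholds are hypothetical, chosen only to show one way a grader can reward correctness first while still rewarding efficiency (fewer tool calls, and so lower latency).

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical summary of one agent rollout."""
    final_answer: str
    tool_calls: int


def grade(
    traj: Trajectory,
    expected: str,
    free_calls: int = 3,           # hypothetical: calls allowed without penalty
    penalty_per_call: float = 0.05,
    max_penalty: float = 0.5,      # cap keeps correct answers above wrong ones
) -> float:
    """Score a rollout in [0, 1]: correctness dominates, extra tool calls cost a little."""
    # Exact-match correctness after normalizing case and whitespace.
    correct = 1.0 if traj.final_answer.strip().lower() == expected.strip().lower() else 0.0
    # Penalize only the tool calls beyond the free budget, up to max_penalty.
    extra_calls = max(0, traj.tool_calls - free_calls)
    penalty = min(penalty_per_call * extra_calls, max_penalty)
    # Clamp at zero so a wrong answer never receives a negative reward.
    return max(0.0, correct - penalty)
```

Capping the efficiency penalty below 1.0 means any correct answer still outscores any incorrect one, so the model is never rewarded for trading accuracy for speed.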

infoq.com

Andrew Hoblitzell

about 19 hours ago

QCon Software Development Conference ai reinforcement-learning model-serving mlops model-tuning

Top posts from tech subreddits

The performance of Minimax-m2 is truly impressive!

i.redd.it

166

contportvas

about 2 months ago

r/LocalLLaMA ai reinforcement-learning performance

A new autonomous fighter jet just broke cover. It's powered by the same AI brain that flew an F-16 through a dogfight.

businessinsider.com

437

MetaKnowing

about 2 months ago

r/tech ai ai-ethics reinforcement-learning ai-research military-ai deep-learning mlops autonomous-systems

[P] CleanMARL: clean implementations of Multi-Agent Reinforcement Learning Algorithms in PyTorch

reddit.com

AmineZ04

2 months ago

r/MachineLearning reinforcement-learning multi-agent pytorch

[R] No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

reddit.com

SnooHesitations8849

3 months ago

r/MachineLearning ai ai-ethics llm reinforcement-learning

Richard Sutton – Father of RL thinks LLMs are a dead end

youtube.com

creaturefeature16

3 months ago

r/artificial ai llm reinforcement-learning mlops

Gpt-oss Reinforcement Learning - Fastest inference now in Unsloth! (<15GB VRAM)

i.redd.it

281

danielhanchen

3 months ago

r/LocalLLaMA ai llm reinforcement-learning model-serving

[P] SDLArch-RL: Multi-Console Gaming Environment for Reinforcement Learning Research

youtube.com

AgeOfEmpires4AOE4

3 months ago

r/MachineLearning gaming reinforcement-learning ai-research game-engines system-design research

[R] MiniGrid DoorKeys Benchmark Active Inference

reddit.com

thomheinrich

3 months ago

r/MachineLearning ai simulation reinforcement-learning benchmark generative-ai

[R] r-rpe: beyond openai’s rl-hf — hedging ↓60% in eval-only tests

reddit.com

chicken1414

3 months ago

r/MachineLearning ai reinforcement-learning model-serving ai-research mlops openai


Hugging Face Trending

Popular models from Hugging Face

RL ReSearch

DR-Tulu-8B

724

reinforcement-learning ai-research deep-learning

Xiaomi MiMo

MiMo-7B-RL

245

5,976

ai reinforcement-learning hardware

GitHub Trending

Popular repositories from GitHub

DLR-RM

stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Python

12,326 stars

2,016 forks

reinforcement-learning pytorch python deep-learning ai mlops

vwxyzjn

cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Python

8,485 stars

922 forks

reinforcement-learning python deep-learning ai

facebookresearch

habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

Python

2,660 stars

596 forks

python nlp ai deep-learning robotics reinforcement-learning

microsoft

qlib

Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

Python

33,321 stars

5,132 forks

ai data-science reinforcement-learning mlops quantitative-finance quantitative-investment quantitative-investing deep-learning quantitative-research model-serving