Search | News by Netwrck

CS234: Reinforcement Learning Winter 2025

(web.stanford.edu) by jonbaer | view | 60 comments

TMLR: Outcome-Based Reinforcement Learning to Predict the Future

(openreview.net) by bturtel | view | 1 comment

Olympiad-level formal mathematical reasoning with reinforcement learning

(nature.com) by mauricioc | view | 0 comments

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

(github.com) by yhzan | view | 1 comment

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

(arxiv.org) by handfuloflight | view | 0 comments

An FAQ on Reinforcement Learning Environments

(epoch.ai) by dcre | view | 0 comments

Reinforcement Learning Control of Quantum Error Correction

(arxiv.org) by SweetSoftPillow | view | 0 comments

The Little Book of Reinforcement Learning

(github.com) by mustaphah | view | 0 comments

Single-Rollout Asynchronous Optimization for Agentic Reinforcement Learning

(arxiv.org) by gmays | view | 0 comments

Auditing the Risk Claims of Distributional Reinforcement Learning

(arxiv.org) by sbulaev | view | 0 comments

Capturing token IDs during agentic interaction for better reinforcement learning

(amazon.science) by visha1v | view | 0 comments

Finetuning a Reasoning LLM with Supervised or Reinforcement Learning?

(discuss.huggingface.co) by verdverm | view | 0 comments

Introduction to Reinforcement Learning and Its Role in LLMs

(huggingface.co) by tosh | view | 0 comments

Practical Lessons from Reinforcement Learning Post Training Experiments [pdf]

(zenodo.org) by luvverma2011 | view | 0 comments

Reinforcement Learning with Metacognitive Feedback Elicits Uncertainty in LLMs

(arxiv.org) by jonnonz | view | 0 comments

How Can Reinforcement Learning Achieve Expert-Level [Chip] Placement?

(arxiv.org) by Jimmc414 | view | 0 comments

Reinforcement Learning with Metacognitive Feedback

(arxiv.org) by guard0g | view | 0 comments

Reinforcement learning towards broadly and persistently beneficial models

(alignment.openai.com) by spicypete | view | 0 comments

Reinforcement learning towards broadly and persistently beneficial models

(alignment.openai.com) by gmays | view | 0 comments

Reinforcement learning towards broadly and persistently beneficial models

(alignment.openai.com) by vesteny77 | view | 0 comments

Reinforcement learning towards broadly and persistently beneficial models

(alignment.openai.com) by jawiggins | view | 0 comments

TycoonLE: A Jax reinforcement learning environment for long-horizon planning

(github.com) by vrtnis | view | 1 comment

Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data

(anjalishriva.com) by nsahu | view | 0 comments

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)

(web.mit.edu) by sebzuddas | view | 0 comments

Human-Centered Reinforcement Learning [pdf]

(ml.cmu.edu) by ankitg12 | view | 0 comments

Reinforcement learning in language models recruits a functional welfare axis

(functionalwelfare.com) by paraschopra | view | 0 comments

Unix-CTF: Procedural Environments for Unix-Competence Reinforcement Learning

(twitter.com) by AMavorParker | view | 0 comments

Reinforcement Learning, in Pictures and Videos

(suriya.cc) by suriya-ganesh | view | 0 comments

Richard Sutton – Father of reinforcement learning thinks LLMs are a dead end [video]

(youtube.com) by evo_9 | view | 0 comments

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

(arxiv.org) by lexandstuff | view | 0 comments

Discovering Reinforcement Learning Interfaces with Large Language Models

(akshat-sj.github.io) by paraschopra | view | 0 comments

'Try, Score, Change': Reinforcement Learning for Children

(gwern.net) by helloplanets | view | 0 comments

Solving Physics Olympiad via reinforcement learning on physics simulators

(sim2reason.github.io) by ivansavz | view | 0 comments

Autonomous Rocket Landing with Reinforcement Learning (YouTube)

(youtube.com) by rafacm | view | 1 comment

Formalizing the "generative crash" via inverse reinforcement learning

by abrahamhaskins | view | 0 comments

Show HN: REST API for Gymnasium (fka OpenAI Gym) reinforcement learning library

(github.com) by cloudkj | view | 0 comments

What is reinforcement learning finetuning

(youtube.com) by kumama | view | 0 comments

How LLMs Got Good: Humility, Tools, and Reinforcement Learning

(medium.com) by tudorhn | view | 0 comments

Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models

(dani2442.github.io) by sebzuddas | view | 0 comments

Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents

(arxiv.org) by brandonb | view | 0 comments

Rust-accelerated reinforcement learning, 140x faster than Python

(github.com) by wkowalpl | view | 1 comment

Reinforcement Learning (i.e. Policy Gradient Algorithms)

(rlhfbook.com) by vinhnx | view | 0 comments

Reinforcement Learning environments and how to build them

(unsloth.ai) by vinhnx | view | 0 comments

AI Gold Trading Bot reinforcement learning system for autonomous XAUUSD trading

(github.com) by solosquad | view | 0 comments

How Well Does Reinforcement Learning Scale?

(tobyord.com) by AntiDyatlov | view | 0 comments

A Reinforcement Learning Environment for Automatic Code Optimization in MLIR

(arxiv.org) by matt_d | view | 0 comments

Why reinforcement learning breaks at scale, and how a new method fixes it

(techxplore.com) by brandonb | view | 0 comments

Reinforcement Learning for LLMs

(mesuvash.github.io) by gmays | view | 0 comments

Notes on Reinforcement Learning

(mattlanders.net) by sato_sakura | view | 0 comments

Intuitive Intro to Reinforcement Learning for LLMs

(mesuvash.github.io) by mesuvash | view | 0 comments