21 Reinforcement Learning (RL) Concepts Explained Simply

#130: Finally understand RL without getting confused by its fancy math equations

Mar 14, 2026


Reinforcement learning (RL) is making a comeback and becoming mainstream.

From humanoid robots and game-playing AIs that beat world champions to the LLMs you chat with every day, RL plays a part in training all of them. It lets them learn from experience and get better through feedback.

Many tutorials make RL look hard, but it is actually pretty intuitive, and you really don’t need a PhD to understand how it works.

In this newsletter, we will go through 21 key RL concepts to build a solid foundation from the ground up. These terms are all explained in plain language without using any fancy math equations.

Let’s start from the very beginning…

Want to reduce AI coding bugs before they hit production? (Partner)

AI is writing more production code than ever, but it also introduces more risk.

CodeRabbit’s research found that AI-generated code contains 1.7x more issues, including 75% more logic errors.

That’s exactly why you should grab a copy of the Ultimate Prompting Cheat Sheet.

Here’s what you get:

Proven prompting techniques to reduce AI coding errors.
Actionable workflows for AI-assisted development.
Clear frameworks for safer AI-generated code.
Practical examples you can apply immediately.

And much more!

These techniques reduce defects without slowing velocity, regardless of which AI coding tool you use.

(Thanks to CodeRabbit for partnering on this post & sharing the ultimate prompting cheat sheet.)

I want to reintroduce Ashish Bamania as a guest author.

He’s a self-taught software engineer and an emergency physician. He is also the editor and primary author of the newsletters Into AI and Into Quantum.

The images used in this newsletter come from his books:

LLMs In 100 Images

Reinforcement Learning in 100 Images (upcoming)

Grab the book at a 20% discount today with the discount code NEO20, pre-applied to the link above!

1. What is Reinforcement Learning?

Let’s start with the definition of Reinforcement Learning (RL).

RL is a type of machine learning (alongside Supervised and Unsupervised learning) in which an entity called an ‘Agent’ learns to perform a task better in its ‘Environment’ through trial and error.

For example, think of a deer (Agent) foraging in a forest (Environment), trying to survive while avoiding being eaten by predators.

We will soon move to more AI-related examples, I promise, but first, let’s better understand what the terms ‘Agent’ and ‘Environment’ mean.

2. Agent

An agent is the central entity in RL.

It observes and interacts with its environment, makes decisions, takes actions, and learns from the outcomes.

In the previous example, the agent was a deer. From now on, we will also discuss each concept with an LLM as the agent.
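To make the "decide, act, learn" cycle concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name `TrialAndErrorAgent`, the four directions, the made-up feedback (food lies to the east), and the simple rule of mostly picking the best-known action while occasionally trying a random one (often called epsilon-greedy). Real RL agents are far more sophisticated, so treat this as a toy.

```python
import random
from collections import defaultdict


class TrialAndErrorAgent:
    """Hypothetical agent that tracks the average reward of each action."""

    def __init__(self, actions, explore_prob=0.1, seed=0):
        self.actions = list(actions)
        self.explore_prob = explore_prob  # chance of trying a random action
        self.rng = random.Random(seed)    # seeded so runs are repeatable
        self.total_reward = defaultdict(float)
        self.times_tried = defaultdict(int)

    def average_reward(self, action):
        tried = self.times_tried[action]
        return self.total_reward[action] / tried if tried else 0.0

    def choose_action(self):
        # Make a decision: usually the best-known action, sometimes a random one.
        if self.rng.random() < self.explore_prob:
            return self.rng.choice(self.actions)
        return max(self.actions, key=self.average_reward)

    def learn(self, action, reward):
        # Learn from the outcome of the action that was just taken.
        self.total_reward[action] += reward
        self.times_tried[action] += 1


agent = TrialAndErrorAgent(actions=["north", "south", "east", "west"])

# Made-up feedback for this sketch: pretend the food always lies to the east.
for _ in range(200):
    action = agent.choose_action()
    reward = 1.0 if action == "east" else -1.0
    agent.learn(action, reward)

best_action = max(agent.actions, key=agent.average_reward)
print(best_action)
```

The agent is never told where the food is. It only finds out by trying actions and remembering how each one turned out, which is the trial-and-error idea from the definition of RL.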

3. Environment

The environment is everything outside the agent with which it interacts.

An environment does three things:

It is affected by the agent’s actions
It changes its state (or holds it constant) depending on those actions
It gives the agent a reward or punishment based on its actions, so that the agent can adjust its behavior the next time it acts

In the deer example, the forest was the environment. Based on the deer’s actions (moving around in different directions), the forest (environment) either lets the deer progress towards food (reward) or exposes it to a cheetah (punishment).
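The three jobs of an environment can be sketched in a few lines of Python. This is a toy under stated assumptions, not a standard API: the `ForestEnvironment` class, its `step` method, the five-cell forest, and the reward values of +1 and -1 are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ForestEnvironment:
    """Hypothetical forest with five cells in a row.

    The cheetah waits at cell 0 and the food is at cell 4.
    """
    deer_position: int = 2
    forest_size: int = 5

    def step(self, action):
        # 1. The environment is affected by the agent's action.
        if action == "right":
            move = 1
        elif action == "left":
            move = -1
        else:  # "stay" (or anything unrecognized) leaves the deer in place
            move = 0

        # 2. It changes its state, or holds it constant if nothing moved.
        new_position = self.deer_position + move
        self.deer_position = max(0, min(self.forest_size - 1, new_position))

        # 3. It gives the agent a reward or punishment.
        if self.deer_position == self.forest_size - 1:
            reward = 1.0    # reached the food
        elif self.deer_position == 0:
            reward = -1.0   # ran into the cheetah
        else:
            reward = 0.0    # nothing happened yet
        return self.deer_position, reward


forest = ForestEnvironment()
print(forest.step("stay"))   # (2, 0.0)
print(forest.step("right"))  # (3, 0.0)
print(forest.step("right"))  # (4, 1.0)
```

Notice that the environment never tells the deer which way to go. It only reacts to what the deer does and reports the result.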

When an LLM is the agent, everything outside it is part of its environment, including:

User inputs
System prompt
Tools / APIs it can call
System responses (tool results, API outputs, error messages)
Context (documents, conversation history, files)

4. State

A State is a snapshot of the environment at a given point in time. It is everything the agent can see at that moment and use to make its next decision.

For a deer agent, it could be its current location in the forest, nearby predators, and the time of day.

For an LLM agent, it is all the context it has access to at a moment, which it can use to decide its next action.
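One way to picture a state is as a small, read-only record. The sketch below assumes hypothetical field names (`location`, `nearby_predators`, `system_prompt`, `tool_results`, and so on) and made-up example values. It is only meant to show the "snapshot" idea, not how any real LLM framework stores its context.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeerState:
    """Hypothetical snapshot of what the deer can observe."""
    location: tuple          # (x, y) position in the forest
    nearby_predators: tuple  # names of predators in sight
    time_of_day: str


@dataclass(frozen=True)
class LLMState:
    """Hypothetical snapshot of the context an LLM agent has access to."""
    system_prompt: str
    conversation: tuple      # messages seen so far
    tool_results: tuple = ()


def with_tool_result(state, result):
    # A snapshot is never edited in place. New information gives a new snapshot.
    return replace(state, tool_results=state.tool_results + (result,))


deer_now = DeerState(location=(3, 7), nearby_predators=("cheetah",), time_of_day="dusk")

before = LLMState(
    system_prompt="You are a helpful assistant.",
    conversation=("user: What is the weather in Paris?",),
)
after = with_tool_result(before, "weather_api: 18 C, cloudy")

print(len(before.tool_results))  # 0
print(len(after.tool_results))   # 1
```

The `before` snapshot stays unchanged after the tool result arrives. The agent simply finds itself in a new state, `after`, and makes its next decision from there.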

But what’s an action?

5. Action

A guest post by

Dr. Ashish Bamania

Author of ‘Into AI’ → a bestselling newsletter helping engineers become 100× better in AI | Ex-CTO