Reinforcement learning is a sub-field of mathematics and computer science. It deals with the following kind of problem: you're given a set of states \(\mathcal{X} \subseteq \mathbb{R}^n\) and a starting state \(x_0 \in \mathcal{X}\). For every time step \(k = 0, 1, 2, \dots\) you have a set of possible actions, depending on your current state:

$$a_k \in \mathcal{A}(x_k)$$
Depending on what your action and your current state are, the new state is

$$x_{k+1} \sim P(\,\cdot \mid x_k, a_k)$$
So the transition from state \(x_k\) to state \(x_{k+1}\) with action \(a_k\) is stochastic.
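As a tiny illustration, sampling such a stochastic transition might look like this; the transition table and its probabilities are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical table: (state, action) -> (possible next states, probabilities)
P = {
    (0, "right"): ([1, 0], [0.9, 0.1]),  # usually moves right, sometimes slips
}

x_k, a_k = 0, "right"
next_states, probs = P[(x_k, a_k)]
x_next = rng.choice(next_states, p=probs)  # x_{k+1} ~ P(. | x_k, a_k)
print(x_next)
```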
For some states \(x_k\) and actions \(a_k\) at time \(k\), you receive rewards:

$$r_k = r(x_k, a_k) \in \mathbb{R}$$
Your goal is to maximize the discounted sum of rewards

$$\sum_{k=0}^{\infty} \gamma^k r_k,$$

where \(\gamma \in (0, 1)\) is a discounting factor which makes sure we don't get infinite rewards. \(\gamma = 0.99\) is a typical choice.
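Plugging in numbers shows why the discounting factor keeps this sum finite. A small sketch, assuming a constant reward of 1 per step purely for illustration:

```python
# With a constant reward of 1 per step, the discounted sum is a geometric
# series converging to 1 / (1 - gamma) instead of growing without bound.
gamma = 0.99
ret = sum(gamma**k * 1.0 for k in range(10_000))  # truncated "infinite" sum
print(ret)  # ~100.0, since 1 / (1 - 0.99) = 100
```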
Applications
This very general problem description can be applied in almost any scenario:
- Learning to play games automatically
- Learning to control robots
Why RL is difficult
- Credit assignment: In chess, you only get a reward (positive or negative) at the end of the game. How do you tell which move was good or bad?
- Exploration vs. exploitation: When should you stick to what you know and when should you try something new?
- State equivalence: Typically, your state is very high-dimensional. For example, when learning very old computer games from raw pixels you have $$210 \cdot 160 \cdot 3 = 100800$$ dimensions in your feature vector, but the number of truly distinct game states might be much smaller (see the sketch after this list).
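For instance, a plausible first step is to shrink the raw frame before feeding it to anything else. A minimal sketch, assuming grayscale conversion and 2x subsampling (this particular recipe is an assumption, not a fixed standard):

```python
import numpy as np

frame = np.zeros((210, 160, 3), dtype=np.uint8)  # stand-in for a raw frame
gray = frame.mean(axis=2)      # average the 3 color channels away
small = gray[::2, ::2]         # keep every 2nd pixel in both directions
print(frame.size, small.size)  # 100800 -> 8400 dimensions
```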
Approaches
A policy network gets the state as input and outputs an action. It learns by executing many episodes (e.g. a complete game, from the start until you reach a final state or at least a state with a reward) and labeling all actions taken before with the received reward. Some of those actions might have been good even in a lost game, but on average you expect this to punish bad decisions and encourage good ones.
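Here is a minimal sketch of this idea in the style of the REINFORCE algorithm on a toy chain environment. The environment, the linear softmax policy, and all hyperparameters are my assumptions, chosen only for illustration:

```python
import numpy as np

# Toy chain: states 0..4 in a row, actions 0 (left) / 1 (right).
# Reaching the rightmost state gives reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR = 0.99, 0.1

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_ACTIONS))  # one logit per (state, action)

def policy(state):
    """Softmax over the action logits of the given state."""
    logits = theta[state]
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def run_episode(max_steps=50):
    """Play one episode and record (state, action, reward) triples."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = rng.choice(N_ACTIONS, p=policy(state))
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        trajectory.append((state, action, reward))
        if next_state == N_STATES - 1:  # terminal state reached
            break
        state = next_state
    return trajectory

for _ in range(500):
    # Label every action with the discounted return that followed it,
    # then nudge the policy towards actions with high returns.
    G = 0.0
    for state, action, reward in reversed(run_episode()):
        G = reward + GAMMA * G
        probs = policy(state)
        grad_log = -probs          # gradient of log-softmax w.r.t. logits ...
        grad_log[action] += 1.0    # ... is one-hot(action) - probs
        theta[state] += LR * G * grad_log

print(policy(0))  # "right" should now clearly dominate in the start state
```

After training, the probability of moving right should be close to 1 in every state, since only the rightmost state ever yields a reward.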
Resources
If you are a student at KIT, I can recommend visiting the lecture Probabilistic Planning.
Other resources you might want to have a look at: