Reinforcement learning is a sub-field of mathematics and computer science. It deals with the following kind of problems: You're given a set of states $\mathcal{X} \subseteq \mathbb{R}^n$ and a starting state $x_0 \in \mathcal{X}$. For every time step $k = 0, 1, 2, \dots$ you have a set of possible actions, depending on your current state: $$\mathcal{A}k(x_k)$$ Depending on what your action and your current state is, the new state is $$P(x_k, a_k, x) \in [0, 1]$$ So the transition from state $x_k$ to state $x_{k+1}$ with action $a_k$ is stochastic.
For some states $x_k$, actions $a_k$ at time $k$, you receive rewards:
$$r_k(x_k, a_k) \in \mathbb{R}$$
Your goal is to maximize
$$\mathbb{E}(\sum_{k=0}^\infty \gamma^k \cdot r_k(x_k, a_k))$$
where $\gamma in (0, 1)$ is a discounting factor which makes sure we don't get infinite rewards. $\gamma = 0.99$ is a typical choice.
Applications ¶
This very general problem description can be applied in almost any scenario:
- Learning automatically to play games
- Learning to control robots
Why RL is difficult ¶
- Credit assignment: In chess, you only get a reward (positiv or negative) at the end of the game. How to you tell which move was good or bad?
- Exploration vs. exploitation: When should you stick to what you know and when should you try something new?
- State equivalence: Typically, your state is very high-dimensional. For example when learning very old computer games from raw pixels you have $$210 \cdot 160 \cdot 3 = 100800$$ dimensions in your feature vector. But the relevant game states might be much less.
Approaches ¶
A policy network gets the state as input and outputs the action. It learns by executing many episodes (e.g. a complete game; from start until you reach a final state or at least a state with reward) and labels all actions before with the received reward. There might be many which were good even in a lost game, but in average you expect to punish bad decisions and encourage good decisions.
Resources ¶
If you are a student at KIT, I can recommend to visit the lecture Probabilistic Planning.
Other resources you might want to have a look at: