Author: Wouter van Heeswijk, PhD
How pattern detection and pattern exploitation might elevate each other to a new level
15 min read
Learn more reliable, robust, and transferable policies by adding entropy bonuses to your algorithm
10 min read
Action spaces, particularly in combinatorial optimization problems, may grow unwieldy in size. This article discusses…
18 min read
A gradient-based reinforcement learning algorithm to learn deterministic policies for continuous action spaces
12 min read
A Python implementation of Q-learning to solve the Taxi-v3 environment from OpenAI Gym in an…
8 min read
Why we let randomness dictate our action selection in Reinforcement Learning
7 min read
Brace yourself against these shortcomings encountered in everyday RL algorithms
9 min read
What happens if the most successful techniques in Deep Q-Learning are combined into a single…
14 min read
The journey from REINFORCE to the go-to algorithm in continuous control
16 min read