Fresh out of Introduction to Artificial Intelligence and eager to practice my web development skills, I combined the two in QWOP-AI, my first personal project of 2013. Inspired by QWOP, I created a walking game where the player must walk by controlling 4 ragdoll properties: the thigh angle and knee angle of both the left and right legs. These are controlled with 4 keys: Q, W, O and P. You can try the game out below, as well as the AI. Press Q to move the ragdoll's legs closer together and W to move them apart. O bends the left knee and straightens the right, while P bends the right knee and straightens the left. You can also press SPACE to reset the ragdoll.
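For the curious, the key mapping works out to something like the sketch below (a rough TypeScript illustration; the joint names and torque values are placeholders, not the game's actual code):

```typescript
// Rough sketch of the 4-key control scheme; joint names and torque
// directions are illustrative placeholders.
type Key = "Q" | "W" | "O" | "P";

interface JointTorques {
  leftThigh: number;
  rightThigh: number;
  leftKnee: number;
  rightKnee: number;
}

// Each key drives a pair of joints in opposite directions.
const controls: Record<Key, JointTorques> = {
  Q: { leftThigh: +1, rightThigh: -1, leftKnee: 0, rightKnee: 0 }, // thighs together
  W: { leftThigh: -1, rightThigh: +1, leftKnee: 0, rightKnee: 0 }, // thighs apart
  O: { leftThigh: 0, rightThigh: 0, leftKnee: +1, rightKnee: -1 }, // bend left knee, straighten right
  P: { leftThigh: 0, rightThigh: 0, leftKnee: -1, rightKnee: +1 }, // bend right knee, straighten left
};
```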
Originally, I implemented the naive version of Q-learning I had learned in my AI class. Using various metrics to describe the state of the ragdoll, I set out to learn the optimal action for every possible state. I rewarded forward motion, penalized falling over, and ran the learner for around 8 hours at 10 states per second. Unfortunately, this didn't work as well as I had planned: my first attempt converged on a (rather entertaining) local optimum and never learned to actually take steps.
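The core of that first attempt was an ordinary tabular Q-learning update, roughly like the sketch below. The state discretization, reward weights, and hyperparameters shown here are placeholders, not the values I actually used:

```typescript
// Minimal sketch of the tabular Q-learning loop; state encoding, reward
// weights, and hyperparameters are illustrative placeholders.
type StateKey = string; // discretized ragdoll state, serialized to a key
type Action = "Q" | "W" | "O" | "P";

const ACTIONS: Action[] = ["Q", "W", "O", "P"];
const qTable = new Map<StateKey, Map<Action, number>>();

const ALPHA = 0.1;   // learning rate
const GAMMA = 0.9;   // discount factor
const EPSILON = 0.1; // exploration rate

function getQ(state: StateKey, action: Action): number {
  return qTable.get(state)?.get(action) ?? 0;
}

function setQ(state: StateKey, action: Action, value: number): void {
  if (!qTable.has(state)) qTable.set(state, new Map());
  qTable.get(state)!.set(action, value);
}

// Reward forward motion, penalize falling over.
function reward(deltaX: number, fell: boolean): number {
  return deltaX - (fell ? 100 : 0);
}

// Epsilon-greedy action selection over the four keys.
function chooseAction(state: StateKey): Action {
  if (Math.random() < EPSILON) {
    return ACTIONS[Math.floor(Math.random() * ACTIONS.length)];
  }
  return ACTIONS.reduce((best, a) => (getQ(state, a) > getQ(state, best) ? a : best));
}

// Standard one-step Q-learning update, applied roughly 10 times per second.
function update(s: StateKey, a: Action, r: number, sNext: StateKey): void {
  const maxNext = Math.max(...ACTIONS.map((an) => getQ(sNext, an)));
  setQ(s, a, getQ(s, a) + ALPHA * (r + GAMMA * maxNext - getQ(s, a)));
}
```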
To escape this local optimum, I kept rewarding forward motion but also awarded a small number of points simply for moving the ragdoll's legs back and forth. To help the algorithm converge faster, I also propagated knowledge of falling to earlier states by heavily penalizing unstable configurations (e.g. a foot raised too high). While this improved performance, the learner still did not quite capture walking behavior, instead converging on a relatively stable but slow sequence of actions that loosely resembled skipping.
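The reshaped reward looked roughly like the following; the specific weights and the foot-height threshold are illustrative, not the exact values from the project:

```typescript
// Sketch of the shaped reward: forward progress still dominates, but leg
// movement earns a small bonus and unstable poses (such as a foot raised
// too high) are penalized before an actual fall. All weights are placeholders.
interface RagdollState {
  torsoX: number;        // horizontal position of the torso
  leftFootY: number;     // height of each foot above the ground
  rightFootY: number;
  hipAngleDelta: number; // how much the hip angles changed since the last tick
  fallen: boolean;
}

function shapedReward(prev: RagdollState, curr: RagdollState): number {
  let r = 0;
  r += (curr.torsoX - prev.torsoX) * 10;   // forward motion
  r += Math.abs(curr.hipAngleDelta) * 0.5; // small bonus for moving the legs back and forth
  const maxFootHeight = Math.max(curr.leftFootY, curr.rightFootY);
  if (maxFootHeight > 0.5) r -= 5;         // penalize unstable configurations early
  if (curr.fallen) r -= 100;               // heavy penalty for falling over
  return r;
}
```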
Fearing that my 6 original features were insufficient to distinguish walking from non-walking states, I reengineered them in a way that I thought would let the learner better capture walking behavior. While the old features focused entirely on hip- and knee-joint angles, the new set captured other details about the ragdoll, such as torso angle and head height. Although this strategy didn't yield a better result on its own, it would prove critical to later efforts. At this point, I put QWOP on the shelf and dedicated the remainder of my time to my research project at ICSI.
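For illustration, the expanded feature vector looked something like the sketch below. Beyond the quantities named above, the exact list of 16 features is hypothetical here:

```typescript
// Hypothetical illustration of the expanded feature set: the real list
// extended beyond joint angles to quantities like torso angle and head height.
interface RagdollFeatures {
  leftHipAngle: number;
  rightHipAngle: number;
  leftKneeAngle: number;
  rightKneeAngle: number;
  torsoAngle: number;
  headHeight: number;
  // ...plus enough additional quantities (velocities, foot heights, and
  // similar) to round out the 16 inputs fed to the learner.
}
```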
As it turns out, the hours spent debugging and testing my version of QWOP made me quite proficient at walking, or at least good enough to reach 20 meters or so before falling. Instead of trying to learn walking behavior from scratch, I decided to bootstrap my AI with data from my own attempts at walking before letting it learn on its own. That way, the AI would begin with some vague notion of walking and improve over time. After learning by itself for around 10 hours, the AI finally took multiple steps and was able to travel up to 20 meters in some cases. Unfortunately, the AI was still very far from perfect. While the learned state-action mappings were accurate for the most part, the AI often stumbled into states it had never seen before but that were similar to states already in its Q-table. Thus, the AI needed a way of extrapolating from its learned states to unlearned or otherwise unfamiliar ones.
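The bootstrapping step amounted to pre-filling the Q-table with the actions I had taken in each recorded state, roughly as sketched below. The recording format and the seed value are placeholders for illustration:

```typescript
// Sketch of seeding the Q-table from recorded human play before letting the
// learner continue on its own. Recording format and seed value are placeholders.
type StateKey = string;
type Action = "Q" | "W" | "O" | "P";

interface DemoFrame {
  state: StateKey; // discretized ragdoll state at this tick
  action: Action;  // the key I was holding at that moment
}

const qTable = new Map<StateKey, Map<Action, number>>();

function seedFromDemonstrations(frames: DemoFrame[], seedValue = 1): void {
  for (const { state, action } of frames) {
    if (!qTable.has(state)) qTable.set(state, new Map());
    const row = qTable.get(state)!;
    // Give demonstrated actions a head start over unvisited ones.
    if (!row.has(action)) row.set(action, seedValue);
  }
}
```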
I achieved the most success in predicting actions for unknown states by using a neural network with 1 hidden layer. The network takes 16 inputs representing the state and outputs 4 numbers, each bounded between 0 and 1, indicating whether each key (Q, W, O and P) should be pressed. I incorporated it as a fallback to the primary Q-learning algorithm. In most cases, the ragdoll's state can be found in the lookup table generated by Q-learning, so the neural network isn't needed. Every so often, however, the ragdoll encounters a state that isn't in the table. Instead of trying a random action as I'd previously done, I use the neural network to predict an appropriate action for that state.
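A minimal sketch of this fallback is shown below; the hidden-layer size, the weight initialization, and the 0.5 press threshold are placeholders rather than the exact values from the project:

```typescript
// Sketch of the fallback policy: a single-hidden-layer network maps the 16
// state features to 4 sigmoid outputs, one per key. Layer size, weights, and
// the press threshold are illustrative placeholders.
const INPUTS = 16;
const HIDDEN = 8;   // hidden-layer size is illustrative
const OUTPUTS = 4;  // one output per key: Q, W, O, P

const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));

// Trained weights would come from fitting the Q-table's state-action pairs;
// random values stand in here.
const w1 = Array.from({ length: HIDDEN }, () =>
  Array.from({ length: INPUTS }, () => Math.random() - 0.5)
);
const w2 = Array.from({ length: OUTPUTS }, () =>
  Array.from({ length: HIDDEN }, () => Math.random() - 0.5)
);

// Forward pass: 16 features in, 4 key-press decisions out.
function predictKeys(features: number[]): boolean[] {
  const hidden = w1.map((row) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * features[i], 0))
  );
  const outputs = w2.map((row) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * hidden[i], 0))
  );
  return outputs.map((o) => o > 0.5); // press the key if its output exceeds 0.5
}

// Fallback logic: consult the network only when the state is missing from the Q-table.
function chooseKeys(stateKey: string, features: number[],
                    qTable: Map<string, Map<string, number>>): boolean[] {
  const row = qTable.get(stateKey);
  if (row && row.size > 0) {
    // Use the learned best action from the lookup table.
    const best = [...row.entries()].reduce((a, b) => (b[1] > a[1] ? b : a))[0];
    return ["Q", "W", "O", "P"].map((k) => k === best);
  }
  return predictKeys(features); // unseen state: let the network decide
}
```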