I wrote a bot to solve 2048 so I would never have to play it again
21 July 2014, by Tom Eccles
Along with many others inside and outside Softwire, I became mildly obsessed with the game 2048 a few months ago. 2048 is probably best explained by playing it – you slide tiles around a grid using the arrow keys, joining tiles of the same value to form progressively bigger and better tiles, aiming to get the 2048 tile (or more). You lose when the board becomes so clogged that nothing can move. Having wasted plenty of time playing the game myself, I decided to break the habit by programming a bot to play it better than I could.
The picture below demonstrates the strategy I decided to base the bot on. We’re just a few moves away from 2048. You can see a “snake” of tiles of decreasing value starting in the top left and working its way around the board. Trying to build a snake like this one seems to be a pretty robust strategy – you maintain it by never pressing the down arrow, and avoiding the right arrow if it moves the top row.
Making a computer play it
To make a bot play this strategy, it first needs a way of scoring a position, so it knows what to aim for. Well, since we want a long snake of decreasing tiles, let’s make that valuable. The score of a position is calculated by starting in the top left, and adding up tiles in the “snake”, until there’s a tile missing or we hit a tile that’s bigger than the last. So, in the diagram above, the score is (1024 + 512 + 256 + 128 + 64 + 32 + 16 + 8 + 8 + 4 + 2 + 2), whatever that is. If we changed the 256 tile into another 1024, the score would just be 1024 + 512, because then the next tile is too big and we stop scoring any points.
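A minimal sketch of that scoring rule might look like the following. This is illustrative code, not the author's actual implementation: it assumes the board is a list of rows with 0 for an empty cell, and that the snake winds serpentine from the top left (left-to-right along the first row, right-to-left along the second, and so on).

```python
def snake_score(board):
    """Sum tiles along the snake until a gap or a tile bigger than the last.

    Assumes `board` is a list of rows, 0 meaning empty, and the snake
    path alternates direction row by row, starting top-left.
    """
    score = 0
    prev = float("inf")  # the first tile can be anything
    for row_index, row in enumerate(board):
        # even rows read left-to-right, odd rows right-to-left
        cells = row if row_index % 2 == 0 else row[::-1]
        for tile in cells:
            if tile == 0 or tile > prev:  # a gap or a too-big tile ends the snake
                return score
            score += tile
            prev = tile
    return score
```

On a board laid out like the one in the picture, this adds up exactly the run of tiles described above (the 1024-down-to-2 snake scores 2056), and swapping the 256 for another 1024 cuts the score to 1024 + 512 = 1536, because the snake stops there.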
Now that we know what a good position looks like, we have to get to one. We do this by searching a tree of possible positions, starting from the current one. We break positions down into two types – ones where we are about to play a move, and ones where a tile is about to appear. To assess a position, we look at all the child positions that could result from it. When we play a move, we control the next position – so the score should be the maximum score of the positions we can get to. But when a tile appears, it appears in a random place – so the score of a position where a tile is about to appear should be the average score over all the places it could appear. This is called an expectimax algorithm. To choose what move we play, we score a tree of positions as deep as we can be bothered, and then pick the move that gets us into the position with the highest score.
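The search above can be sketched as follows. Again this is a self-contained illustration rather than the author's code: the move mechanics (`slide_row_left`, `apply_move`) are a simplified reimplementation of 2048's sliding rules, and the 90%/10% split between spawning a 2 and a 4 matches the real game.

```python
def slide_row_left(row):
    """Slide one row to the left, merging equal neighbours once, 2048-style."""
    tiles = [t for t in row if t]                 # drop the empty cells
    merged, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)           # each tile merges at most once
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))

def apply_move(board, move):
    """Return the board after sliding 'left', 'right', 'up' or 'down'."""
    if move == 'left':
        return [slide_row_left(r) for r in board]
    if move == 'right':
        return [slide_row_left(r[::-1])[::-1] for r in board]
    cols = [list(c) for c in zip(*board)]         # transpose for vertical moves
    if move == 'up':
        cols = [slide_row_left(c) for c in cols]
    else:
        cols = [slide_row_left(c[::-1])[::-1] for c in cols]
    return [list(r) for r in zip(*cols)]

def legal_moves(board):
    """Moves that actually change the board."""
    return [m for m in ('up', 'left', 'right', 'down')
            if apply_move(board, m) != board]

def expectimax(board, depth, our_turn, heuristic):
    if depth == 0:
        return heuristic(board)
    if our_turn:
        moves = legal_moves(board)
        if not moves:                             # nothing can move: game over
            return heuristic(board)
        # we pick the move, so take the best child
        return max(expectimax(apply_move(board, m), depth - 1, False, heuristic)
                   for m in moves)
    # the game places a tile at random, so average over every possibility
    empties = [(r, c) for r in range(4) for c in range(4) if board[r][c] == 0]
    total = 0.0
    for (r, c) in empties:
        for value, prob in ((2, 0.9), (4, 0.1)):  # 2048's spawn probabilities
            child = [row[:] for row in board]
            child[r][c] = value
            total += prob * expectimax(child, depth - 1, True, heuristic)
    return total / len(empties)

def best_move(board, heuristic, depth=4):
    """Score each move's subtree and play the highest-scoring one."""
    return max(legal_moves(board),
               key=lambda m: expectimax(apply_move(board, m), depth - 1, False, heuristic))
```

Plugging in the snake heuristic from earlier as `heuristic` gives the whole bot; the `depth` parameter is the "as deep as we can be bothered" knob, trading thinking time for stronger play.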
How good is it?
If you want to read, run or improve the code, it’s available on GitHub.