The paper "The paramagnetic and glass transitions in Sudoku" by Alex Williams and Graeme J. Ackland claims that by defining an energy on filled Sudoku grids, you see phase transitions as the system moves to lower energy.
I'm trying to implement the ideas proposed in the paper here.
The authors wanted a simple model of a system which demonstrates the paramagnetic and glass transitions. The glass transition is between a solid, low-energy state and a liquid state with energy moving around. (I am not a physicist).
Sudoku's got that big grid energy
It's worth stating some facts that the authors assumed their readers would know:
- A Sudoku puzzle is a 9×9 grid divided into nine 3×3 subgrids.
- Each cell contains a number between 1 and 9.
- Some of the cells are fixed at the start - this defines the puzzle.
- The aim is to fill the rest of the grid with numbers so that they satisfy the constraints: every row, column and 3×3 subgrid must contain each of the numbers 1 to 9 once.
You can fill in a Sudoku grid at random and it probably won't satisfy the constraints.
They define a measure of energy on a filled-in Sudoku grid. For each cell, look at all the other cells in the same row, column or subgrid. Count up the number of cells that have the same value as the one you're looking at. The total number of cells with the same value is this cell's energy.
The energy for the whole grid is the sum of every cell's energy.
A solved Sudoku has zero energy: no two cells in the same column, row or subgrid have the same value.
Glass? Sudoku? Glass Sudoku?!
Now, the authors use some complicated-looking statistical mechanics that turns out to be quite easy to think about, from the right vantage point.
Prob(cell is in a state with energy H) ∝ exp(-H/T)
They say that the values of the grid cells follow a Boltzmann distribution: without changing the values of every other cell in the grid, the probability of one cell having a particular value is proportional to exp(-H/T), where H is the energy that cell would have, and T is a made-up parameter representing the "temperature" of the system.
A value which conflicts with no other cells in the same row, column or subgrid has energy H = 0, while a value which occurs repeatedly would have a very high H, and hence a lower probability of occurring.
It took me a while to understand that T is a number I get to make up, and not a function of the values in the grid like H is. The idea is that the higher (T) gets, the more likely higher energy states become. In the limit with an extremely high temperature, every state would be equally likely.
I now have in my mind an image of a bunsen burner under the Sudoku grid, making the cells dance around, unable to follow the rules. When the grid is hot, the values in the cells change frequently, like they're stepping on hot coals. As the grid cools down, each cell settles into the most comfortable value - the one that violates the fewest of the rules.
Now that we're talking about heating things up, we can look at the Sudoku grid as something like a quantity of matter, transitioning into different states as the temperature increases.
At low temperatures, nothing changes much, like a solid. At really high temperature, the values in the grid are effectively unpredictable, like the positions of particles in a gas.
In between, interesting things happen! The authors claim that you can see two phases in the system: a "paramagnetic" phase, where any assignment of numbers to the grid is equally likely, and a "glass" phase", which stays quite close to the ground state (the solution of the grid, with no rules violated and so zero energy) while occasionally moving to a slightly higher energy state.
I'm quite shaky on the physics terminology, and the paper is written for an audience that can be assumed to know the key concepts. Anyway, I think the gist is: this system doesn't just go linearly from solved at zero temperature to completely random at high temperature; in between, there are some interesting inflexion points where the system suddenly does something noticeably different.
Can't touch this? Call MCMC HammerHammer
If you're like me, you won't have believed a word of that previous section, without having seen it happen.
So we'd like to make a simulation of this model. Given a starting grid and a temperature, it should give us a filled-in grid following the probability model defined earlier.
But we've only talked about probabilities for individual cells so far. The extension to entire grids looks straightforward:
The energy of a grid is the sum of the energies of all of its cells.
Then the probability of seeing a grid with total energy H should again follow a Boltzmann distribution.
The problem is that for any energy level H, there are loads and loads of different grids, and working out what they are is what mathematicians like to call "too hard to bother trying".
Fortunately, there's a really clever method for approximating complicated distributions like this: Markov Chain Monte Carlo.
The "Monte Carlo" bit means you're approximating a full picture of the system by taking lots of random samples (Monte Carlo is full of casinos). The "Markov Chain" bit means you're moving between states one step at a time, based on their relative probabilities.
Here's the idea:
- Start with an arbitrary filled-in grid.
- Pick a cell at random, and compute its current energy, H1. Then pick a different value for the cell, and compute the corresponding energy, H2.
- Those energies have corresponding probabilities. Based on the ratio between those probabilities, decide whether to stick with what you've got or change the cell to the new value.
- Repeat steps 2 and 3 a lot of times.
You're taking a random walk through the space of possible grids. The algorithm tries to head for more likely states, but takes a whimsical detour every now and then. This stops it getting stuck in a local minimum, and ideally makes every state at least theoretically possible to reach.
This is the Metropolis-Hastings algorithm (just called the 'Metropolis algorithm' in the paper; not sure why). For reasons that I'm going to dismiss with a majestic handwave, if you do enough iterations you end up with a grid drawn from a distribution approximating the true distribution defined implicitly before.
The fab thing about this algorithm is that you can watch it happen. That's what I've done - this page shows the Sudoku grid as the Metropolis-Hastings algorithm wanders about the space of possible values. The colour of each cell corresponds to its energy; as you increase the temperature, you can see the energy in the cells increasing.