This is the first post in a brief series on simple algorithms I like, either because of their simplicty, their novelty, their usefulness, or all three. Each comes with a live demo. To begin, I introduce a method that more programmers and artists should know about: k-Means Clustering .
This sample shows k-Means in action. k-Means used to also be known as Lloyd’s method. It’s so simple that I’m surprised it’s not used more often.
The basic idea is: for some pile of info, whether its sales records for varying shoe sizes, the location of weeds in your yard, whatever, find a set of “best middles” — or “means” — for any given set of information. In the sample, we have a bunch of random, possibly-lumpy collections of little x’s on the canvas. The method tries to find some smaller number of categories (the “means,” marked by circles) that could best-represent that info.
The “k” in “k-Means” is the number of categories we want to create for dividing-up the samples.
The k-Means method doesn’t directly solve for “best.” It just keeps trying and adjusting, starting from completely random guesses as to where to put the means:
- Look at every point, and mark it “owned” by the closest “mean.”
- Move each mean to the middle of its “owned” points.
- Just keep doing those two steps, until either nothing changes each time or you run out of patience. If a proposed mean has no points, just discard that mean. You didn’t need it.
In this sample, we let the means move around until they’re stable, then randomly take one away, which forces all the rest to re-examine their situatons.
You can pause/start the animation by clicking on the canvas.
You can decide for yourself what these results look like: Popping soap bubbles? Growing algae mats in time-lapse? Voting districts? That dream about insects eating your car? I suggest turning on some Cinematic Orchestra for your headphones and stare at this page for a long while to decide.
In the next example, we’ll try using k-Means in different ways. Some obvious questions might be:
- Are there better or different ways to decide which point belongs to which mean? What do we mean by “closeness”?
- What does this have to do with photography?