This is the first post in a brief series on algorithms (simple methods) that I like and that I think more people should know about: either because of their simplicity, their novelty, their usefulness, or all three. Each comes with a live demo. To begin, let’s introduce: k-Means Clustering .
The Idea
This sample shows k-Means in action. k-Means used to also be known as Lloyd’s method. It’s so simple that I’m surprised it’s not used more often.
The basic idea is: for some pile of info, whether its sales records for varying shoe sizes, the location of weeds in your yard, whatever, find a set of “best middles” — or “means” — for any given set of information. In the sample, we have a bunch of random, possibly-lumpy collections of little x’s on the canvas. The method tries to find some smaller number of categories (the “means,” marked by circles) that could best-represent that info.
The “k” in “k-Means” is the number of categories we want to create for dividing-up the samples.
The k-Means method doesn’t directly solve for “best.” It just keeps trying and adjusting, starting from completely random guesses as to where to put the means:
- Look at every point, and mark it “owned” by the closest “mean.”
- Move each mean to the middle of its “owned” points.
- Just keep doing those two steps, until either nothing changes each time or you run out of patience. If a proposed mean has no points, just discard that mean. You didn’t need it.
The Example Code
In this sample, we let the means move around until they’re stable, then randomly take one away, which forces all the rest to re-examine their situatons.
You can pause/start the animation by clicking on the canvas.
You can see the source here on GitHub. The entire example is contained in a single Javascript module that’s imaginatively named kmSample1
You can decide for yourself what these results look like: Popping soap bubbles? Growing algae mats in time-lapse? Voting districts? That dream about insects eating your car? I suggest turning on some Cinematic Orchestra for your headphones and stare at this page for a long while to decide.
In future examples, we’ll try using k-Means (and other tricks) in different ways. Some obvious questions might be:
- Are there better or different ways to decide which point belongs to which mean? What do we mean by “closeness”?
- What might this have to do with photography?