A quick memory refresh on understanding how K-means work in Unsupervised Learning:
Pick how many clusters we wish to find. K = 2 means “finding two clusters”. K = 3 means “finding three clusters”. etc.
Algorithm (assumd K=2 for simplicity sake).
- Assign two ramdom cluster center points: A and B.
- Link up the A and B with a straight line. Pick the mid-point. Draw a perpendicular line. This is the dividing line between cluster A and cluster B.
- Now our goal is to find a revised A and B locations. Do this by picking an A location that has the minimum straight-line distances between all the points in cluster A. Do the same for B. i.e. Now we have the new A and B locations.
- Repeat step 2 and step 3 iteratively. Until the A and B locations are more or less stable. i.e. we have two defined clusters.
This short video by Udacity summarizes this very nicely: