This video by Udacity summarizes Supervised vs Unsupervised Learning very nicely.
In a nutshell (using clustering as an example for instance):
Supervised Learning has label. e.g. Spam email filtering. We have 100 emails filled with words. We label each email SPAM (Junk email) or HAM (email that worth reading) up-front. We apply algorithms (such as Naive Bayes) to build a model that will tell us, given a new email, how probable that the email is a SPAM.
Unsupervised Learning has NO label. The algorithm will try and label it for you. (e.g. divid the 100 emials into two cluster. The SPAM cluster may share some common characteristics. The HAM cluster may have some other characteristics. (e.g. K-means Clustering, Expectation Maximization Clustering, Spectral Clustering).
Spectral clustering focuses on affinity (how relatively close the neighbouring points are from each other), rather than absolute position of the data points (i.e. what K-means and Expectation Maximization clustering algorithms focus on).
This Udacity video summarizes Spectral clustering very nicely.
The Expectation Maximization is somewhat similar to K-means, with this core difference:
In the corresponding step:
k-means uses “hard correspondence” – estimated centerpoint A only compares with the data points in cluster A in the revision of new estimated centerpoint A location. It does not compare with data points from other clusters (e.g. cluster B, etc.)
Expectation Maximization uses “soft correspondence” – estimated centerpoint A compares with the data points in cluster A and other clusters in the revision of new estimated centerpoint A location. It does compare with data points from other clusters (e.g. cluster B, etc.).
This video by Udacity summarizes this very nicely.
A quick memory refresh on understanding how K-means work in Unsupervised Learning:
Pick how many clusters we wish to find. K = 2 means “finding two clusters”. K = 3 means “finding three clusters”. etc.
Algorithm (assumd K=2 for simplicity sake).
Assign two ramdom cluster center points: A and B.
Link up the A and B with a straight line. Pick the mid-point. Draw a perpendicular line. This is the dividing line between cluster A and cluster B.
Now our goal is to find a revised A and B locations. Do this by picking an A location that has the minimum straight-line distances between all the points in cluster A. Do the same for B. i.e. Now we have the new A and B locations.
Repeat step 2 and step 3 iteratively. Until the A and B locations are more or less stable. i.e. we have two defined clusters.
This short video by Udacity summarizes this very nicely: