All posts by Johnny
Udacity – Intro to Artificial Intelligence – Supervised vs Unsupervised Learning
This video by Udacity summarizes Supervised vs Unsupervised Learning very nicely.
In a nutshell (using spam email filtering as an example):
- Supervised Learning has labels. e.g. spam email filtering: we have 100 emails filled with words, and we label each email SPAM (junk email) or HAM (email worth reading) up-front. We then apply an algorithm (such as Naive Bayes) to build a model that tells us, given a new email, how probable it is that the email is SPAM.
- Unsupervised Learning has NO labels. The algorithm tries to label the data for you, e.g. dividing the 100 emails into two clusters: the SPAM cluster may share some common characteristics, and the HAM cluster may share some others. (Examples: K-means Clustering, Expectation Maximization Clustering, Spectral Clustering.)
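To make the supervised side concrete, here is a minimal Naive Bayes spam classifier in plain Python. The tiny corpus and its SPAM/HAM labels are made up for illustration, and real filters use far larger vocabularies and datasets; this is only a sketch of the idea.

```python
from collections import Counter
import math

# Toy labelled corpus (texts and labels are made up for illustration).
emails = [
    ("win money prize now", "SPAM"),
    ("free prize click now", "SPAM"),
    ("meeting agenda attached", "HAM"),
    ("lunch meeting tomorrow", "HAM"),
]

def train_naive_bayes(data):
    """Count word frequencies per class, plus class priors."""
    word_counts = {"SPAM": Counter(), "HAM": Counter()}
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab

def log_posterior(text, label, word_counts, class_counts, vocab):
    """Unnormalized log P(label | text), with Laplace (+1) smoothing
    so unseen words don't zero out the probability."""
    total = sum(word_counts[label].values())
    logp = math.log(class_counts[label] / sum(class_counts.values()))
    for word in text.split():
        logp += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
    return logp

def classify(text, model):
    word_counts, class_counts, vocab = model
    return max(("SPAM", "HAM"),
               key=lambda c: log_posterior(text, c, word_counts, class_counts, vocab))

model = train_naive_bayes(emails)
print(classify("free money now", model))           # -> SPAM
print(classify("agenda for lunch meeting", model)) # -> HAM
```

Because the labels were given up-front, the model can directly estimate how probable SPAM is for a new email; an unsupervised method would have to discover the two groups on its own.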
Udacity – Intro to Artificial Intelligence – Unsupervised Learning – Spectral Clustering
Spectral clustering focuses on affinity (how close neighbouring points are to each other), rather than on the absolute positions of the data points (which is what the K-means and Expectation Maximization clustering algorithms focus on).
This Udacity video summarizes Spectral clustering very nicely.
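A small sketch of the affinity idea: spectral clustering starts from an affinity matrix (typically a Gaussian/RBF kernel over pairwise distances) and then works with the eigenvectors of that matrix's graph Laplacian, never the raw coordinates directly. The coordinates and the sigma value below are made up for illustration.

```python
import math

# Two small groups of points; the coordinates are made up.
points = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.0),   # group 1
          (5.0, 5.0), (5.5, 5.1), (6.0, 5.0)]   # group 2

def rbf_affinity(p, q, sigma=1.0):
    """Gaussian (RBF) affinity: near 1 for close neighbours, near 0 otherwise."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

# The affinity matrix is what spectral clustering actually operates on
# (via its graph Laplacian's eigenvectors), not the absolute positions.
A = [[rbf_affinity(p, q) for q in points] for p in points]

print(round(A[0][1], 3))  # neighbours in the same group: high affinity
print(round(A[0][3], 3))  # points in different groups: affinity near 0
```

Shifting every point by the same offset would leave this matrix unchanged, which is the sense in which spectral clustering cares about relative closeness rather than absolute position.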
Udacity – Intro to Artificial Intelligence – Unsupervised Learning – Expectation Maximization Clustering
Expectation Maximization is somewhat similar to K-means, with one core difference.
In the correspondence step:
- K-means uses “hard correspondence” – when revising the estimated location of center point A, it only considers the data points assigned to cluster A. Data points from other clusters (cluster B, etc.) are ignored.
- Expectation Maximization uses “soft correspondence” – when revising the estimated location of center point A, it considers the data points in cluster A and in the other clusters, with each point weighted by how likely it is to belong to cluster A.
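The hard vs soft correspondence difference can be sketched as one update step on 1-D toy data (the data values, initial centers, and the equal-variance Gaussian assumption below are all made up for illustration):

```python
import math

# 1-D toy data forming two obvious groups, plus two initial center estimates.
data = [0.0, 1.0, 2.0, 9.0, 10.0, 11.0]
centers = [1.5, 8.0]

def kmeans_step(data, centers):
    """Hard correspondence: each point updates ONLY its nearest center."""
    clusters = [[] for _ in centers]
    for x in data:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    return [sum(c) / len(c) for c in clusters]

def em_step(data, centers, sigma=1.0):
    """Soft correspondence (equal-variance Gaussians): EVERY point contributes
    to EVERY center, weighted by its responsibility for that cluster."""
    new_centers = []
    for i in range(len(centers)):
        weights = []
        for x in data:
            densities = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2))
                         for m in centers]
            weights.append(densities[i] / sum(densities))
        new_centers.append(sum(w * x for w, x in zip(weights, data)) / sum(weights))
    return new_centers

print(kmeans_step(data, centers))  # hard update: [1.0, 10.0]
print(em_step(data, centers))      # soft update: very close, but every point
                                   # contributed a (possibly tiny) weight
```

With well-separated groups like these, the soft weights for far-away points are nearly zero, so the two updates almost coincide; the behaviours diverge when clusters overlap and points have meaningful responsibility for more than one center.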
This video by Udacity summarizes this very nicely.