Logistic Regression vs. SVM

Logistic regression and SVM often give similar results. An SVM takes longer to train than logistic regression, so there seems to be no obvious reason to use an SVM. In fact, logistic regression is one of the most frequently used algorithms in industry.

Logistic regression and SVM perform similarly when the training data is linearly separable (or nearly so), which happens very often. In that case there is no need to project the data into a higher-dimensional space to separate the classes.
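To make the linear-separability point concrete, here is a minimal sketch in plain Java (no library). It fits a logistic regression with batch gradient descent on a tiny two-class data set that a straight line can separate. The class name, the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration; they do not come from the original post.

```java
final class LinearSeparabilitySketch {
    // Hypothetical toy data: class 0 sits near the origin, class 1 further out.
    static final double[][] TOY_X = {
        {1, 1}, {2, 1}, {1, 2},   // class 0
        {4, 4}, {5, 4}, {4, 5}    // class 1
    };
    static final int[] TOY_Y = {0, 0, 0, 1, 1, 1};

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Fits {w1, w2, bias} by batch gradient descent on the mean logistic loss.
    static double[] fit(double[][] x, int[] y, double learningRate, int iterations) {
        double[] w = new double[3];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[3];
            for (int i = 0; i < x.length; i++) {
                double error = sigmoid(w[0] * x[i][0] + w[1] * x[i][1] + w[2]) - y[i];
                grad[0] += error * x[i][0];
                grad[1] += error * x[i][1];
                grad[2] += error;
            }
            for (int j = 0; j < 3; j++) {
                w[j] -= learningRate * grad[j] / x.length;
            }
        }
        return w;
    }

    // Predicts class 1 when the point lies on the positive side of the line.
    static int predict(double[] w, double[] point) {
        return (w[0] * point[0] + w[1] * point[1] + w[2]) > 0 ? 1 : 0;
    }

    static int countCorrect(double[] w, double[][] x, int[] y) {
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (predict(w, x[i]) == y[i]) {
                correct++;
            }
        }
        return correct;
    }

    public static void main(String[] args) {
        double[] w = fit(TOY_X, TOY_Y, 0.1, 10000);
        System.out.println(countCorrect(w, TOY_X, TOY_Y) + " of " + TOY_X.length + " training points correct");
    }
}
```

On data like this, a purely linear decision boundary classifies every training point correctly, so a kernel that maps the data to a higher dimension would add training cost without improving the fit.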


K-Means Clustering in Java

This post shows how to run the k-means clustering algorithm in Java using Weka. First, download the weka.jar file. After unzipping the download, add the weka.jar file to your project build path, and then take a look at the .arff files under the data directory. By reading one or two of them, …
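For readers who want to see what the algorithm itself does, here is a minimal plain-Java sketch of k-means (Lloyd's algorithm). It deliberately does not use Weka, so it is not the code from the post. The class name, the method signature, and the choice to start the centroids at the first k points are assumptions made to keep the example short and deterministic.

```java
import java.util.Arrays;

final class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index assigned to each point.
    // Assumption: centroids are initialised to the first k points.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDistance = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDistance) {
                        bestDistance = d;
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the algorithm has converged
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[labels[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {9, 9}, {1.5, 2}, {8, 8}, {1, 0.5}, {9, 10}};
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
        // prints [0, 1, 0, 1, 0, 1]
    }
}
```

A library such as Weka handles details this sketch ignores, for example reading .arff files and choosing initial centroids more carefully.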

Machine Learning Resources

Here are some good machine learning resources.

The Unreasonable Effectiveness of Recurrent Neural Networks: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Stanford deep learning for NLP (lecture notes): https://cs224d.stanford.edu/lecture_notes/
NTU deep learning (lecture notes): http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLDS17.html
LSTM Hello World: https://medium.com/towards-data-science/lstm-by-example-using-tensorflow-feb0c1968537
HBO’s Silicon Valley “Not Hotdog”: https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3
Use the TimeDistributed Layer for LSTM: https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
Deep learning intro: https://medium.com/machine-learning-for-humans/neural-networks-deep-learning-cdad8aeae49b

How to handle noise?

If you apply for a data job, this is a commonly asked interview question. A good answer shows your experience with and understanding of machine learning. First of all, noise can occur in both the input (X) and the output (Y).

Missing Values in X
1. Use the feature’s mean value from all the available data
2. Ignore the …
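The mean-value option for missing inputs can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: missing entries are represented as NaN, each feature is a single double[] column, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class MeanImputation {
    // Returns a copy of the feature column in which every NaN (treated as
    // "missing") is replaced by the mean of the observed values.
    static double[] imputeWithMean(double[] feature) {
        double sum = 0.0;
        int count = 0;
        for (double value : feature) {
            if (!Double.isNaN(value)) {
                sum += value;
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("feature has no observed values");
        }
        double mean = sum / count;

        double[] result = new double[feature.length];
        for (int i = 0; i < feature.length; i++) {
            result[i] = Double.isNaN(feature[i]) ? mean : feature[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] age = {20.0, Double.NaN, 40.0, 30.0};
        System.out.println(Arrays.toString(imputeWithMean(age)));
        // prints [20.0, 30.0, 40.0, 30.0]
    }
}
```

The mean is computed only from the observed entries, so the filled-in value does not shift the feature's average.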

A Simple Machine Learning Example in Java

This is a “Hello World” example of machine learning in Java. It simply gives you a taste of machine learning in Java.

Environment
Java 1.6+ and Eclipse

Step 1: Download Weka library
Download page: http://www.cs.waikato.ac.nz/ml/weka/snapshots/weka_snapshots.html
Download stable.XX.zip, unzip the file, and add weka.jar to the library path of your Java project in Eclipse.

Step 2: Prepare Data
…