K-Means Clustering in Java

This post shows how to run the k-means clustering algorithm in Java using Weka.

First, download Weka. The download is an archive; once it is unzipped, the extracted files include weka.jar and a data directory.

Add the weka.jar file to your project build path, and then take a look at the .arff files under the data directory. After reading one or two of them, you should be able to see what format Weka takes as input.
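If your data is not in ARFF yet, the layout is simple enough to generate by hand: a header that names the relation and each attribute, then an @data section with one comma-separated row per instance. The following is a minimal sketch that uses only the standard library and assumes every attribute is numeric. The class name ArffSketch, the method toArff, and the sample values are made up for illustration and are not part of Weka.

```java
final class ArffSketch {

    // Builds ARFF text for purely numeric data: a header naming the relation
    // and each attribute, followed by one comma-separated row per instance.
    static String toArff(String relation, String[] attributes, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String attribute : attributes) {
            sb.append("@attribute ").append(attribute).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    sb.append(',');
                }
                sb.append(row[j]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three points with two numeric attributes.
        double[][] points = { {1.0, 2.0}, {1.5, 1.8}, {8.0, 9.0} };
        System.out.print(toArff("points", new String[] {"x", "y"}, points));
    }
}
```

Save the printed text to a file with the .arff extension and it can be loaded the same way as the sample files. Nominal or string attributes need a different @attribute declaration, which this sketch does not cover.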

Second, prepare your data in that format and use the following code to run the k-means clustering algorithm. The output lists each instance together with the cluster it was assigned to.

package greenblocks.statistics;
 
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
 
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
 
public class Cluster {
 
	// Opens the data file. A missing file is reported to the caller as an
	// exception rather than as a null reader, which would fail later in main.
	public static BufferedReader readDataFile(String filename) throws FileNotFoundException {
		return new BufferedReader(new FileReader(filename));
	}
 
	public static void main(String[] args) throws Exception {
		SimpleKMeans kmeans = new SimpleKMeans();
 
		kmeans.setSeed(10);
 
		// Important parameters: preserve the instance order (required by
		// getAssignments() below) and set the number of clusters.
		kmeans.setPreserveInstancesOrder(true);
		kmeans.setNumClusters(5);
 
		// Load the ARFF data and close the reader once the instances are in memory.
		Instances data;
		try (BufferedReader datafile = readDataFile("C:/Users/ryan/workspace/data.arff")) {
			data = new Instances(datafile);
		}
 
 
		kmeans.buildClusterer(data);
 
		// getAssignments() returns one cluster number (starting at 0) per instance,
		// so the array has as many elements as there are instances.
		int[] assignments = kmeans.getAssignments();
 
		int i=0;
		for(int clusterNum : assignments) {
		    System.out.printf("Instance %d -> Cluster %d \n", i, clusterNum);
		    i++;
		}
	}
}

Output:

Instance 0 -> Cluster 4 
Instance 1 -> Cluster 0 
Instance 2 -> Cluster 2 
Instance 3 -> Cluster 4 
Instance 4 -> Cluster 0 
Instance 5 -> Cluster 3 
Instance 6 -> Cluster 1 
Instance 7 -> Cluster 3 
Instance 8 -> Cluster 4 
...
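
The flat list above is easier to inspect once it is grouped by cluster. Here is a small sketch that turns an assignments array into a map from cluster number to the indices of its instances. It uses only the standard library; the class name ClusterSummary and the method groupByCluster are hypothetical helpers, and the sample array simply repeats the nine assignments printed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClusterSummary {

    // Maps each cluster number to the indices of the instances assigned to it.
    // A TreeMap keeps the clusters sorted by number when they are printed.
    static Map<Integer, List<Integer>> groupByCluster(int[] assignments) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            groups.computeIfAbsent(assignments[i], k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // In the program above this array would come from kmeans.getAssignments().
        int[] assignments = {4, 0, 2, 4, 0, 3, 1, 3, 4};
        for (Map.Entry<Integer, List<Integer>> entry : groupByCluster(assignments).entrySet()) {
            System.out.printf("Cluster %d (%d instances): %s%n",
                    entry.getKey(), entry.getValue().size(), entry.getValue());
        }
    }
}
```

The size of each list doubles as the cluster size, which is a quick way to spot a cluster that ended up with only one or two instances.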