K-Means Clustering in Java

This post shows how to run the k-means clustering algorithm in Java using Weka.

First, download the Weka zip archive, which contains the weka.jar file.

When it is unzipped, you get the Weka distribution files, including weka.jar and a data directory.

Add the weka.jar file to your project build path, and then take a look at the .arff files in the data directory. Reading one or two of them should show you the input format Weka expects.
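If your data is not in ARFF form yet, the layout is simple enough to generate by hand. The following is a minimal sketch, assuming purely numeric attributes; the class name ArffSketch, the method toArff, and the relation and attribute names are made up for illustration and are not part of Weka.

```java
// Hypothetical helper, not part of Weka: renders numeric rows as ARFF text.
final class ArffSketch {

    // Builds the ARFF header (@relation, one @attribute per column) followed by the @data rows.
    static String toArff(String relation, String[] attributes, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributes) {
            // Assumption: every column is numeric.
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    sb.append(',');
                }
                sb.append(row[j]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] rows = { {1.0, 2.0}, {1.5, 1.8}, {8.0, 9.0} };
        // Print the ARFF text; redirect it to a file such as data.arff to use it as input.
        System.out.print(toArff("points", new String[] {"x", "y"}, rows));
    }
}
```

Each row under @data is one instance, with values in the same order as the @attribute declarations.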

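Second, prepare your data in that format and use the following code to run the k-means clustering algorithm. Save it as Cluster.java, since Java requires a public class to live in a file of the same name. The output lists each instance with the cluster it was assigned to.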

package greenblocks.statistics;
 
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
 
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
 
public class Cluster {
 
	// Opens the data file for reading. A missing file is reported to the caller
	// instead of returning null, which would only fail later with a NullPointerException.
	public static BufferedReader readDataFile(String filename) throws FileNotFoundException {
		return new BufferedReader(new FileReader(filename));
	}
 
	public static void main(String[] args) throws Exception {
		SimpleKMeans kmeans = new SimpleKMeans();
 
		kmeans.setSeed(10);
 
		// Important parameters to set: preserve instance order (required by getAssignments()) and the number of clusters.
		kmeans.setPreserveInstancesOrder(true);
		kmeans.setNumClusters(5);
 
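		BufferedReader datafile = readDataFile("C:/Users/ryan/workspace/data.arff");
		Instances data = new Instances(datafile);
		// The Instances constructor has read the whole file, so the reader can be closed.
		datafile.close();
 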
		kmeans.buildClusterer(data);
 
		// getAssignments() returns the cluster number (starting with 0) for each instance,
		// in input order; the array has one element per instance.
		int[] assignments = kmeans.getAssignments();
 
		int i=0;
		for(int clusterNum : assignments) {
		    System.out.printf("Instance %d -> Cluster %d \n", i, clusterNum);
		    i++;
		}
	}
}

Output:

Instance 0 -> Cluster 4 
Instance 1 -> Cluster 0 
Instance 2 -> Cluster 2 
Instance 3 -> Cluster 4 
Instance 4 -> Cluster 0 
Instance 5 -> Cluster 3 
Instance 6 -> Cluster 1 
Instance 7 -> Cluster 3 
Instance 8 -> Cluster 4 
...
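To make explicit what produces assignments like these, here is a minimal plain-Java sketch of the k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is an illustration under simplifying assumptions, not Weka's SimpleKMeans code: the class KMeansSketch, its methods, the sample points, and the hand-picked starting centroids are all hypothetical, whereas SimpleKMeans chooses its starting centroids from the seed.

```java
import java.util.Arrays;

// Illustrative plain-Java k-means (Lloyd's algorithm); not Weka's SimpleKMeans code.
final class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the point (ties go to the lower index).
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs k-means from the given starting centroids (updated in place)
    // and returns the cluster index of each point, in input order.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignments = new int[points.length];
        Arrays.fill(assignments, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignments[i]) {
                    assignments[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its points.
            int dims = points[0].length;
            double[][] sums = new double[centroids.length][dims];
            int[] counts = new int[centroids.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignments[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignments[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its old centroid
                    for (int d = 0; d < dims; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        double[][] points = { {1.0, 1.0}, {1.2, 0.8}, {9.0, 9.0}, {8.8, 9.2} };
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} }; // hand-picked starting centroids
        int[] assignments = cluster(points, centroids, 100);
        for (int i = 0; i < assignments.length; i++) {
            System.out.printf("Instance %d -> Cluster %d%n", i, assignments[i]);
        }
        // A point that was not part of the training data is assigned to the nearest learned centroid.
        System.out.println("New point -> Cluster " + nearest(new double[] {8.0, 8.5}, centroids));
    }
}
```

With these sample points, the first two instances land in cluster 0 and the last two in cluster 1, and the unseen point (8.0, 8.5) is assigned to cluster 1. The same nearest-centroid rule is how a fitted model can label data it was not trained on.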

5 thoughts on “K-Means Clustering in Java”

  1. Is there a kmeans java code in which we get the cluster of the testing data as output…rather than getting clusters of each instance of training data?

  2. The code does not work by adding just the weka.jar to the directory and compiling it. Getting error saying the clusters class should be in a file called clusters.java. Nowhere in the instructions does it say how to find this class file.

  3. I would like source code for the k-means method to develop my skills in this field.
