Saturday, March 7, 2015

K-means clustering with RCaller - A library for calling R from Java

Here is an example of RCaller, a library for calling R from Java.

In the code below, we create two variables, x and y, in Java and pass them to R. R's k-means clustering function, kmeans, is applied to the data matrix formed by binding x and y as columns, with two clusters requested. The result is then read back and reported in Java.






package kmeansrcaller;

import rcaller.RCaller;
import rcaller.RCode;

public class KMeansRCaller {

    public static void main(String[] args) {
        RCaller caller = new RCaller();
        RCode code = new RCode();

        double[] x = new double[]{1, 2, 3, 4, 5, 10, 20, 30, 40, 50};
        double[] y = new double[]{2, 4, 6, 8, 10, 20, 40, 60, 80, 100};

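        // Pass the Java arrays to R as the numeric vectors x and y
        code.addDoubleArray("x", x);
        code.addDoubleArray("y", y);

        // Bind x and y as columns and request 2 clusters from kmeans
        code.addRCode("result <- kmeans(cbind(x,y), 2)");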

        caller.setRCode(code);

        // Location of the Rscript binary; adjust this path for your system
        caller.setRscriptExecutable("/usr/bin/Rscript");

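        // Run the script and bring the R object "result" back to Java
        caller.runAndReturnResult("result");
        // Print the names of the components available in the parsed result
        System.out.println(caller.getParser().getNames());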

        int[] clusters = caller.getParser().getAsIntArray("cluster");
        double[][] centers = caller.getParser().getAsDoubleMatrix("centers");
        double[] totalSumOfSquares = caller.getParser().getAsDoubleArray("totss");
        // RCaller automatically replaces dots with underscores in variable names,
        // so the component tot.withinss is accessible as tot_withinss
        double[] totalWithinSumOfSquares = caller.getParser().getAsDoubleArray("tot_withinss");
        double[] totalBetweenSumOfSquares = caller.getParser().getAsDoubleArray("betweenss");

        for (int i = 0; i < clusters.length; i++) {
            System.out.println("Observation " + i + " is in cluster " + clusters[i]);
        }

        System.out.println("Cluster Centers:");
        for (int i = 0; i < centers.length; i++) {
            for (int j = 0; j < centers[i].length; j++) {
                System.out.print(centers[i][j] + " ");
            }
            System.out.println();
        }

        System.out.println("Total Within Sum of Squares: " + totalWithinSumOfSquares[0]);
        System.out.println("Total Between Sum of Squares: " + totalBetweenSumOfSquares[0]);
        System.out.println("Total Sum of Squares: " + totalSumOfSquares[0]);
    }

}



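The output is shown below. The cluster labels come from R and start at 1, while the observation index printed by the loop starts at 0. Because kmeans starts from randomly chosen centers, the labels 1 and 2 may be swapped from run to run.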



[cluster, centers, totss, withinss, tot_withinss, betweenss, size, iter, ifault]
Observation 0 is in cluster 2
Observation 1 is in cluster 2
Observation 2 is in cluster 2
Observation 3 is in cluster 2
Observation 4 is in cluster 2
Observation 5 is in cluster 2
Observation 6 is in cluster 2
Observation 7 is in cluster 1
Observation 8 is in cluster 1
Observation 9 is in cluster 1
Cluster Centers:
40.0 6.42857142857143 
80.0 12.8571428571429 
Total Within Sum of Squares: 2328.57142857143
Total Between Sum of Squares: 11833.9285714286
Total Sum of Squares: 14162.5
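
As a sanity check, the reported numbers can be recomputed in plain Java without R. The sketch below is not part of RCaller: the class name KMeansCheck and its helper methods are made up for illustration, and the cluster labels are copied by hand from the output above. It relies only on the usual decomposition, total sum of squares = within-cluster sum of squares + between-cluster sum of squares.

```java
import java.util.Arrays;

// Hypothetical helper class (not part of RCaller) that recomputes the
// k-means summary statistics from the data and the reported cluster labels.
public class KMeansCheck {

    // Mean of the values whose cluster label equals `label`.
    // Returns NaN if no observation carries that label.
    public static double clusterMean(double[] v, int[] clusters, int label) {
        double sum = 0;
        int n = 0;
        for (int i = 0; i < v.length; i++) {
            if (clusters[i] == label) {
                sum += v[i];
                n++;
            }
        }
        return sum / n;
    }

    // Sum of squared deviations of v from its overall mean.
    public static double totalSS(double[] v) {
        double mean = Arrays.stream(v).average().orElse(Double.NaN);
        double ss = 0;
        for (double a : v) {
            ss += (a - mean) * (a - mean);
        }
        return ss;
    }

    // Sum of squared deviations of each value from the mean of its own cluster.
    // Labels are assumed to run from 1 to k, as in R.
    public static double withinSS(double[] v, int[] clusters, int k) {
        double ss = 0;
        for (int c = 1; c <= k; c++) {
            double m = clusterMean(v, clusters, c);
            for (int i = 0; i < v.length; i++) {
                if (clusters[i] == c) {
                    ss += (v[i] - m) * (v[i] - m);
                }
            }
        }
        return ss;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 10, 20, 30, 40, 50};
        double[] y = {2, 4, 6, 8, 10, 20, 40, 60, 80, 100};
        // Labels copied from the output above (assumption: same run)
        int[] clusters = {2, 2, 2, 2, 2, 2, 2, 1, 1, 1};

        double total = totalSS(x) + totalSS(y);
        double within = withinSS(x, clusters, 2) + withinSS(y, clusters, 2);
        double between = total - within;

        for (int c = 1; c <= 2; c++) {
            System.out.println("Cluster " + c + " center: ("
                    + clusterMean(x, clusters, c) + ", "
                    + clusterMean(y, clusters, c) + ")");
        }
        System.out.println("Total Within Sum of Squares: " + within);
        System.out.println("Total Between Sum of Squares: " + between);
        System.out.println("Total Sum of Squares: " + total);
    }
}
```

The recomputed values should agree with the RCaller output up to rounding. The check also helps in reading the printed centers: cluster 1 (observations 7 to 9) is centered at x = 40, y = 80, and cluster 2 at roughly x = 6.43, y = 12.86, so in the output above each printed row holds one variable (x, then y) and each column one cluster.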



Have a nice read!





