Java Code Examples for org.apache.commons.math3.util.MathArrays#checkNonNegative()

The following examples show how to use org.apache.commons.math3.util.MathArrays#checkNonNegative().
Example 1
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0 6 votes
/**
 * Checks to make sure that the input long[][] array is rectangular,
 * has at least 2 rows and 2 columns, and has all non-negative entries.
 *
 * @param in input 2-way table to check
 * @throws NullArgumentException if the array is null
 * @throws DimensionMismatchException if the array is not valid
 * @throws NotPositiveException if the array contains any negative entries
 */
private void checkArray(final long[][] in)
    throws NullArgumentException, DimensionMismatchException,
    NotPositiveException {

    if (in.length < 2) {
        throw new DimensionMismatchException(in.length, 2);
    }

    if (in[0].length < 2) {
        throw new DimensionMismatchException(in[0].length, 2);
    }

    MathArrays.checkRectangular(in);
    MathArrays.checkNonNegative(in);

}
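
The checks in Example 1 can be illustrated without the library. The following is a minimal stdlib-only sketch, not the commons-math implementation: the class `TableChecks` and its methods are hypothetical, and it throws `IllegalArgumentException` where the library would throw `DimensionMismatchException` or `NotPositiveException`. It also assumes that a null table or null row should be rejected up front.

```java
/** Hypothetical stand-in for the table validation shown in Example 1. */
final class TableChecks {
    private TableChecks() {}

    /** Rejects tables that are null, smaller than 2 x 2, ragged, or contain a negative entry. */
    static void checkTable(final long[][] in) {
        if (in == null) {
            throw new IllegalArgumentException("table is null");
        }
        if (in.length < 2) {
            throw new IllegalArgumentException("need at least 2 rows, got " + in.length);
        }
        if (in[0] == null || in[0].length < 2) {
            throw new IllegalArgumentException("need at least 2 columns");
        }
        final int columns = in[0].length;
        for (final long[] row : in) {
            // Rectangular: every row has the same number of columns as the first.
            if (row == null || row.length != columns) {
                throw new IllegalArgumentException("table is not rectangular");
            }
            // Non-negative: zero is allowed, anything below zero is not.
            for (final long value : row) {
                if (value < 0) {
                    throw new IllegalArgumentException("negative entry: " + value);
                }
            }
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(final long[][] in) {
        try {
            checkTable(in);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(new long[][] {{1, 2}, {3, 4}}));
        System.out.println(isValid(new long[][] {{1, 2}, {3, -4}}));
    }
}
```

Note that "non-negative" permits zero counts, which is why the library exception for this check is named `NotPositiveException` while strictly positive checks use `NotStrictlyPositiveException`.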
 
Example 6
Source File: GTest.java    From astor with GNU General Public License v2.0 5 votes
/**
 * Computes the <a href="http://en.wikipedia.org/wiki/G-test">G statistic
 * for Goodness of Fit</a> comparing {@code observed} and {@code expected}
 * frequency counts.
 *
 * <p>This statistic can be used to perform a G test (Log-Likelihood Ratio
 * Test) evaluating the null hypothesis that the observed counts follow the
 * expected distribution.</p>
 *
 * <p><strong>Preconditions</strong>: <ul>
 * <li>Expected counts must all be positive. </li>
 * <li>Observed counts must all be &ge; 0. </li>
 * <li>The observed and expected arrays must have the same length and their
 * common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * <p><strong>Note:</strong> This implementation rescales the
 * {@code expected} array if necessary to ensure that the sums of the
 * expected and observed counts are equal.</p>
 *
 * @param observed array of observed frequency counts
 * @param expected array of expected frequency counts
 * @return G-Test statistic
 * @throws NotPositiveException if {@code observed} has negative entries
 * @throws NotStrictlyPositiveException if {@code expected} has entries that
 * are not strictly positive
 * @throws DimensionMismatchException if the array lengths do not match or
 * are less than 2.
 */
public double g(final double[] expected, final long[] observed)
        throws NotPositiveException, NotStrictlyPositiveException,
        DimensionMismatchException {

    if (expected.length < 2) {
        throw new DimensionMismatchException(expected.length, 2);
    }
    if (expected.length != observed.length) {
        throw new DimensionMismatchException(expected.length, observed.length);
    }
    MathArrays.checkPositive(expected);
    MathArrays.checkNonNegative(observed);

    double sumExpected = 0d;
    double sumObserved = 0d;
    for (int i = 0; i < observed.length; i++) {
        sumExpected += expected[i];
        sumObserved += observed[i];
    }
    double ratio = 1d;
    boolean rescale = false;
    if (Math.abs(sumExpected - sumObserved) > 10E-6) {
        ratio = sumObserved / sumExpected;
        rescale = true;
    }
    double sum = 0d;
    for (int i = 0; i < observed.length; i++) {
        final double dev = rescale ?
                FastMath.log((double) observed[i] / (ratio * expected[i])) :
                    FastMath.log((double) observed[i] / expected[i]);
        sum += ((double) observed[i]) * dev;
    }
    return 2d * sum;
}
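
To make the rescaling step in Example 6 concrete, here is a small stdlib-only sketch of the same statistic, G = 2 Σ observed[i] · ln(observed[i] / (ratio · expected[i])). It is an illustration under stated assumptions rather than the library code: the class `GStatisticSketch` is hypothetical, it uses `Math.log` instead of `FastMath.log`, it always applies the ratio (which is 1 when the sums already agree), and it treats a zero observed count as contributing 0 to the sum.

```java
/** Hypothetical stdlib-only sketch of the goodness-of-fit G statistic. */
final class GStatisticSketch {
    private GStatisticSketch() {}

    static double g(final double[] expected, final long[] observed) {
        if (expected.length < 2 || expected.length != observed.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        double sumExpected = 0d;
        double sumObserved = 0d;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] <= 0d) {
                throw new IllegalArgumentException("expected counts must be strictly positive");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Rescale expected counts so that both arrays have the same total.
        final double ratio = sumObserved / sumExpected;
        double sum = 0d;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] == 0) {
                continue; // assumption: 0 * ln(0) is taken as 0
            }
            sum += observed[i] * Math.log(observed[i] / (ratio * expected[i]));
        }
        return 2d * sum;
    }

    public static void main(String[] args) {
        // Expected counts given as proportions are rescaled to the observed total.
        System.out.println(g(new double[] {0.5, 0.5}, new long[] {10, 20}));
        System.out.println(g(new double[] {15, 15}, new long[] {10, 20}));
    }
}
```

Because of the rescaling, passing expected proportions such as `{0.5, 0.5}` should give the same statistic as passing expected counts `{15, 15}` for 30 observations.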
 
Example 7
Source File: GTest.java    From astor with GNU General Public License v2.0 5 votes
/**
 * Computes the <a href="http://en.wikipedia.org/wiki/G-test">G statistic
 * for Goodness of Fit</a> comparing {@code observed} and {@code expected}
 * frequency counts.
 *
 * <p>This statistic can be used to perform a G test (Log-Likelihood Ratio
 * Test) evaluating the null hypothesis that the observed counts follow the
 * expected distribution.</p>
 *
 * <p><strong>Preconditions</strong>: <ul>
 * <li>Expected counts must all be positive. </li>
 * <li>Observed counts must all be &ge; 0. </li>
 * <li>The observed and expected arrays must have the same length and their
 * common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * <p><strong>Note:</strong> This implementation rescales the
 * {@code expected} array if necessary to ensure that the sums of the
 * expected and observed counts are equal.</p>
 *
 * @param observed array of observed frequency counts
 * @param expected array of expected frequency counts
 * @return G-Test statistic
 * @throws NotPositiveException if {@code observed} has negative entries
 * @throws NotStrictlyPositiveException if {@code expected} has entries that
 * are not strictly positive
 * @throws DimensionMismatchException if the array lengths do not match or
 * are less than 2.
 */
public double g(final double[] expected, final long[] observed)
        throws NotPositiveException, NotStrictlyPositiveException,
        DimensionMismatchException {

    if (expected.length < 2) {
        throw new DimensionMismatchException(expected.length, 2);
    }
    if (expected.length != observed.length) {
        throw new DimensionMismatchException(expected.length, observed.length);
    }
    MathArrays.checkPositive(expected);
    MathArrays.checkNonNegative(observed);

    double sumExpected = 0d;
    double sumObserved = 0d;
    for (int i = 0; i < observed.length; i++) {
        sumExpected += expected[i];
        sumObserved += observed[i];
    }
    double ratio = 1d;
    boolean rescale = false;
    if (FastMath.abs(sumExpected - sumObserved) > 10E-6) {
        ratio = sumObserved / sumExpected;
        rescale = true;
    }
    double sum = 0d;
    for (int i = 0; i < observed.length; i++) {
        final double dev = rescale ?
                FastMath.log((double) observed[i] / (ratio * expected[i])) :
                    FastMath.log((double) observed[i] / expected[i]);
        sum += ((double) observed[i]) * dev;
    }
    return 2d * sum;
}
 
Example 11
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0 4 votes
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not match
 * @throws NotPositiveException if any entries in <code>observed1</code> or
 * <code>observed2</code> are negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at some index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are same
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
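
The weighting in Example 11 can be checked with a small worked case. The sketch below is a stdlib-only illustration, not the library code: the class `ChiSquareTwoSampleSketch` is hypothetical, it throws `IllegalArgumentException` for every precondition failure, and it always applies the weight `sqrt(sum1 / sum2)`, which equals 1 when the two samples have equal totals.

```java
/** Hypothetical stdlib-only sketch of the two-sample chi-square statistic. */
final class ChiSquareTwoSampleSketch {
    private ChiSquareTwoSampleSketch() {}

    static double statistic(final long[] observed1, final long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("both counts are zero at index " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample has no observations");
        }
        // weight = 1/K in the Javadoc formula, so obs1/weight - obs2*weight = K*obs1 - obs2/K.
        final double weight = Math.sqrt((double) sum1 / (double) sum2);
        double sumSq = 0d;
        for (int i = 0; i < observed1.length; i++) {
            final double dev = observed1[i] / weight - observed2[i] * weight;
            sumSq += dev * dev / (observed1[i] + observed2[i]);
        }
        return sumSq;
    }

    public static void main(String[] args) {
        System.out.println(statistic(new long[] {10, 20}, new long[] {20, 10}));
        System.out.println(statistic(new long[] {10, 20}, new long[] {20, 40}));
    }
}
```

For samples with proportional bin counts, such as `{10, 20}` and `{20, 40}`, the weighted deviations cancel and the statistic is zero up to rounding, which is the behavior the weighting is meant to produce.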
 
Example 13
Source File: GTest.java    From astor with GNU General Public License v2.0 4 votes
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br/>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent.</p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be  0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are same
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;

    // Column sums and the 2 x n table of counts
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}
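
Example 13 calls an `entropy` helper that is not shown in the snippet. The sketch below fills that gap for illustration only: the class `GTwoSampleSketch` and both `entropy` methods are hypothetical, written on the assumption that `H` is the Shannon entropy (natural log) of the counts normalized by their total, with zero counts contributing nothing. Input validation is omitted to keep the formula visible.

```java
/** Hypothetical stdlib-only sketch of the entropy form of the two-sample G statistic. */
final class GTwoSampleSketch {
    private GTwoSampleSketch() {}

    /** Shannon entropy (natural log) of a vector of counts; assumed form of the helper. */
    static double entropy(final long[] counts) {
        double total = 0d;
        for (final long c : counts) {
            total += c;
        }
        double h = 0d;
        for (final long c : counts) {
            if (c != 0) {
                final double p = c / total;
                h += p * Math.log(p);
            }
        }
        return -h;
    }

    /** Shannon entropy over all entries of a table of counts; assumed form of the helper. */
    static double entropy(final long[][] table) {
        double total = 0d;
        for (final long[] row : table) {
            for (final long c : row) {
                total += c;
            }
        }
        double h = 0d;
        for (final long[] row : table) {
            for (final long c : row) {
                if (c != 0) {
                    final double p = c / total;
                    h += p * Math.log(p);
                }
            }
        }
        return -h;
    }

    /** G = 2 * totalSum * [H(rowSums) + H(colSums) - H(k)] for two equal-length samples. */
    static double g(final long[] observed1, final long[] observed2) {
        final long[] colSums = new long[observed1.length];
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            sum1 += observed1[i];
            sum2 += observed2[i];
            colSums[i] = observed1[i] + observed2[i];
        }
        final long[] rowSums = {sum1, sum2};
        final long[][] k = {observed1, observed2};
        final double total = (double) sum1 + (double) sum2;
        return 2d * total * (entropy(rowSums) + entropy(colSums) - entropy(k));
    }

    public static void main(String[] args) {
        System.out.println(g(new long[] {10, 20}, new long[] {20, 10}));
    }
}
```

The bracketed term is the mutual information between row and column membership, so proportional samples such as `{10, 20}` and `{20, 40}` should yield a statistic of zero up to rounding.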
 
Example 14
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0 4 votes vote down vote up
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &sqrt;[&sum(observed2 / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException the the length of the arrays does not match
 * @throws NotPositiveException if any entries in <code>observed1</code> or
 * <code>observed2</code> are negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at some index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are same
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
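
To see the statistic above on concrete numbers, here is a minimal self-contained sketch that restates the same arithmetic in plain Java, without Commons Math. The class name `ChiSquareTwoSampleSketch` is hypothetical, and `IllegalArgumentException` stands in for the library's `DimensionMismatchException`, `NotPositiveException` and `ZeroException`; both are assumptions made only for illustration.

```java
final class ChiSquareTwoSampleSketch {

    /** Plain-Java restatement of the weighted two-sample statistic (illustrative only). */
    public static double statistic(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("negative count at index " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample has only zero counts");
        }
        // A weight of 1 leaves the deviation as obs1 - obs2, matching the unweighted branch
        double weight = (sum1 == sum2) ? 1.0 : Math.sqrt((double) sum1 / (double) sum2);
        double sumSq = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            double obs1 = observed1[i];
            double obs2 = observed2[i];
            if (obs1 + obs2 == 0) {
                throw new IllegalArgumentException("both counts are zero at index " + i);
            }
            double dev = obs1 / weight - obs2 * weight;
            sumSq += (dev * dev) / (obs1 + obs2);
        }
        return sumSq;
    }

    public static void main(String[] args) {
        // Equal totals (30 and 30): no weighting is applied
        System.out.println(statistic(new long[] {10, 20}, new long[] {20, 10}));
        // Unequal totals (20 and 80) with identical proportions
        System.out.println(statistic(new long[] {10, 10}, new long[] {40, 40}));
    }
}
```

With equal totals the statistic reduces to the sum of `(obs1 - obs2)^2 / (obs1 + obs2)`, which is 20/3 for the first call. In the second call the two samples have the same proportions, so the weighted deviations cancel and the statistic is 0.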
 
Example 15
Source File: GTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br/>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent.</p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be  0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length, and that it is at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;

    // Column sums and the 2 x n table k with rows [observed1, observed2]
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}
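
The method above relies on an `entropy` helper that is not part of the excerpt. The sketch below is one way to reproduce the formula `2 * totalSum * [H(rowSums) + H(colSums) - H(k)]` in plain Java. It assumes natural-log Shannon entropy with zero cells skipped, flattens the table `k` into a single array, and assumes the inputs have already passed the length, sign and zero checks. The class name `GTwoSampleSketch` is hypothetical.

```java
final class GTwoSampleSketch {

    /** Shannon entropy (natural log) of counts viewed as incidence counts; zero cells are skipped. */
    public static double entropy(long[] counts) {
        double total = 0.0;
        for (long c : counts) {
            total += c;
        }
        double h = 0.0;
        for (long c : counts) {
            if (c != 0) {
                double p = c / total;
                h += p * Math.log(p);
            }
        }
        return -h;
    }

    /** G statistic for two samples; assumes inputs were validated beforehand. */
    public static double g(long[] observed1, long[] observed2) {
        int n = observed1.length;
        long[] colSums = new long[n];
        long[] cells = new long[2 * n]; // the 2 x n table k, flattened row by row
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < n; i++) {
            sum1 += observed1[i];
            sum2 += observed2[i];
            colSums[i] = observed1[i] + observed2[i];
            cells[i] = observed1[i];
            cells[n + i] = observed2[i];
        }
        long[] rowSums = {sum1, sum2};
        double total = (double) sum1 + (double) sum2;
        return 2 * total * (entropy(rowSums) + entropy(colSums) - entropy(cells));
    }

    public static void main(String[] args) {
        System.out.println(g(new long[] {10, 20}, new long[] {20, 10}));
        System.out.println(g(new long[] {10, 10}, new long[] {40, 40}));
    }
}
```

For `{10, 20}` against `{20, 10}` the result is about 6.80, which agrees with the textbook form `2 * Σ O * ln(O / E)` with every expected count equal to 15. For `{10, 10}` against `{40, 40}` the proportions are identical, so the result is zero up to floating-point rounding.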
 
Example 16
Source File: GTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br/>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent.</p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be  0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length, and that it is at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;

    // Column sums and the 2 x n table k with rows [observed1, observed2]
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}
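
The non-negativity precondition in the method above is delegated to `MathArrays.checkNonNegative`. As a rough illustration of what such a check does, here is a dependency-free sketch. It is not the library's implementation: the class name `NonNegativeCheckSketch` and the `passes` helper are hypothetical, and `IllegalArgumentException` is used as a stand-in for `NotPositiveException`.

```java
final class NonNegativeCheckSketch {

    /** Throws if any entry of the array is negative (stand-in for the library check). */
    public static void checkNonNegative(long[] in) {
        for (int i = 0; i < in.length; i++) {
            if (in[i] < 0) {
                throw new IllegalArgumentException("negative entry " + in[i] + " at index " + i);
            }
        }
    }

    /** Same check applied row by row to a 2-way table. */
    public static void checkNonNegative(long[][] in) {
        for (long[] row : in) {
            checkNonNegative(row);
        }
    }

    /** Convenience wrapper for callers that prefer a boolean over an exception. */
    public static boolean passes(long[] counts) {
        try {
            checkNonNegative(counts);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(passes(new long[] {0, 3, 7})); // zero is allowed
        System.out.println(passes(new long[] {4, -1}));   // a negative count is rejected
    }
}
```

Note that zero passes this check. Rejecting all-zero samples and bins that are zero in both samples is handled separately, by the `ZeroException` branches in the methods shown on this page.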
 
Example 17
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <p><code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not match
 * or their common length is less than 2
 * @throws NotPositiveException if any entries in <code>observed1</code> or
 * <code>observed2</code> are negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at some index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length, and that it is at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
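
The Javadoc formula is written in terms of `K`, while the code works with `weight`. Assuming the intended definition is K = sqrt(Σ observed2 / Σ observed1), a short derivation shows that the two describe the same statistic:

```latex
\begin{aligned}
\text{weight} &= \sqrt{\frac{\sum_i \text{observed1}_i}{\sum_i \text{observed2}_i}},
\qquad
K = \sqrt{\frac{\sum_i \text{observed2}_i}{\sum_i \text{observed1}_i}} = \frac{1}{\text{weight}} \\
\text{dev}_i &= \frac{\text{observed1}_i}{\text{weight}} - \text{observed2}_i \cdot \text{weight}
  = K \cdot \text{observed1}_i - \frac{\text{observed2}_i}{K} \\
\chi^2 &= \sum_i \frac{\text{dev}_i^2}{\text{observed1}_i + \text{observed2}_i}
\end{aligned}
```

When the two totals are equal, `weight` and `K` are both 1, which is why the code can skip the weighting and use `obs1 - obs2` directly.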
 
Example 18
Source File: GTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br/>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent.</p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be  0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length, and that it is at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;

    // Column sums and the 2 x n table k with rows [observed1, observed2]
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}
 
Example 19
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <p><code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not match
 * or their common length is less than 2
 * @throws NotPositiveException if any entries in <code>observed1</code> or
 * <code>observed2</code> are negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at some index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length, and that it is at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
 
Example 20
Source File: GTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br/>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent.</p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be  0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length, and that it is at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;

    // Column sums and the 2 x n table k with rows [observed1, observed2]
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}