Java Code Examples for org.apache.commons.math3.exception.util.LocalizedFormats#OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY

The following examples show how to use org.apache.commons.math3.exception.util.LocalizedFormats#OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY.
Example 1
Source File: GTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent </p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be  0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are the same and at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Running count sums for each sample
    long countSum1 = 0;
    long countSum2 = 0;

    // Column sums and the 2 x n table k with rows [observed1, observed2]
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}
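
The formula in the Javadoc above can be tried out without the Commons Math dependency. What follows is a minimal sketch under stated assumptions, not the library's implementation: the class `GStatisticSketch` and its methods are hypothetical names, a plain `IllegalArgumentException` stands in for `ZeroException` and the other library exceptions, and the entropy helper is assumed to use natural logarithms and to skip zero cells.

```java
/** Hypothetical stand-alone sketch; not the Commons Math implementation. */
final class GStatisticSketch {

    /** Shannon entropy (natural log) of counts viewed as incidence counts. */
    static double entropy(long[] counts, double total) {
        double h = 0.0;
        for (long c : counts) {
            if (c != 0) {                 // a zero cell contributes nothing
                double p = c / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** Computes 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]. */
    static double gStatistic(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        int n = observed1.length;
        long sum1 = 0;
        long sum2 = 0;
        long[] colSums = new long[n];
        long[] cells = new long[2 * n];   // the matrix k, flattened row by row
        for (int i = 0; i < n; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("negative count at entry " + i);
            }
            // Stand-in for ZeroException(OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i)
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("observed counts are both zero for entry " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
            colSums[i] = observed1[i] + observed2[i];
            cells[i] = observed1[i];
            cells[n + i] = observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample must not be all zeros");
        }
        double total = (double) sum1 + (double) sum2;
        double hRows = entropy(new long[] {sum1, sum2}, total);
        double hCols = entropy(colSums, total);
        double hCells = entropy(cells, total);
        return 2 * total * (hRows + hCols - hCells);
    }

    public static void main(String[] args) {
        System.out.println(gStatistic(new long[] {10, 20}, new long[] {20, 10}));
    }
}
```

For the counts `{10, 20}` and `{20, 10}` every expected cell count is 15, so the result should agree with the textbook form `2 * sum(O * ln(O / E))`, roughly 6.80. Passing `{0, 5}` and `{0, 7}` triggers the both-zero check at entry 0.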
 
Example 2
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <p><code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not match
 * or their common length is less than 2
 * @throws NotPositiveException if any entries in <code>observed1</code> or
 * <code>observed2</code> are negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at some index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are the same and at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
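
The weighting step above is easier to follow with concrete numbers. The sketch below is a dependency-free approximation under stated assumptions, not the library's code: `TwoSampleChiSquareSketch` and `chiSquare` are hypothetical names, a plain `IllegalArgumentException` replaces the library exceptions, and the weight is set to 1.0 for equal sample sizes instead of branching as the original does (dividing and multiplying by 1.0 leaves the deviation unchanged).

```java
/** Hypothetical stand-alone sketch; not the Commons Math implementation. */
final class TwoSampleChiSquareSketch {

    static double chiSquare(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("negative count at entry " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample must not be all zeros");
        }
        // weight = sqrt(sum1 / sum2), i.e. 1 / K in the Javadoc's notation
        double weight = (sum1 == sum2) ? 1.0 : Math.sqrt((double) sum1 / (double) sum2);
        double sumSq = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            // Stand-in for ZeroException(OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i)
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("observed counts are both zero for entry " + i);
            }
            double obs1 = observed1[i];
            double obs2 = observed2[i];
            double dev = obs1 / weight - obs2 * weight;
            sumSq += (dev * dev) / (obs1 + obs2);
        }
        return sumSq;
    }

    public static void main(String[] args) {
        // Equal sample sizes: deviations are -10 and 10, each bin holds 30 counts.
        System.out.println(chiSquare(new long[] {10, 20}, new long[] {20, 10}));
        // Proportional samples of different size: the statistic is close to 0.
        System.out.println(chiSquare(new long[] {10, 20}, new long[] {20, 40}));
    }
}
```

With equal totals the first call yields 100/30 + 100/30, about 6.67. In the second call the weight rescales both samples to a common size, so proportional counts produce a statistic that is zero up to rounding. The both-zero check has to run before the division because a bin with `obs1 + obs2 == 0` would otherwise divide zero by zero.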
 
Example 3
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <p><code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not match
 * or their common length is less than 2
 * @throws NotPositiveException if any entry in <code>observed1</code> or
 * <code>observed2</code> is negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at the same index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are the same and at least 2
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    checkNonNegative(observed1);
    checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
 
Example 9
Source File: ChiSquareTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a
 * <a href="http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm">
 * Chi-Square two sample test statistic</a> comparing bin frequency counts
 * in <code>observed1</code> and <code>observed2</code>.  The
 * sums of frequency counts in the two samples are not required to be the
 * same.  The formula used to compute the test statistic is</p>
 * <p><code>
 * &sum;[(K * observed1[i] - observed2[i]/K)<sup>2</sup> / (observed1[i] + observed2[i])]
 * </code> where
 * <br/><code>K = &radic;[&sum;(observed2) / &sum;(observed1)]</code>
 * </p>
 * <p>This statistic can be used to perform a Chi-Square test evaluating the
 * null hypothesis that both observed counts follow the same distribution.</p>
 * <p>
 * <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative.
 * </li>
 * <li>Observed counts for a specific bin must not both be zero.
 * </li>
 * <li>Observed counts for a specific sample must not all be 0.
 * </li>
 * <li>The arrays <code>observed1</code> and <code>observed2</code> must have
 * the same length and their common length must be at least 2.
 * </li></ul></p><p>
 * If any of the preconditions are not met, an
 * <code>IllegalArgumentException</code> is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data set
 * @return chiSquare test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not match
 * or their common length is less than 2
 * @throws NotPositiveException if any entries in <code>observed1</code> or
 * <code>observed2</code> are negative
 * @throws ZeroException if either all counts of <code>observed1</code> or
 * <code>observed2</code> are zero, or if the count at some index is zero
 * for both arrays
 * @since 1.2
 */
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure both arrays have the same length and at least 2 entries
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;
    boolean unequalCounts = false;
    double weight = 0.0;
    for (int i = 0; i < observed1.length; i++) {
        countSum1 += observed1[i];
        countSum2 += observed2[i];
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    // Compare and compute weight only if different
    unequalCounts = countSum1 != countSum2;
    if (unequalCounts) {
        weight = FastMath.sqrt((double) countSum1 / (double) countSum2);
    }
    // Compute ChiSquare statistic
    double sumSq = 0.0d;
    double dev = 0.0d;
    double obs1 = 0.0d;
    double obs2 = 0.0d;
    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            obs1 = observed1[i];
            obs2 = observed2[i];
            if (unequalCounts) { // apply weights
                dev = obs1/weight - obs2 * weight;
            } else {
                dev = obs1 - obs2;
            }
            sumSq += (dev * dev) / (obs1 + obs2);
        }
    }
    return sumSq;
}
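
The weighting in the method above can be checked by hand with a small standalone sketch. It follows the Javadoc formula directly, with K = sqrt(sum(observed2) / sum(observed1)). The class name ChiSquareSketch and the method name chiSquareTwoSample are hypothetical, plain IllegalArgumentException stands in for the Commons Math exception types, and java.lang.Math replaces FastMath; this is an illustration under those assumptions, not the library's implementation.

```java
// Hypothetical standalone sketch; not part of Commons Math.
final class ChiSquareSketch {

    private ChiSquareSketch() { }

    /** Two-sample chi-square statistic, computed as in the Javadoc formula. */
    static double chiSquareTwoSample(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("negative count at entry " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample has only zero counts");
        }
        // K = sqrt(sum2 / sum1); K is exactly 1 when both totals are equal.
        final double k = Math.sqrt((double) sum2 / (double) sum1);
        double statistic = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] == 0 && observed2[i] == 0) {
                // Same precondition that OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY reports.
                throw new IllegalArgumentException("observed counts are both zero for entry " + i);
            }
            final double dev = k * observed1[i] - observed2[i] / k;
            statistic += (dev * dev) / ((double) observed1[i] + (double) observed2[i]);
        }
        return statistic;
    }

    public static void main(String[] args) {
        // Totals are 30 and 60, so the weighting branch is exercised.
        System.out.println(chiSquareTwoSample(new long[] {10, 20}, new long[] {40, 20}));
    }
}
```

For the inputs in main, K is sqrt(2), each bin has a squared deviation of about 200, and the two bins contribute about 4 and 5, so the statistic is about 9. A bin that is zero in both samples would divide by zero in the last step, which is why it is rejected up front.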
 
Example 10
Source File: GTest.java    From astor with GNU General Public License v2.0
/**
 * <p>Computes a G (Log-Likelihood Ratio) two sample test statistic for
 * independence comparing frequency counts in
 * {@code observed1} and {@code observed2}. The sums of frequency
 * counts in the two samples are not required to be the same. The formula
 * used to compute the test statistic is </p>
 *
 * <p>{@code 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]}</p>
 *
 * <p> where {@code H} is the
 * <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">
 * Shannon Entropy</a> of the random variable formed by viewing the elements
 * of the argument array as incidence counts; <br/>
 * {@code k} is a matrix with rows {@code [observed1, observed2]}; <br/>
 * {@code rowSums, colSums} are the row/col sums of {@code k}; <br/>
 * and {@code totalSum} is the overall sum of all entries in {@code k}.</p>
 *
 * <p>This statistic can be used to perform a G test evaluating the null
 * hypothesis that both observed counts are independent.</p>
 *
 * <p> <strong>Preconditions</strong>: <ul>
 * <li>Observed counts must be non-negative. </li>
 * <li>Observed counts for a specific bin must not both be zero. </li>
 * <li>Observed counts for a specific sample must not all be 0. </li>
 * <li>The arrays {@code observed1} and {@code observed2} must have
 * the same length and their common length must be at least 2. </li></ul></p>
 *
 * <p>If any of the preconditions are not met, a
 * {@code MathIllegalArgumentException} is thrown.</p>
 *
 * @param observed1 array of observed frequency counts of the first data set
 * @param observed2 array of observed frequency counts of the second data
 * set
 * @return G-Test statistic
 * @throws DimensionMismatchException if the lengths of the arrays do not
 * match or their common length is less than 2
 * @throws NotPositiveException if any entry in {@code observed1} or
 * {@code observed2} is negative
 * @throws ZeroException if either all counts of
 * {@code observed1} or {@code observed2} are zero, or if the count
 * at the same index is zero for both arrays.
 */
public double gDataSetsComparison(final long[] observed1, final long[] observed2)
        throws DimensionMismatchException, NotPositiveException, ZeroException {

    // Make sure lengths are same
    if (observed1.length < 2) {
        throw new DimensionMismatchException(observed1.length, 2);
    }
    if (observed1.length != observed2.length) {
        throw new DimensionMismatchException(observed1.length, observed2.length);
    }

    // Ensure non-negative counts
    MathArrays.checkNonNegative(observed1);
    MathArrays.checkNonNegative(observed2);

    // Compute and compare count sums
    long countSum1 = 0;
    long countSum2 = 0;

    // Allocate the column sums and the 2 x n table of counts
    final long[] collSums = new long[observed1.length];
    final long[][] k = new long[2][observed1.length];

    for (int i = 0; i < observed1.length; i++) {
        if (observed1[i] == 0 && observed2[i] == 0) {
            throw new ZeroException(LocalizedFormats.OBSERVED_COUNTS_BOTTH_ZERO_FOR_ENTRY, i);
        } else {
            countSum1 += observed1[i];
            countSum2 += observed2[i];
            collSums[i] = observed1[i] + observed2[i];
            k[0][i] = observed1[i];
            k[1][i] = observed2[i];
        }
    }
    // Ensure neither sample is uniformly 0
    if (countSum1 == 0 || countSum2 == 0) {
        throw new ZeroException();
    }
    final long[] rowSums = {countSum1, countSum2};
    final double sum = (double) countSum1 + (double) countSum2;
    return 2 * sum * (entropy(rowSums) + entropy(collSums) - entropy(k));
}
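
The method above relies on entropy helpers that are not shown in this excerpt. A minimal standalone sketch of the formula 2 * totalSum * [H(rowSums) + H(colSums) - H(k)] follows, assuming H is the Shannon entropy in natural-log units with zero counts skipped. The class name GTestSketch and its method names are hypothetical, and plain IllegalArgumentException stands in for the Commons Math exception types.

```java
// Hypothetical standalone sketch; not part of Commons Math.
final class GTestSketch {

    private GTestSketch() { }

    /** Shannon entropy (natural log) of counts viewed as incidence counts. */
    static double entropy(long[] counts) {
        double total = 0.0;
        for (long c : counts) {
            total += c;
        }
        double h = 0.0;
        for (long c : counts) {
            if (c != 0) { // a zero count contributes nothing to the entropy
                final double p = c / total;
                h += p * Math.log(p);
            }
        }
        return -h;
    }

    /** Shannon entropy (natural log) over all cells of a table of counts. */
    static double entropy(long[][] table) {
        double total = 0.0;
        for (long[] row : table) {
            for (long c : row) {
                total += c;
            }
        }
        double h = 0.0;
        for (long[] row : table) {
            for (long c : row) {
                if (c != 0) {
                    final double p = c / total;
                    h += p * Math.log(p);
                }
            }
        }
        return -h;
    }

    /** Two-sample G statistic: 2 * totalSum * [H(rowSums) + H(colSums) - H(k)]. */
    static double gTwoSample(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        final long[] colSums = new long[observed1.length];
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("negative count at entry " + i);
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("observed counts are both zero for entry " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
            colSums[i] = observed1[i] + observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample has only zero counts");
        }
        final long[][] k = {observed1, observed2};
        final long[] rowSums = {sum1, sum2};
        final double totalSum = (double) sum1 + (double) sum2;
        return 2 * totalSum * (entropy(rowSums) + entropy(colSums) - entropy(k));
    }

    public static void main(String[] args) {
        System.out.println(gTwoSample(new long[] {10, 20}, new long[] {20, 10}));
    }
}
```

For the inputs in main the statistic is roughly 6.80, which agrees with the equivalent form 2 * sum(O * ln(O / E)) where every expected cell count E is 15. Two samples with identical counts give a value of zero up to rounding.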
 