Java Code Examples for htsjdk.samtools.util.SequenceUtil#basesEqual()

The following examples show how to use htsjdk.samtools.util.SequenceUtil#basesEqual() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: TrimSequenceTemplate.java    From Drop-seq with MIT License 6 votes vote down vote up
/**
 * Test to see if this read matches this barcode. If any base of a barcode
 * starts with N or n, then ignore that position.
 *
 * @param testString
 *            The read to look for this barcode in. The barcode should be at
 *            the start of the read for this method.  The entire barcode is expected for a match.
 * @return true if this barcode is found in the read.
 */
public boolean hasForwardMatch(final String testString) {
	byte[] testBases = StringUtil.stringToBytes(testString);
	int numBasesCanMatch = 0;
	int numBasesMatch = 0;
	for (int i = 0; i < bases.length; i++) {
		if (isIgnoreBase(this.bases[i]))
			continue;
		numBasesCanMatch++;
		if (SequenceUtil.basesEqual(testBases[i], bases[i]))
			numBasesMatch++;
	}
	if (numBasesCanMatch == numBasesMatch)
		return (true);
	return false;
}
 
Example 2
Source File: GcBiasUtils.java    From picard with MIT License 6 votes vote down vote up
public static int calculateGc(final byte[] bases, final int startIndex, final int endIndex, final CalculateGcState state) {
    if (state.init) {
        state.init = false;
        state.gcCount = 0;
        state.nCount = 0;
        for (int i = startIndex; i < endIndex; ++i) {
            final byte base = bases[i];
            if (SequenceUtil.basesEqual(base, (byte)'G') || SequenceUtil.basesEqual(base, (byte)'C')) ++state.gcCount;
            else if (SequenceUtil.basesEqual(base, (byte)'N')) ++state.nCount;
        }
    } else {
        final byte newBase = bases[endIndex - 1];
        if (SequenceUtil.basesEqual(newBase, (byte)'G') || SequenceUtil.basesEqual(newBase, (byte)'C')) ++state.gcCount;
        else if (newBase == 'N') ++state.nCount;

        if (SequenceUtil.basesEqual(state.priorBase, (byte)'G') || SequenceUtil.basesEqual(state.priorBase, (byte)'C')) --state.gcCount;
        else if (SequenceUtil.basesEqual(state.priorBase, (byte)'N')) --state.nCount;
    }
    state.priorBase = bases[startIndex];
    if (state.nCount > 4) return -1;
    else return (state.gcCount * 100) / (endIndex - startIndex);
}
 
Example 3
Source File: AdapterUtility.java    From picard with MIT License 6 votes vote down vote up
/**
 * Checks the first ADAPTER_MATCH_LENGTH bases of the read against known adapter sequences and returns
 * true if the read matches an adapter sequence with MAX_ADAPTER_ERRORS mismsatches or fewer.
 *
 * @param read the basecalls for the read in the order and orientation the machine read them
 * @param  revCompRead When aligned reads are checked for being adapter, this specified if the original
 *                     read had ben rev-comped during alignment.
 * @return true if the read matches an adapter and false otherwise
 */
public boolean isAdapterSequence(final byte[] read, boolean revCompRead) {
    if (read.length < ADAPTER_MATCH_LENGTH) return false;

    for (final byte[] adapter : adapterKmers) {
        int errors = 0;

        for (int i = 0; i < adapter.length; ++i) {
            final byte base = revCompRead ? SequenceUtil.complement(read[read.length - i - 1]) : read[i];
            if (!SequenceUtil.basesEqual(base, adapter[i]) && ++errors > MAX_ADAPTER_ERRORS) {
                break;
            }
        }

        if (errors <= MAX_ADAPTER_ERRORS) {
            return true;
        }
    }

    return false;
}
 
Example 4
Source File: SingleBarcodeDistanceMetric.java    From picard with MIT License 6 votes vote down vote up
public int hammingDistance() {
    int numMismatches = 0;
    for (int i = 0; i < barcodeBases.length && i < readBases.length && numMismatches <= maximalInterestingDistance; ++i) {

        if (SequenceUtil.isNoCall(readBases[i])) {
            continue;
        }

        if (!SequenceUtil.basesEqual(barcodeBases[i], readBases[i])) {
            ++numMismatches;
            continue;
        }

        // bases are equal but if quality is low we still penalize.
        // TODO: is it actually useful to penalize barcodes in this way?
        if (readQualities != null && readQualities[i] < minimumBaseQuality) {
            ++numMismatches;
        }
    }
    return numMismatches;
}
 
Example 5
Source File: TrimSequenceTemplate.java    From Drop-seq with MIT License 5 votes vote down vote up
/**
 * Does the testString have part (or all) of the template in it?
 * The test string starting at position 1 may have some portion of the template.
 * For example, ANY bases of the template could have the 8 bases in the middle of a 30 base testString.
 * This is more flexible and slow than getPositionInTemplate, as it will let any portion of the template occur anywhere in the read.
 * @param testString A string to test against the template
 * @param minMatch The number of bases that must match in both
 * @param mismatchesAllowed How many mismatches can there be between the template and the read
 * @return The position in the read, 0 based.
 */
public int getPositionInRead (final String testString, final int minMatch, final int mismatchesAllowed) {
	byte [] read = StringUtil.stringToBytes(testString);
	// If the read's too short we can't possibly match it
       if (read == null || read.length < minMatch) return -1;

       int maxNumMatches=0;
       int bestReadStartPos=0;

       int lastViableReadIndex=read.length-minMatch+1;
       // Walk forwards
       READ_LOOP:  // walks through the read, looking for the template.
       for (int readIndex = 0; readIndex<lastViableReadIndex; readIndex++) {
           int mismatches = 0;
           int numMatches= 0;

           // can only search as far as you have left in the read.
           final int searchLength = Math.min(this.bases.length, read.length-readIndex);
           if (searchLength<minMatch) break;  // if you don't have enough search space left to match a min number of bases you give up.
           for (int templateIndex = 0; templateIndex < searchLength; templateIndex++) {
           	int tempReadIndex=templateIndex+readIndex;
           	char templateBase = (char)this.bases[templateIndex];
           	char readBase = (char) read[tempReadIndex];
               if (SequenceUtil.isNoCall(read[tempReadIndex]) || !SequenceUtil.basesEqual(this.bases[templateIndex], read[tempReadIndex])) {
               	if (++mismatches > mismatchesAllowed) continue READ_LOOP;
               } else
				numMatches++;
               if (numMatches>maxNumMatches) {
               	maxNumMatches=numMatches;
               	bestReadStartPos=readIndex;
               }
           }

       }
       if (maxNumMatches<minMatch) return (-1);
       return bestReadStartPos;
}
 
Example 6
Source File: SingleBarcodeDistanceMetric.java    From picard with MIT License 5 votes vote down vote up
/**
     * Similar to Hamming distance but this version doesn't penalize matching bases with low quality for the read.

     * @return the edit distance between the barcode(s) and the read(s)
     */
public int lenientHammingDistance() {
    int numMismatches = 0;
    for (int i = 0; i < barcodeBases.length && i < maskedBases.length && numMismatches <= maximalInterestingDistance; ++i) {

        if (SequenceUtil.isNoCall(maskedBases[i])) {
            continue;
        }

        if (!SequenceUtil.basesEqual(barcodeBases[i], maskedBases[i])) {
            ++numMismatches;
        }
    }
    return numMismatches;
}
 
Example 7
Source File: TrimSequenceTemplate.java    From Drop-seq with MIT License 4 votes vote down vote up
public static boolean baseInBaseList(final Byte base, final byte[] baseList) {
	for (Byte b : baseList)
		if (SequenceUtil.basesEqual(b, base))
			return true;
	return false;
}
 
Example 8
Source File: OverlappingReadsErrorCalculator.java    From picard with MIT License 4 votes vote down vote up
/**
 * The function by which new loci are "shown" to the calculator
 **/
@Override
public void addBase(final SamLocusIterator.RecordAndOffset recordAndOffset, final SamLocusAndReferenceIterator.SAMLocusAndReference locusAndRef) {
    super.addBase(recordAndOffset, locusAndRef);
    final byte readBase = recordAndOffset.getReadBase();
    final SAMRecord record = recordAndOffset.getRecord();

    // by traversing the reads and splitting into sets with the same name we convert a O(N^2) iteration
    // into a O(N) iteration
    updateReadNameSet(locusAndRef.getLocus());

    final SamLocusIterator.RecordAndOffset mate = readNameSets.get(record.getReadName())
            .stream()
            .filter(putative -> areReadsMates(record, putative.getRecord()))
            .findFirst()
            .orElse(null);

    // we are only interested in bases for which the mate read also has a base over the same locus
    if (mate == null) return;
    // both bases need to be called for this error calculation

    final byte mateBase = mate.getReadBase();
    if (SequenceUtil.isNoCall(readBase)) return;
    if (SequenceUtil.isNoCall(mateBase)) return;

    nTotalBasesWithOverlappingReads++;

    // Only bases that disagree with the reference are counted as errors.
    if (SequenceUtil.basesEqual(readBase, locusAndRef.getReferenceBase())) return;

    final boolean agreesWithMate = SequenceUtil.basesEqual(readBase, mateBase);
    final boolean mateAgreesWithRef = SequenceUtil.basesEqual(mateBase, locusAndRef.getReferenceBase());

    if (agreesWithMate) {
        nBothDisagreeWithReference++;
    } else if (mateAgreesWithRef) {
        nDisagreeWithRefAndMate++;
    } else {
        nThreeWaysDisagreement++;
    }
}
 
Example 9
Source File: RrbsMetricsCollector.java    From picard with MIT License 4 votes vote down vote up
public void acceptRecord(final SAMRecordAndReference args) {
	mappedRecordCount++;

	final SAMRecord samRecord = args.getSamRecord();
	final ReferenceSequence referenceSequence = args.getReferenceSequence();

	final byte[] readBases = samRecord.getReadBases();
	final byte[] readQualities = samRecord.getBaseQualities();
	final byte[] refBases = referenceSequence.getBases();

	if (samRecord.getReadLength() < minReadLength) {
		smallReadCount++;
		return;
	} else if (SequenceUtil.countMismatches(samRecord, refBases, true) > Math.round(samRecord.getReadLength() * maxMismatchRate)) {
		mismatchCount++;
		return;
	}

	// We only record non-CpG C sites if there was at least one CpG in the read, keep track of
	// the values for this record and then apply to the global totals if valid
	int recordCpgs = 0;

	for (final AlignmentBlock alignmentBlock : samRecord.getAlignmentBlocks()) {
		final int blockLength = alignmentBlock.getLength();
		final int refFragmentStart = alignmentBlock.getReferenceStart() - 1;
		final int readFragmentStart = alignmentBlock.getReadStart() - 1;

		final byte[] refFragment = getFragment(refBases, refFragmentStart, blockLength);
		final byte[] readFragment = getFragment(readBases, readFragmentStart, blockLength);
		final byte[] readQualityFragment = getFragment(readQualities, readFragmentStart, blockLength);

		if (samRecord.getReadNegativeStrandFlag()) {
			// In the case of a negative strand, reverse (and complement for base arrays) the reference,
			// reads & qualities so that it can be treated as a positive strand for the rest of the process
			SequenceUtil.reverseComplement(refFragment);
			SequenceUtil.reverseComplement(readFragment);
			SequenceUtil.reverseQualities(readQualityFragment);
		}

		for (int i=0; i < blockLength-1; i++) {
			final int curRefIndex = getCurRefIndex(refFragmentStart, blockLength, i, samRecord.getReadNegativeStrandFlag());

			// Look at a 2-base window to see if we're on a CpG site, and if so check for conversion
			// (CG -> TG). We do not consider ourselves to be on a CpG site if we're on the last base of a read
			if ((SequenceUtil.basesEqual(refFragment[i], SequenceUtil.C)) &&
					(SequenceUtil.basesEqual(refFragment[i+1], SequenceUtil.G))) {
				// We want to catch the case where there's a CpG in the reference, even if it is not valid
				// to prevent the C showing up as a non-CpG C down below. Otherwise this could have been all
				// in one if statement
				if (isValidCpg(refFragment, readFragment, readQualityFragment, i)) {
					recordCpgs++;
					final CpgLocation curLocation = new CpgLocation(samRecord.getReferenceName(), curRefIndex);
					cpgTotal.increment(curLocation);
					if (SequenceUtil.isBisulfiteConverted(readFragment[i], refFragment[i])) {
						cpgConverted.increment(curLocation);
					}
				}
				i++;
			} else if (isC(refFragment[i], readFragment[i]) && isAboveCytoQcThreshold(readQualities, i) &&
					SequenceUtil.bisulfiteBasesEqual(false, readFragment[i+1], refFragment[i+1])) {
				// C base in the reference that's not associated with a CpG
				nCytoTotal++;
				if (SequenceUtil.isBisulfiteConverted(readFragment[i], refFragment[i])) {
					nCytoConverted++;
				}
			}
		}
	}

	if (recordCpgs == 0) {
		noCpgCount++;
	}
}
 
Example 10
Source File: RrbsMetricsCollector.java    From picard with MIT License 4 votes vote down vote up
/**
 * True if there's a C in the reference as well as read (possibly bisulfite converted)
 */
private boolean isC(final byte refBase, final byte readBase) {
	return (SequenceUtil.basesEqual(refBase, SequenceUtil.C) && SequenceUtil.bisulfiteBasesEqual(readBase, refBase));
}
 
Example 11
Source File: RrbsMetricsCollector.java    From picard with MIT License 4 votes vote down vote up
/**
 * Checks a pair of bases for CpG status & quality thresholds
 */
private boolean isValidCpg(final byte[] refBases, final byte[] readBases, final byte[] readQualities, final int index) {
	return isC(refBases[index], readBases[index]) && SequenceUtil.basesEqual(refBases[index+1], readBases[index+1]) &&
			isAboveCytoQcThreshold(readQualities, index);
}