Java Code Examples for htsjdk.samtools.Cigar#getReadLength()

The following examples show how to use htsjdk.samtools.Cigar#getReadLength() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: ContigAlignmentsModifierUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
private static Iterable<AlignmentInterval> willThrowOnInvalidCigar(final Cigar cigar, final int readStart) throws GATKException {
    final AlignmentInterval detailsDoesnotMatter = new AlignmentInterval(
            new SimpleInterval("1", 1, cigar.getReferenceLength()),
            readStart, readStart+cigar.getReadLength() - CigarUtils.countClippedBases(cigar, CigarOperator.SOFT_CLIP), cigar,
            true, 60, 0, 100, ContigAlignmentsModifier.AlnModType.NONE);
    return ContigAlignmentsModifier.splitGappedAlignment(detailsDoesnotMatter, 1, cigar.getReadLength() + CigarUtils.countClippedBases(cigar, CigarOperator.HARD_CLIP));
}
 
Example 2
Source File: ReadFilterLibrary.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
@Override public boolean test (final GATKRead read) {
return read.isUnmapped() ||
        read.getLength() == Cigar.getReadLength(read.getCigarElements());}
 
Example 3
Source File: AlignmentUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Aligns reads the haplotype, and then projects this alignment of read -> hap onto the reference
 * via the alignment of haplotype (via its getCigar) method.
 *
 * @param originalRead the read we want to write aligned to the reference genome
 * @param haplotype the haplotype that the read should be aligned to, before aligning to the reference
 * @param referenceStart the start of the reference that haplotype is aligned to.  Provides global coordinate frame.
 * @param isInformative true if the read is differentially informative for one of the haplotypes
 *
 * @param aligner
 * @throws IllegalArgumentException if {@code originalRead} is {@code null} or {@code haplotype} is {@code null} or it
 *   does not have a Cigar or the {@code referenceStart} is invalid (less than 1).
 *
 * @return a GATKRead aligned to reference. Never {@code null}.
 */
public static GATKRead createReadAlignedToRef(final GATKRead originalRead,
                                              final Haplotype haplotype,
                                              final Haplotype refHaplotype,
                                              final int referenceStart,
                                              final boolean isInformative,
                                              final SmithWatermanAligner aligner) {
    Utils.nonNull(originalRead);
    Utils.nonNull(haplotype);
    Utils.nonNull(refHaplotype);
    Utils.nonNull(haplotype.getCigar());
    Utils.nonNull(aligner);
    if ( referenceStart < 1 ) { throw new IllegalArgumentException("reference start much be >= 1 but got " + referenceStart); }

    // compute the smith-waterman alignment of read -> haplotype
    final SmithWatermanAlignment readToHaplotypeSWAlignment = aligner.align(haplotype.getBases(), originalRead.getBases(), CigarUtils.ALIGNMENT_TO_BEST_HAPLOTYPE_SW_PARAMETERS, SWOverhangStrategy.SOFTCLIP);
    if ( readToHaplotypeSWAlignment.getAlignmentOffset() == -1 ) {
        // sw can fail (reasons not clear) so if it happens just don't realign the read
        return originalRead;
    }

    final Cigar swCigar = new CigarBuilder().addAll(readToHaplotypeSWAlignment.getCigar()).make();

    // since we're modifying the read we need to clone it
    final GATKRead read = originalRead.copy();

    // only informative reads are given the haplotype tag to enhance visualization
    if ( isInformative ) {
        read.setAttribute(HAPLOTYPE_TAG, haplotype.hashCode());
    }

    // compute here the read starts w.r.t. the reference from the SW result and the hap -> ref cigar
    final Cigar rightPaddedHaplotypeVsRefCigar = haplotype.getConsolidatedPaddedCigar(1000);

    // this computes the number of reference bases before the read starts, based on the haplotype vs ref cigar
    // This cigar corresponds exactly to the readToRefCigarRaw, below.  One might wonder whether readToRefCigarRaw and
    // readToRefCigarClean ever imply different starts, which could occur if if the former has a leading deletion.  However,
    // according to the logic of applyCigarToCigar, this can only happen if the read has a leading deletion wrt its best haplotype,
    // which our SW aligner won't do, or if the read starts on a haplotype base that is in a deletion wrt to reference, which is nonsensical
    // since a base that exists is not a deletion.  Thus, there is nothing to worry about, in contrast to below where we do check
    // whether left-alignment shifted the start position.
    final int readStartOnReferenceHaplotype = readStartOnReferenceHaplotype(rightPaddedHaplotypeVsRefCigar, readToHaplotypeSWAlignment.getAlignmentOffset());


    //final int readStartOnReference = referenceStart + haplotype.getAlignmentStartHapwrtRef() + readStartOnHaplotype;

    final int readStartOnReference = referenceStart + haplotype.getAlignmentStartHapwrtRef() + readStartOnReferenceHaplotype;

    // compute the read -> ref alignment by mapping read -> hap -> ref from the
    // SW of read -> hap mapped through the given by hap -> ref

    // this is the sub-cigar of the haplotype-to-ref alignment, with cigar elements before the read start removed.  Elements after the read end are kept.
    final Cigar haplotypeToRef = trimCigarByBases(rightPaddedHaplotypeVsRefCigar, readToHaplotypeSWAlignment.getAlignmentOffset(), rightPaddedHaplotypeVsRefCigar.getReadLength() - 1).getCigar();

    final Cigar readToRefCigar = applyCigarToCigar(swCigar, haplotypeToRef);
    final CigarBuilder.Result leftAlignedReadToRefCigarResult = leftAlignIndels(readToRefCigar, refHaplotype.getBases(), originalRead.getBases(), readStartOnReferenceHaplotype);
    final Cigar leftAlignedReadToRefCigar = leftAlignedReadToRefCigarResult.getCigar();
    // it's possible that left-alignment shifted a deletion to the beginning of a read and removed it, shifting the first aligned base to the right
    read.setPosition(read.getContig(), readStartOnReference + leftAlignedReadToRefCigarResult.getLeadingDeletionBasesRemoved());

    // the SW Cigar does not contain the hard clips of the original read
    final Cigar originalCigar = originalRead.getCigar();
    final CigarElement firstElement = originalCigar.getFirstCigarElement();
    final CigarElement lastElement = originalCigar.getLastCigarElement();
    final List<CigarElement> readToRefCigarElementsWithHardClips = new ArrayList<>();
    if (firstElement.getOperator() == CigarOperator.HARD_CLIP) {
        readToRefCigarElementsWithHardClips.add(firstElement);
    }
    readToRefCigarElementsWithHardClips.addAll(leftAlignedReadToRefCigar.getCigarElements());
    if (lastElement.getOperator() == CigarOperator.HARD_CLIP) {
        readToRefCigarElementsWithHardClips.add(lastElement);
    }

    read.setCigar(new Cigar(readToRefCigarElementsWithHardClips));

    if ( leftAlignedReadToRefCigar.getReadLength() != read.getLength() ) {
        throw new GATKException("Cigar " + leftAlignedReadToRefCigar + " with read length " + leftAlignedReadToRefCigar.getReadLength()
                + " != read length " + read.getLength() + " for read " + read.toString() + "\nhapToRef " + haplotypeToRef + " length " + haplotypeToRef.getReadLength() + "/" + haplotypeToRef.getReferenceLength()
                + "\nreadToHap " + swCigar + " length " + swCigar.getReadLength() + "/" + swCigar.getReferenceLength());
    }

    return read;
}
 
Example 4
Source File: AlignmentUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Get the byte[] from bases that cover the reference interval refStart -> refEnd given the
 * alignment of bases to the reference (basesToRefCigar) and the start offset of the bases on the reference
 *
 * refStart and refEnd are 0 based offsets that we want to obtain.  In the client code, if the reference
 * bases start at position X and you want Y -> Z, refStart should be Y - X and refEnd should be Z - X.
 *
 * If refStart or refEnd would start or end the new bases within a deletion, this function will return null
 *
 * If the cigar starts with an insertion, the inserted bases are considered as coming before the start position and
 * are therefore excluded from the result.  That is getBasesCoveringRefInterval(0, 3, "ACTTGT", 0, 2I4M) should yield "TTGT".
 *
 * @param bases
 * @param refStart
 * @param refEnd
 * @param basesStartOnRef where does the bases array start w.r.t. the reference start?  For example, bases[0] of
 *                        could be at refStart == 0 if basesStartOnRef == 0, but it could just as easily be at
 *                        10 (meaning bases doesn't fully span the reference), which would be indicated by basesStartOnRef == 10.
 *                        It's not trivial to eliminate this parameter because it's tied up with the cigar
 * @param basesToRefCigar the cigar that maps the bases to the reference genome
 * @return a byte[] containing the bases covering this interval, or null if we would start or end within a deletion
 */
public static byte[] getBasesCoveringRefInterval(final int refStart, final int refEnd, final byte[] bases, final int basesStartOnRef, final Cigar basesToRefCigar) {
    if ( refStart < 0 || refEnd < refStart ) throw new IllegalArgumentException("Bad start " + refStart + " and/or stop " + refEnd);
    if ( basesStartOnRef < 0 ) throw new IllegalArgumentException("BasesStartOnRef must be >= 0 but got " + basesStartOnRef);
    Utils.nonNull( bases );
    Utils.nonNull( basesToRefCigar );
    if ( bases.length != basesToRefCigar.getReadLength() ) throw new IllegalArgumentException("Mismatch in length between reference bases " + bases.length + " and cigar length " + basesToRefCigar);

    int refPos = basesStartOnRef;
    int basesPos = 0;
    int basesStart = -1;
    int basesStop = -1;
    boolean done = false;

    for ( int iii = 0; ! done && iii < basesToRefCigar.numCigarElements(); iii++ ) {
        final CigarElement ce = basesToRefCigar.getCigarElement(iii);
        switch ( ce.getOperator() ) {
            case I:
                basesPos += ce.getLength();
                break;
            case M: case X: case EQ:
                for ( int i = 0; i < ce.getLength(); i++ ) {
                    if ( refPos == refStart )
                        basesStart = basesPos;
                    if ( refPos == refEnd ) {
                        basesStop = basesPos;
                        done = true;
                        break;
                    }
                    refPos++;
                    basesPos++;
                }
                break;
            case D:
                for ( int i = 0; i < ce.getLength(); i++ ) {
                    if ( refPos == refEnd || refPos == refStart ) {
                        // if we ever reach a ref position that is either a start or an end, we fail
                        return null;
                    }
                    refPos++;
                }
                break;
            default:
                throw new IllegalStateException("Unsupported operator " + ce);
        }
    }

    if ( basesStart == -1 || basesStop == -1 )
        throw new IllegalStateException("Never found start " + basesStart + " or stop " + basesStop + " given cigar " + basesToRefCigar);

    return Arrays.copyOfRange(bases, basesStart, basesStop + 1);
}