Java Code Examples for htsjdk.samtools.Cigar#numCigarElements()

The following examples show how to use htsjdk.samtools.Cigar#numCigarElements() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: CigarUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Convert the 'I' CigarElement, if it is at either end (terminal) of the input cigar, to a corresponding 'S' operator.
 * Note that we allow CIGAR of the format '10H10S10I10M', but disallows the format if after the conversion the cigar turns into a giant clip,
 * e.g. '10H10S10I10S10H' is not allowed (if allowed, it becomes a giant clip of '10H30S10H' which is non-sense).
 *
 * @return a pair of number of clipped (hard and soft, including the ones from the converted terminal 'I') bases at the front and back of the
 *         input {@code cigarAlongInput5to3Direction}.
 *
 * @throws IllegalArgumentException when the checks as described above fail.
 */
public static Cigar convertTerminalInsertionToSoftClip(final Cigar cigar) {

    if (cigar.numCigarElements() < 2 ) {
        return cigar;
    }

    final CigarBuilder builder = new CigarBuilder();
    for (int n = 0; n < cigar.numCigarElements(); n++) {
        final CigarElement element = cigar.getCigarElement(n);
        if (element.getOperator() != CigarOperator.INSERTION) { // not an insertion
            builder.add(element);
        } else if (n == 0 || n == cigar.numCigarElements() - 1) {   // terminal insertion with no clipping -- convert to soft clip
            builder.add(new CigarElement(element.getLength(), CigarOperator.SOFT_CLIP));
        } else if (cigar.getCigarElement(n-1).getOperator().isClipping() || cigar.getCigarElement(n+1).getOperator().isClipping()) {    // insertion preceding or following clip
            builder.add(new CigarElement(element.getLength(), CigarOperator.SOFT_CLIP));
        } else {    // interior insertion
            builder.add(element);
        }
    }

    return builder.make();
}
 
Example 2
Source File: CigarUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Count the number of soft- or hard- clipped bases from either the left or right end of a cigar
 */
public static int countClippedBases(final Cigar cigar, final ClippingTail tail, final CigarOperator typeOfClip) {
    Utils.validateArg(typeOfClip.isClipping(), "typeOfClip must be a clipping operator");

    final int size = cigar.numCigarElements();
    if (size < 2) {
        Utils.validateArg(size == 1 && !cigar.getFirstCigarElement().getOperator().isClipping(), "cigar is empty or completely clipped.");
        return 0;
    }

    int result = 0;

    for (int n = 0; n < size; n++) {
        final int index = (tail == ClippingTail.LEFT_TAIL ? n : size - n - 1);
        final CigarElement element = cigar.getCigarElement(index);
        if (!element.getOperator().isClipping()) {
            return result;
        } else if (element.getOperator() == typeOfClip) {
            result += element.getLength();
        }
    }

    throw new IllegalArgumentException("Input cigar " + cigar + " is completely clipped.");
}
 
Example 3
Source File: SamUtils.java    From rtg-tools with BSD 2-Clause "Simplified" License 6 votes vote down vote up
/**
 * Converts a CIGAR that may contain new style operators for match and
 * mismatch into the legacy style CIGAR.
 * @param cigar the CIGAR to convert
 * @return the legacy CIGAR
 */
public static Cigar convertToLegacyCigar(Cigar cigar) {
  final Cigar cg = new Cigar();
  int count = 0;
  for (int i = 0; i < cigar.numCigarElements(); ++i) {
    final CigarElement ce = cigar.getCigarElement(i);
    if (ce.getOperator() == CigarOperator.EQ || ce.getOperator() == CigarOperator.X) {
      count += ce.getLength();
    } else {
      if (count > 0) {
        cg.add(new CigarElement(count, CigarOperator.M));
      }
      cg.add(ce);
      count = 0;
    }
  }
  if (count > 0) {
    cg.add(new CigarElement(count, CigarOperator.M));
  }
  return cg;
}
 
Example 4
Source File: NDNCigarReadTransformer.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Refactor cigar strings that contain N-D-N elements to one N element (with total length of the three refactored elements).
 */
@VisibleForTesting
protected static Cigar refactorNDNtoN(final Cigar originalCigar) {
    final Cigar refactoredCigar = new Cigar();
    final int cigarLength = originalCigar.numCigarElements();
    for(int i = 0; i < cigarLength; i++){
        final CigarElement element = originalCigar.getCigarElement(i);
        if(element.getOperator() == CigarOperator.N && thereAreAtLeast2MoreElements(i,cigarLength)){
            final CigarElement nextElement = originalCigar.getCigarElement(i+1);
            final CigarElement nextNextElement = originalCigar.getCigarElement(i+2);

            // if it is N-D-N replace with N (with the total length) otherwise just add the first N.
            if(nextElement.getOperator() == CigarOperator.D && nextNextElement.getOperator() == CigarOperator.N){
                final int threeElementsLength = element.getLength() + nextElement.getLength() + nextNextElement.getLength();
                final CigarElement refactoredElement = new CigarElement(threeElementsLength,CigarOperator.N);
                refactoredCigar.add(refactoredElement);
                i += 2; //skip the elements that were refactored
            } else {
                refactoredCigar.add(element);  // add only the first N
            }
        } else {
            refactoredCigar.add(element);  // add any non-N element
        }
    }
    return refactoredCigar;
}
 
Example 5
Source File: CigarUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Removes all clipping and padding operators from the cigar.
 */
public static Cigar removeClipsAndPadding(final Cigar cigar) {
    Utils.nonNull(cigar, "cigar is null");
    final List<CigarElement> elements = new ArrayList<>(cigar.numCigarElements());
    for ( final CigarElement ce : cigar.getCigarElements() ) {
        if ( !isClipOrPaddingOperator(ce.getOperator()) ) {
            elements.add(ce);
        }
    }
    return new Cigar(elements);
}
 
Example 6
Source File: BAMRecordView.java    From cramtools with Apache License 2.0 5 votes vote down vote up
/**
 * From htsjdk
 * 
 * @param cigar
 * @return
 */
static int[] encodeBinaryCigar(final Cigar cigar) {
	if (cigar.numCigarElements() == 0) {
		return new int[0];
	}

	final int[] binaryCigar = new int[cigar.numCigarElements()];
	int binaryCigarLength = 0;
	for (int i = 0; i < cigar.numCigarElements(); ++i) {
		final CigarElement cigarElement = cigar.getCigarElement(i);
		final int op = CigarOperator.enumToBinary(cigarElement.getOperator());
		binaryCigar[binaryCigarLength++] = cigarElement.getLength() << 4 | op;
	}
	return binaryCigar;
}
 
Example 7
Source File: SAMRecords.java    From hmftools with GNU General Public License v3.0 5 votes vote down vote up
public static int leftSoftClip(@NotNull final SAMRecord record) {
    Cigar cigar = record.getCigar();
    if (cigar.numCigarElements() > 0) {
        CigarElement firstElement = cigar.getCigarElement(0);
        if (firstElement.getOperator() == CigarOperator.S) {
            return firstElement.getLength();
        }
    }

    return 0;
}
 
Example 8
Source File: Haplotype.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Create a new Haplotype derived from this one that exactly spans the provided location
 *
 * Note that this haplotype must have a contain a genome loc for this operation to be successful.  If no
 * GenomeLoc is contained than @throws an IllegalStateException
 *
 * Also loc must be fully contained within this Haplotype's genomeLoc.  If not an IllegalArgumentException is
 * thrown.
 *
 * @param loc a location completely contained within this Haplotype's location
 * @return a new Haplotype within only the bases spanning the provided location, or null for some reason the haplotype would be malformed if
 */
public Haplotype trim(final Locatable loc) {
    Utils.nonNull( loc, "Loc cannot be null");
    Utils.nonNull(genomeLocation, "Cannot trim a Haplotype without containing GenomeLoc");
    Utils.validateArg(new SimpleInterval(genomeLocation).contains(loc), () -> "Can only trim a Haplotype to a containing span.  My loc is " + genomeLocation + " but wanted trim to " + loc);
    Utils.nonNull( getCigar(), "Cannot trim haplotype without a cigar " + this);

    final int newStart = loc.getStart() - this.genomeLocation.getStart();
    final int newStop = newStart + loc.getEnd() - loc.getStart();

    // note: the following returns null if the bases covering the ref interval start or end in a deletion.
    final byte[] newBases = AlignmentUtils.getBasesCoveringRefInterval(newStart, newStop, getBases(), 0, getCigar());

    if ( newBases == null || newBases.length == 0 ) { // we cannot meaningfully chop down the haplotype, so return null
        return null;
    }

    // note: trimCigarByReference does not remove leading or trailing indels, while getBasesCoveringRefInterval does remove bases
    // of leading and trailing insertions.  We must remove leading and trailing insertions from the Cigar manually.
    // we keep leading and trailing deletions because these are necessary for haplotypes to maintain consistent reference coordinates
    final Cigar newCigar = AlignmentUtils.trimCigarByReference(getCigar(), newStart, newStop).getCigar();
    final boolean leadingInsertion = !newCigar.getFirstCigarElement().getOperator().consumesReferenceBases();
    final boolean trailingInsertion = !newCigar.getLastCigarElement().getOperator().consumesReferenceBases();
    final int firstIndexToKeepInclusive = leadingInsertion ? 1 : 0;
    final int lastIndexToKeepExclusive = newCigar.numCigarElements() - (trailingInsertion ? 1 : 0);

    if (lastIndexToKeepExclusive <= firstIndexToKeepInclusive) {    // edge case of entire cigar is insertion
        return null;
    }

    final Cigar leadingIndelTrimmedNewCigar = !(leadingInsertion || trailingInsertion)  ? newCigar :
            new CigarBuilder(false).addAll(newCigar.getCigarElements().subList(firstIndexToKeepInclusive, lastIndexToKeepExclusive)).make();

    final Haplotype ret = new Haplotype(newBases, isReference());
    ret.setCigar(leadingIndelTrimmedNewCigar);
    ret.setGenomeLocation(loc);
    ret.setScore(score);
    ret.setAlignmentStartHapwrtRef(newStart + getAlignmentStartHapwrtRef());
    return ret;
}
 
Example 9
Source File: CigarModifierTest.java    From VarDictJava with MIT License 5 votes vote down vote up
@Test(dataProvider = "cigar")
public void getCigarOperatorTest(Object cigarObject) {
    Cigar cigar = (Cigar) cigarObject;
    CigarOperator[] operators = new CigarOperator[] {
            CigarOperator.M,
            CigarOperator.S,
            CigarOperator.I,
            CigarOperator.D,
            CigarOperator.N,
            CigarOperator.H,
    };
    for (int i = 0; i < cigar.numCigarElements(); i++) {
        Assert.assertEquals(CigarParser.getCigarOperator(cigar, i), operators[i]);
    }
}
 
Example 10
Source File: SAMRecordUtils.java    From abra2 with MIT License 5 votes vote down vote up
/**
 * Remove leading or trailing soft clips from the input read.
 * Does not modify a read entirely comprised of soft clips.
 */
public static void removeSoftClips(SAMRecord read) {
	
	Cigar cigar = read.getCigar();
	
	CigarElement firstElement = cigar.getCigarElement(0);
	CigarElement lastElement  = cigar.getCigarElement(cigar.numCigarElements()-1);
	
	if ((firstElement.getOperator() == CigarOperator.S) ||
		(lastElement.getOperator() == CigarOperator.S)) {
	
		Cigar newCigar = new Cigar();
		
		String bases = read.getReadString();
		//String qualities = read.getBaseQualityString();
				
		if (firstElement.getOperator() == CigarOperator.S) {
			bases = bases.substring(firstElement.getLength(), bases.length());
			//qualities = qualities.substring(firstElement.getLength(), qualities.length()-1);
		} else {
			newCigar.add(firstElement);
		}
		
		for (int i=1; i<cigar.numCigarElements()-1; i++) {
			newCigar.add(cigar.getCigarElement(i));
		}
		
		if (lastElement.getOperator() == CigarOperator.S) {
			bases = bases.substring(0, bases.length() - lastElement.getLength());
			//qualities = qualities.substring(0, qualities.length() - lastElement.getLength() - 1);
		} else {
			newCigar.add(lastElement);
		}
		
		read.setCigar(newCigar);
		read.setReadString(bases);
		//read.setBaseQualityString(qualities);
	}
}
 
Example 11
Source File: SAMRecords.java    From hmftools with GNU General Public License v3.0 5 votes vote down vote up
public static int rightSoftClip(@NotNull final SAMRecord record) {
    Cigar cigar = record.getCigar();
    if (cigar.numCigarElements() > 0) {
        CigarElement lastLement = cigar.getCigarElement(cigar.numCigarElements() - 1);
        if (lastLement.getOperator() == CigarOperator.S) {
            return lastLement.getLength();
        }
    }

    return 0;
}
 
Example 12
Source File: CigarTraversal.java    From hmftools with GNU General Public License v3.0 4 votes vote down vote up
public static void traverseCigar(@NotNull final SAMRecord record, @NotNull final CigarHandler handler) {
    final Cigar cigar = record.getCigar();

    int readIndex = 0;
    int refBase = record.getAlignmentStart();

    for (int i = 0; i < cigar.numCigarElements(); i++) {
        final CigarElement e = cigar.getCigarElement(i);
        switch (e.getOperator()) {
            case H:
                break; // ignore hard clips
            case P:
                break; // ignore pads
            case S:
                if (i == 0) {
                    handler.handleLeftSoftClip(record, e);
                } else if (i == cigar.numCigarElements() - 1) {
                    handler.handleRightSoftClip(record, e, readIndex, refBase);
                }
                readIndex += e.getLength();
                break; // soft clip read bases
            case N:
                handler.handleSkippedReference(record, e, readIndex - 1, refBase - 1);
                refBase += e.getLength();
                break;  // reference skip
            case D:
                handler.handleDelete(record, e, readIndex - 1, refBase - 1);
                refBase += e.getLength();
                break;
            case I:
                // TODO: Handle 1I150M
                int refIndex = refBase - 1 - record.getAlignmentStart();
                if (refIndex >= 0) {
                    handler.handleInsert(record, e, readIndex - 1, refBase - 1);
                }
                readIndex += e.getLength();
                break;
            case M:
            case EQ:
            case X:
                boolean isFollowedByIndel = i < cigar.numCigarElements() - 1 && cigar.getCigarElement(i + 1).getOperator().isIndel();
                final CigarElement element = isFollowedByIndel ? new CigarElement(e.getLength() - 1, e.getOperator()) : e;
                handler.handleAlignment(record, element, readIndex, refBase);
                readIndex += e.getLength();
                refBase += e.getLength();
                break;
            default:
                throw new IllegalStateException("Case statement didn't deal with op: " + e.getOperator() + "in CIGAR: " + cigar);
        }
    }
}
 
Example 13
Source File: AlignmentUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Get the byte[] from bases that cover the reference interval refStart -> refEnd given the
 * alignment of bases to the reference (basesToRefCigar) and the start offset of the bases on the reference
 *
 * refStart and refEnd are 0 based offsets that we want to obtain.  In the client code, if the reference
 * bases start at position X and you want Y -> Z, refStart should be Y - X and refEnd should be Z - X.
 *
 * If refStart or refEnd would start or end the new bases within a deletion, this function will return null
 *
 * If the cigar starts with an insertion, the inserted bases are considered as coming before the start position and
 * are therefore excluded from the result.  That is getBasesCoveringRefInterval(0, 3, "ACTTGT", 0, 2I4M) should yield "TTGT".
 *
 * @param bases
 * @param refStart
 * @param refEnd
 * @param basesStartOnRef where does the bases array start w.r.t. the reference start?  For example, bases[0] of
 *                        could be at refStart == 0 if basesStartOnRef == 0, but it could just as easily be at
 *                        10 (meaning bases doesn't fully span the reference), which would be indicated by basesStartOnRef == 10.
 *                        It's not trivial to eliminate this parameter because it's tied up with the cigar
 * @param basesToRefCigar the cigar that maps the bases to the reference genome
 * @return a byte[] containing the bases covering this interval, or null if we would start or end within a deletion
 */
public static byte[] getBasesCoveringRefInterval(final int refStart, final int refEnd, final byte[] bases, final int basesStartOnRef, final Cigar basesToRefCigar) {
    if ( refStart < 0 || refEnd < refStart ) throw new IllegalArgumentException("Bad start " + refStart + " and/or stop " + refEnd);
    if ( basesStartOnRef < 0 ) throw new IllegalArgumentException("BasesStartOnRef must be >= 0 but got " + basesStartOnRef);
    Utils.nonNull( bases );
    Utils.nonNull( basesToRefCigar );
    if ( bases.length != basesToRefCigar.getReadLength() ) throw new IllegalArgumentException("Mismatch in length between reference bases " + bases.length + " and cigar length " + basesToRefCigar);

    int refPos = basesStartOnRef;
    int basesPos = 0;
    int basesStart = -1;
    int basesStop = -1;
    boolean done = false;

    for ( int iii = 0; ! done && iii < basesToRefCigar.numCigarElements(); iii++ ) {
        final CigarElement ce = basesToRefCigar.getCigarElement(iii);
        switch ( ce.getOperator() ) {
            case I:
                basesPos += ce.getLength();
                break;
            case M: case X: case EQ:
                for ( int i = 0; i < ce.getLength(); i++ ) {
                    if ( refPos == refStart )
                        basesStart = basesPos;
                    if ( refPos == refEnd ) {
                        basesStop = basesPos;
                        done = true;
                        break;
                    }
                    refPos++;
                    basesPos++;
                }
                break;
            case D:
                for ( int i = 0; i < ce.getLength(); i++ ) {
                    if ( refPos == refEnd || refPos == refStart ) {
                        // if we ever reach a ref position that is either a start or an end, we fail
                        return null;
                    }
                    refPos++;
                }
                break;
            default:
                throw new IllegalStateException("Unsupported operator " + ce);
        }
    }

    if ( basesStart == -1 || basesStop == -1 )
        throw new IllegalStateException("Never found start " + basesStart + " or stop " + basesStop + " given cigar " + basesToRefCigar);

    return Arrays.copyOfRange(bases, basesStart, basesStop + 1);
}
 
Example 14
Source File: AlignmentUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Generate an array of bases for just those that are aligned to the reference (i.e. no clips or insertions)
 *
 * @param cigar            the read's CIGAR -- cannot be null
 * @param read             the read's base array
 * @return a non-null array of bases (bytes)
 */
@SuppressWarnings("fallthrough")
public static byte[] readToAlignmentByteArray(final Cigar cigar, final byte[] read) {
    Utils.nonNull(cigar);
    Utils.nonNull(read);

    final int alignmentLength = cigar.getReferenceLength();
    final byte[] alignment = new byte[alignmentLength];
    int alignPos = 0;
    int readPos = 0;
    for (int iii = 0; iii < cigar.numCigarElements(); iii++) {

        final CigarElement ce = cigar.getCigarElement(iii);
        final int elementLength = ce.getLength();

        switch (ce.getOperator()) {
            case I:
                if (alignPos > 0) {
                    final int prevPos = alignPos - 1;
                    if (alignment[prevPos] == BaseUtils.Base.A.base) {
                        alignment[prevPos] = PileupElement.A_FOLLOWED_BY_INSERTION_BASE;
                    } else if (alignment[prevPos] == BaseUtils.Base.C.base) {
                        alignment[prevPos] = PileupElement.C_FOLLOWED_BY_INSERTION_BASE;
                    } else if (alignment[prevPos] == BaseUtils.Base.T.base) {
                        alignment[prevPos] = PileupElement.T_FOLLOWED_BY_INSERTION_BASE;
                    } else if (alignment[prevPos] == BaseUtils.Base.G.base) {
                        alignment[prevPos] = PileupElement.G_FOLLOWED_BY_INSERTION_BASE;
                    }
                }
            case S:
                readPos += elementLength;
                break;
            case D:
            case N:
                for (int jjj = 0; jjj < elementLength; jjj++) {
                    alignment[alignPos++] = PileupElement.DELETION_BASE;
                }
                break;
            case M:
            case EQ:
            case X:
                for (int jjj = 0; jjj < elementLength; jjj++) {
                    alignment[alignPos++] = read[readPos++];
                }
                break;
            case H:
            case P:
                break;
            default:
                throw new GATKException("Unsupported cigar operator: " + ce.getOperator());
        }
    }
    return alignment;
}
 
Example 15
Source File: AlignmentUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Takes the alignment of the read sequence <code>readSeq</code> to the reference sequence <code>refSeq</code>
 * starting at 0-based position <code>readStart</code> on the <code>ref</code> and specified by its <code>cigar</code>.
 * <p/>
 * If the alignment has one or more indels, this method attempts to move them left across a stretch of repetitive bases.
 * For instance, if the original cigar specifies that (any) one AT is deleted from a repeat sequence TATATATA, the output
 * cigar will always mark the leftmost AT as deleted. If there is no indel in the original cigar or if the indel position
 * is determined unambiguously (i.e. inserted/deleted sequence is not repeated), the original cigar is returned.
 *
 *
 * @param cigar     structure of the original alignment
 * @param ref    reference sequence the read is aligned to
 * @param read   read sequence
 * @param readStart  0-based alignment start position on ref
 * @return a non-null cigar, in which the indels are guaranteed to be placed at the leftmost possible position across a repeat (if any)
 */
public static CigarBuilder.Result leftAlignIndels(final Cigar cigar, final byte[] ref, final byte[] read, final int readStart) {
    ParamUtils.isPositiveOrZero(readStart, "read start within reference base array must be non-negative");

    if (cigar.getCigarElements().stream().noneMatch(elem -> elem.getOperator().isIndel())) {
        return new CigarBuilder.Result(cigar, 0, 0);
    }

    // we need reference bases from the start of the read to the rightmost indel
    final int lastIndel = IntStream.range(0, cigar.numCigarElements()).filter(n -> cigar.getCigarElement(n).getOperator().isIndel()).max().getAsInt();
    final int necessaryRefLength = readStart + cigar.getCigarElements().stream().limit(lastIndel + 1).mapToInt(e -> lengthOnReference(e)).sum();
    Utils.validateArg(necessaryRefLength <= ref.length, "read goes past end of reference");

    // at this point, we are one base past the end of the read.  Now we traverse the cigar from right to left
    final List<CigarElement> resultRightToLeft = new ArrayList<>();
    final int refLength = cigar.getReferenceLength();
    final IndexRange refIndelRange = new IndexRange(readStart + refLength, readStart + refLength);
    final IndexRange readIndelRange = new IndexRange(read.length,read.length);
    for (int n = cigar.numCigarElements() - 1; n >= 0; n--) {
        final CigarElement element = cigar.getCigarElement(n);
        // if it's an indel, just accumulate the read and ref bases consumed.  We won't shift the indel until we hit an alignment
        // block or the read start.
        if (element.getOperator().isIndel()) {
            refIndelRange.shiftStartLeft(lengthOnReference(element));
            readIndelRange.shiftStartLeft(lengthOnRead(element));
        } else if (refIndelRange.size() == 0 && readIndelRange.size() == 0) {   // no indel, just add the cigar element to the result
            resultRightToLeft.add(element);
            refIndelRange.shiftLeft(lengthOnReference(element));
            readIndelRange.shiftLeft(lengthOnRead(element));
        } else {    // there's an indel that we need to left-align
            // we can left-align into match cigar elements but not into clips
            final int maxShift = element.getOperator().isAlignment() ? element.getLength() : 0;
            final Pair<Integer, Integer> shifts = normalizeAlleles(Arrays.asList(ref, read), Arrays.asList(refIndelRange, readIndelRange), maxShift, true);

            // account for new match alignments on the right due to left-alignment
            resultRightToLeft.add(new CigarElement(shifts.getRight(), CigarOperator.MATCH_OR_MISMATCH));

            // emit if we didn't go all the way to the start of an alignment block OR we have reached clips OR we have reached the start of the cigar
            final boolean emitIndel = n == 0 || shifts.getLeft() < maxShift || !element.getOperator().isAlignment();
            final int newMatchOnLeftDueToTrimming = shifts.getLeft() < 0 ? -shifts.getLeft() : 0;   // we may have actually shifted right to make the alleles parsimonious
            final int remainingBasesOnLeft = shifts.getLeft() < 0 ? element.getLength() : (element.getLength() - shifts.getLeft());

            if (emitIndel) {  // some of this alignment block remains after left-alignment -- emit the indel
                resultRightToLeft.add(new CigarElement(refIndelRange.size(), CigarOperator.DELETION));
                resultRightToLeft.add(new CigarElement(readIndelRange.size(), CigarOperator.INSERTION));
                refIndelRange.shiftEndLeft(refIndelRange.size());       // ref is empty and points to start of left-aligned indel
                readIndelRange.shiftEndLeft(readIndelRange.size());     // read is empty and points to start of left-aligned indel
                refIndelRange.shiftLeft(remainingBasesOnLeft + newMatchOnLeftDueToTrimming);          // ref is empty and points to end of element preceding this match block
                readIndelRange.shiftLeft(remainingBasesOnLeft + newMatchOnLeftDueToTrimming);         // read is empty and points to end of element preceding this match block
            }
            resultRightToLeft.add(new CigarElement(newMatchOnLeftDueToTrimming, CigarOperator.MATCH_OR_MISMATCH));
            resultRightToLeft.add(new CigarElement(remainingBasesOnLeft, element.getOperator()));
        }
    }

    // account for any indels at the start of the cigar that weren't processed because they have no adjacent non-indel element to the left
    resultRightToLeft.add(new CigarElement(refIndelRange.size(), CigarOperator.DELETION));
    resultRightToLeft.add(new CigarElement(readIndelRange.size(), CigarOperator.INSERTION));

    Utils.validateArg(readIndelRange.getStart() == 0, "Given cigar does not account for all bases of the read");
    return new CigarBuilder().addAll(Lists.reverse(resultRightToLeft)).makeAndRecordDeletionsRemovedResult();
}
 
Example 16
Source File: SingleContigReferenceAlignerUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
@Test(dataProvider = "testAlignmentData")
public void testAlignment(final boolean pairAlignment, final byte[] reference, final String referenceName)
    throws IOException
{
    try (final SingleSequenceReferenceAligner<byte[], AlignedContig> aligner =  SingleSequenceReferenceAligner.contigsAligner(referenceName, reference,
            a -> "ctg", a -> a);) {
        Assert.assertNotNull(aligner.getAligner());
        if (pairAlignment) {
            aligner.getAligner().alignPairs();
        }
        final Random rdn = new Random(13111);
        final RandomDNA rdnDNA = new RandomDNA(rdn);
        final List<List<AlignmentInterval>> expected = new ArrayList<>(NUM_ALIGNS << 1);
        final List<byte[]> seqs = new ArrayList<>(NUM_ALIGNS << 1);
        for (int i = 0; i < NUM_ALIGNS; i++) {
            final int start = rdn.nextInt(reference.length - READ_LENGTH) + 1;
            final boolean forward = rdn.nextBoolean();
            final boolean insert = rdn.nextDouble() < 0.1;
            final boolean deletion = !insert && rdn.nextDouble() < 0.1;
            final int indelLength = insert || deletion ? rdn.nextInt(10) + 10 : 0;
            final int indelStart = insert || deletion ? rdn.nextInt((int) (READ_LENGTH  * .50)) + 25 : -1;
            final int end = start + READ_LENGTH - 1;
            final byte[] templateSeq = Arrays.copyOfRange(reference, start - 1, end);
            final byte[] actualSeq;
            if (insert) {
                actualSeq = Arrays.copyOf(templateSeq, templateSeq.length + indelLength);
                System.arraycopy(actualSeq, indelStart - 1, actualSeq, indelStart - 1 + indelLength, templateSeq.length - indelStart + 1);
                rdnDNA.nextBases(actualSeq, indelStart - 1, indelLength);
            } else if (deletion) {
                actualSeq = Arrays.copyOf(templateSeq, templateSeq.length - indelLength);
                System.arraycopy(templateSeq, indelStart - 1 + indelLength, actualSeq, indelStart - 1, templateSeq.length - indelStart + 1 - indelLength);
            } else {
                actualSeq = templateSeq.clone();
            }

            while (insert && actualSeq[indelStart - 1] == actualSeq[indelStart + indelLength - 1]) {
                actualSeq[indelStart + indelLength - 1] = rdnDNA.nextBase();
            }
            if (!forward) {
                SequenceUtil.reverseComplement(actualSeq);
            }
            seqs.add(actualSeq);
            final Cigar cigar = (!insert && !deletion) ? TextCigarCodec.decode(READ_LENGTH + "M"):
                    (insert ? TextCigarCodec.decode( "" + (indelStart - 1) + "M" + indelLength + "I" + (READ_LENGTH - indelStart + 1) + "M")
                            : TextCigarCodec.decode( "" + (indelStart - 1) + "M" + indelLength + "D" + (READ_LENGTH - indelStart + 1 - indelLength) + "M"));

            expected.add(Collections.singletonList(new AlignmentInterval(new SimpleInterval(referenceName, start, end), 1, actualSeq.length, !forward ? CigarUtils.invertCigar(cigar) : cigar
                    , forward, 0, 0, 0, null)));
        }
        final List<AlignedContig> results = aligner.align(seqs);
        final Map<byte[], AlignedContig> mapResult = aligner.align(seqs, (b, a) -> b);
        Assert.assertEquals(results, new ArrayList<>(mapResult.values()));
        Assert.assertEquals(new ArrayList<>(mapResult.keySet()), mapResult.values().stream().map(AlignedContig::getContigSequence).collect(Collectors.toList()));
        for (int i = 0; i < NUM_ALIGNS; i++) {
            final List<AlignmentInterval> actualValue = results.get(i).getAlignments();
            final List<AlignmentInterval> expectedValue = expected.get(i);
            Assert.assertEquals(actualValue.size(), 1);
            Assert.assertEquals(actualValue.get(0).forwardStrand, expectedValue.get(0).forwardStrand);
            Assert.assertEquals(actualValue.get(0).referenceSpan, expectedValue.get(0).referenceSpan, expectedValue.get(0).cigarAlong5to3DirectionOfContig.toString());
            Assert.assertEquals(actualValue.get(0).startInAssembledContig, expectedValue.get(0).startInAssembledContig);
            Assert.assertEquals(actualValue.get(0).endInAssembledContig, expectedValue.get(0).endInAssembledContig);
            final Cigar expectedCigar = expectedValue.get(0).cigarAlong5to3DirectionOfContig;
            final Cigar actualCigar = actualValue.get(0).cigarAlong5to3DirectionOfContig;
            if (!expectedCigar.equals(actualCigar)) { // small differences may occur due to ambiguous indel location. So we check that they are small differences indeed:
                Assert.assertEquals(expectedCigar.numCigarElements(), actualCigar.numCigarElements()); // same number of elements
                Assert.assertEquals(expectedCigar.getCigarElements().stream().map(CigarElement::getOperator).collect(Collectors.toList()),
                        actualCigar.getCigarElements().stream().map(CigarElement::getOperator).collect(Collectors.toList())); // same operators sequence.
                // then we check the total lengths per operator (must be the same):
                final Map<CigarOperator, Integer> expectedLengthByOperator = expectedCigar.getCigarElements().stream()
                        .collect(Collectors.groupingBy(CigarElement::getOperator,
                                Collectors.reducing(0, CigarElement::getLength, (a, b) -> a + b)));
                final Map<CigarOperator, Integer> actualLengthByOperator = actualCigar.getCigarElements().stream()
                        .collect(Collectors.groupingBy(CigarElement::getOperator,
                                Collectors.reducing(0, CigarElement::getLength, (a, b) -> a + b)));
                Assert.assertEquals(actualLengthByOperator, expectedLengthByOperator);
                // finally we don't allow more than 5 bases length difference for any given element.
                for (int j = 0; j < expectedCigar.numCigarElements(); j++) {
                    Assert.assertTrue(Math.abs(expectedCigar.getCigarElement(j).getLength() - actualCigar.getCigarElement(j).getLength()) < 10, "actual: " + actualCigar + " != expected: " + expectedCigar);
                }
            }
        }
    }
}
 
Example 17
Source File: AlignmentUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 3 votes vote down vote up
/**
 * Generate a new Cigar that maps the operations of the first cigar through those in a second
 *
 * For example, if first is 5M and the second is 2M1I2M then the result is 2M1I2M.
 * However, if first is 1M2D3M and second is 2M1I3M this results in a cigar X
 *
 * ref   : AC-GTA
 * hap   : ACxGTA  - 2M1I3M
 * read  : A--GTA  - 1M2D3M
 * result: A--GTA => 1M1D3M
 *
 * ref   : ACxG-TA
 * hap   : AC-G-TA  - 2M1D3M
 * read  : AC-GxTA  - 3M1I2M
 * result: AC-GxTA => 2M1D1M1I2M
 *
 * ref   : ACGTA
 * hap   : ACGTA  - 5M
 * read  : A-GTA  - 1M1I3M
 * result: A-GTA => 1M1I3M
 *
 * ref   : ACGTAC
 * hap   : AC---C  - 2M3D1M
 * read  : AC---C  - 3M
 * result: AG---C => 2M3D
 *
 * The constraint here is that both cigars should imply that the result have the same number of
 * reference bases (i.e.g, cigar.getReferenceLength() are equals).
 *
 * @param firstToSecond the cigar mapping hap1 -> hap2
 * @param secondToThird the cigar mapping hap2 -> hap3
 * @return A cigar mapping hap1 -> hap3
 */
public static Cigar applyCigarToCigar(final Cigar firstToSecond, final Cigar secondToThird) {
    final CigarBuilder newElements = new CigarBuilder();
    final int nElements12 = firstToSecond.numCigarElements();
    final int nElements23 = secondToThird.numCigarElements();

    int cigar12I = 0, cigar23I = 0;
    int elt12I = 0, elt23I = 0;

    while ( cigar12I < nElements12 && cigar23I < nElements23 ) {
        final CigarElement elt12 = firstToSecond.getCigarElement(cigar12I);
        final CigarElement elt23 = secondToThird.getCigarElement(cigar23I);

        final CigarPairTransform transform = getTransformer(elt12.getOperator(), elt23.getOperator());

        if ( transform.op13 != null ) // skip no ops
            newElements.add(new CigarElement(1, transform.op13));

        elt12I += transform.advance12;
        elt23I += transform.advance23;

        // if have exhausted our current element, advance to the next one
        if ( elt12I == elt12.getLength() ) { cigar12I++; elt12I = 0; }
        if ( elt23I == elt23.getLength() ) { cigar23I++; elt23I = 0; }
    }

    return newElements.make();
}