Java Code Examples for htsjdk.variant.variantcontext.Allele#length()

The following examples show how to use htsjdk.variant.variantcontext.Allele#length() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: Haplotype.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
public Haplotype insertAllele( final Allele refAllele, final Allele altAllele, final int refInsertLocation, final int genomicInsertLocation ) {
    // refInsertLocation is in ref haplotype offset coordinates NOT genomic coordinates
    final Pair<Integer, CigarOperator> haplotypeInsertLocationAndOperator = ReadUtils.getReadIndexForReferenceCoordinate(alignmentStartHapwrtRef, cigar, refInsertLocation);

    // can't insert outside the haplotype or into a deletion
    if( haplotypeInsertLocationAndOperator.getLeft() == ReadUtils.READ_INDEX_NOT_FOUND || !haplotypeInsertLocationAndOperator.getRight().consumesReadBases() ) {
        return null;
    }
    final int haplotypeInsertLocation = haplotypeInsertLocationAndOperator.getLeft();
    final byte[] myBases = getBases();

    // can't insert if we don't have any sequence after the inserted alt allele to span the new variant
    if (haplotypeInsertLocation + refAllele.length() >= myBases.length) {
        return null;
    }

    byte[] newHaplotypeBases = {};
    newHaplotypeBases = ArrayUtils.addAll(newHaplotypeBases, ArrayUtils.subarray(myBases, 0, haplotypeInsertLocation)); // bases before the variant
    newHaplotypeBases = ArrayUtils.addAll(newHaplotypeBases, altAllele.getBases()); // the alt allele of the variant
    newHaplotypeBases = ArrayUtils.addAll(newHaplotypeBases, ArrayUtils.subarray(myBases, haplotypeInsertLocation + refAllele.length(), myBases.length)); // bases after the variant
    return new Haplotype(newHaplotypeBases);
}
 
Example 2
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Get the start position of the difference between the reference and alternate alleles in the given {@link VariantContext}.
 * @param variant {@link VariantContext} in which to determine the start of the different bases.
 * @param altAllele The alternate {@link Allele} against which to check the reference allele in {@code variant}.
 * @return The start position of the difference between the reference and alternate alleles in the given {@link VariantContext}.
 */
public static int getIndelAdjustedAlleleChangeStartPosition(final VariantContext variant, final Allele altAllele) {

    // If the variant is an indel, we need to check only the bases that are added/deleted for overlap.
    // The convention for alleles in Funcotator is to preserve a leading base for an indel, so we just need
    // to create a new variant that has its start position shifted by the leading base.
    // NOTE: because there could be degenerate VCF files that have more than one leading base overlapping, we need
    //       to detect how many leading bases there are that overlap, rather than assuming there is only one.
    final int varStart;
    if ( GATKVariantContextUtils.typeOfVariant(variant.getReference(), altAllele).equals(VariantContext.Type.INDEL) &&
         !GATKVariantContextUtils.isComplexIndel(variant.getReference(), altAllele) ) {
        int startOffset = 0;
        while ( (startOffset < variant.getReference().length()) && (startOffset < altAllele.length()) && (variant.getReference().getBases()[ startOffset ] == altAllele.getBases()[ startOffset ]) ) {
            ++startOffset;
        }
        varStart = variant.getStart() + startOffset;
    }
    else {
        // Not an indel?  Then we should have no overlapping bases:
        varStart = variant.getStart();
    }

    return varStart;
}
 
Example 3
Source File: GencodeFuncotationFactory.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Determines the variant type based on the given reference allele and alternate allele.
 * @param refAllele The reference {@link Allele} for this variant.
 * @param altAllele The alternate {@link Allele} for this variant.
 * @return A {@link GencodeFuncotation.VariantType} representing the variation type between the given reference and alternate {@link Allele}.
 * Spanning deletions and no calls will get a type of {@link org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotation.VariantType#NA}
 */
@VisibleForTesting
static GencodeFuncotation.VariantType getVariantType( final Allele refAllele, final Allele altAllele ) {

    if ( altAllele.length() > refAllele.length() ) {
        return GencodeFuncotation.VariantType.INS;
    }
    else if (altAllele.equals(Allele.SPAN_DEL) || altAllele.equals(Allele.NO_CALL) || altAllele.isSymbolic() ) {
        return GencodeFuncotation.VariantType.NA;
    }
    else if (altAllele.length() < refAllele.length()) {
        return GencodeFuncotation.VariantType.DEL;
    }
    else {
        // We know they are the same length, now we just need to check one of them:
        switch (refAllele.length()) {
            case 1:  return GencodeFuncotation.VariantType.SNP;
            case 2:  return GencodeFuncotation.VariantType.DNP;
            case 3:  return GencodeFuncotation.VariantType.TNP;
            default: return GencodeFuncotation.VariantType.ONP;
        }
    }
}
 
Example 4
Source File: LiftoverUtils.java    From picard with MIT License 5 votes vote down vote up
private static boolean isIndelForLiftover(final VariantContext vc) {
    final Allele ref = vc.getReference();
    if (ref.length() != 1) {
        return true;
    }

    return vc.getAlleles().stream()
            .filter(a -> !a.isSymbolic())
            .filter(a -> !a.equals(Allele.SPAN_DEL))
            .anyMatch(a -> a.length() != 1);
}
 
Example 5
Source File: PairHMM.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
private static int findMaxAlleleLength(final List<? extends Allele> alleles) {
    int max = 0;
    for (final Allele allele : alleles) {
        final int alleleLength = allele.length();
        if (max < alleleLength) {
            max = alleleLength;
        }
    }
    return max;
}
 
Example 6
Source File: VariantDataManager.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
protected static boolean checkVariationClass( final VariantContext evalVC, final Allele allele, final VariantRecalibratorArgumentCollection.Mode mode ) {
    switch( mode ) {
        case SNP:
            //note that spanning deletions are considered SNPs by this logic
            return evalVC.getReference().length() == allele.length();
        case INDEL:
            return (evalVC.getReference().length() != allele.length()) || allele.isSymbolic();
        case BOTH:
            return true;
        default:
            throw new IllegalStateException( "Encountered unknown recal mode: " + mode );
    }
}
 
Example 7
Source File: RealignmentEngine.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
public static boolean supportsVariant(final GATKRead read, final VariantContext vc, int indelStartTolerance) {
    final byte[] readBases = read.getBasesNoCopy();

    final Pair<Integer, CigarOperator> offsetAndOperatorInRead = ReadUtils.getReadIndexForReferenceCoordinate(read, vc.getStart());

    if ( offsetAndOperatorInRead.getLeft() == ReadUtils.READ_INDEX_NOT_FOUND) {
        return false;
    } else if (vc.isSNP() && offsetAndOperatorInRead.getRight() == CigarOperator.DELETION) {
        return false;
    }

    final int variantPositionInRead = offsetAndOperatorInRead.getLeft();

    for (final Allele allele : vc.getAlternateAlleles()) {
        final int referenceLength = vc.getReference().length();
        if (allele.length() == referenceLength) {   // SNP or MNP -- check whether read bases match variant allele bases
            if (allele.basesMatch(ArrayUtils.subarray(readBases, variantPositionInRead, Math.min(variantPositionInRead + allele.length(), readBases.length)))) {
                return true;
            }
        } else {    // indel -- check if the read has the right CIGAR operator near position -- we don't require exact an exact match because indel representation is non-unique
            final boolean isDeletion = allele.length() < referenceLength;
            int readPosition = 0;    // offset by 1 because the variant position is one base before the first indel base
            for (final CigarElement cigarElement : read.getCigarElements()) {
                if (Math.abs(readPosition - variantPositionInRead) <= indelStartTolerance) {
                    if (isDeletion && mightSupportDeletion(cigarElement) || !isDeletion && mightSupportInsertion(cigarElement)) {
                        return true;
                    }
                } else {
                    readPosition += cigarElement.getLength();
                }
            }
        }
    }
    return false;
}
 
Example 8
Source File: IndelLengthHistogram.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
@Override
public void update1(final VariantContext eval, final ReferenceContext referenceContext, final ReadsContext readsContext, final FeatureContext featureContext) {
    if ( eval.isIndel() && ! eval.isComplexIndel() ) {
        if ( ! ( getWalker().ignoreAC0Sites() && eval.isMonomorphicInSamples() )) {
            // only if we are actually polymorphic in the subsetted samples should we count the allele
            for ( Allele alt : eval.getAlternateAlleles() ) {
                final int alleleSize = alt.length() - eval.getReference().length();
                if ( alleleSize == 0 ) throw new GATKException("Allele size not expected to be zero for indel: alt = " + alt + " ref = " + eval.getReference());
                updateLengthHistogram(eval.getReference(), alt);
            }
        }
    }
}
 
Example 9
Source File: IndelLengthHistogram.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Update the histogram with the implied length of the indel allele between ref and alt (alt.len - ref.len).
 *
 * If this size is outside of MAX_SIZE_FOR_HISTOGRAM, the size is capped to MAX_SIZE_FOR_HISTOGRAM,
 * if INCLUDE_LONG_EVENTS_AT_MAX_SIZE is set.
 *
 * @param ref
 * @param alt
 */
public void updateLengthHistogram(final Allele ref, final Allele alt) {
    int len = alt.length() - ref.length();
    if ( INCLUDE_LONG_EVENTS_AT_MAX_SIZE ) {
        if ( len > MAX_SIZE_FOR_HISTOGRAM ) len = MAX_SIZE_FOR_HISTOGRAM;
        if ( len < -MAX_SIZE_FOR_HISTOGRAM ) len = -MAX_SIZE_FOR_HISTOGRAM;
    }
    
    if ( Math.abs(len) > MAX_SIZE_FOR_HISTOGRAM )
        return;
    
    nIndels++;
    counts.put(len, counts.get(len) + 1);
}
 
Example 10
Source File: GencodeFuncotationFactory.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Shifts a given genomic locus by the number of inserted bases in the given {@code variant}/{@code altAllele} pair
 * if the position occurs after the start of the insertion (and if the variant is an insertion).
 * Assumes that the position and the variant are on the same contig.
 * @param genomicLocus The locus to be potentially adjusted.
 * @param variant The {@link VariantContext} to use for the reference allele.
 * @param altAllele The alternate {@link Allele} to use to compare against the reference allele.
 * @param changedBasesInterval The interval representing the bases actually changed in the given {@code variant} and {@code altAllele} pair.
 * @return A genomic locus adjusted for the bases inserted before it, if any.
 */
private static int adjustLocusForInsertion(final int genomicLocus, final VariantContext variant,
                                           final Allele altAllele, final SimpleInterval changedBasesInterval) {

    int adjustedPosition = genomicLocus;

    // Is our variant / alt pair an insertion AND Is our position after the changed bases have started:
    if ( (altAllele.length() > variant.getReference().length()) && (genomicLocus > changedBasesInterval.getStart()) ) {
        // Adjust our position by the number of inserted bases:
        adjustedPosition += (changedBasesInterval.getEnd() - changedBasesInterval.getStart() + 1);
    }

    return adjustedPosition;

}
 
Example 11
Source File: GencodeFuncotationFactory.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
private static SimpleInterval getBasesChangedIntervalIgnoringLeadingVcfContextBase(final VariantContext variant,
                                                                                   final Allele altAllele) {

    // Adjust the variant interval for the overlap check, specifically to properly test for the indel cases:
    final SimpleInterval changedBasesInterval;
    if ( GATKVariantContextUtils.typeOfVariant(variant.getReference(), altAllele).equals(VariantContext.Type.INDEL) ) {

        final int adjustedStart;
        final int end;
        // Insertion:
        if ( variant.getReference().length() < altAllele.length() ) {
            // We use the variant start here because by inserting bases we're not making the actually changed
            // bases any closer to the boundaries of the exon.
            // That is, the inserted bases shouldn't count towards the extents of the variant in genomic space
            // with respect to the exon boundaries.
            adjustedStart = FuncotatorUtils.getIndelAdjustedAlleleChangeStartPosition(variant, altAllele);
            end = variant.getEnd() + (altAllele.length() - (adjustedStart - variant.getStart()));
        }
        // Deletion:
        else {
            // Because we're deleting bases from the reference allele, we need to adjust the start position of the
            // variant to reflect that the leading base(s) do(es) not matter.
            adjustedStart = FuncotatorUtils.getIndelAdjustedAlleleChangeStartPosition(variant, altAllele);

            // The original end position should be correct:
            end = variant.getEnd();
        }

        changedBasesInterval = new SimpleInterval( variant.getContig(), adjustedStart, end );
    }
    else {
        changedBasesInterval = new SimpleInterval(variant);
    }

    return changedBasesInterval;
}
 
Example 12
Source File: GtcToVcf.java    From picard with MIT License 4 votes vote down vote up
private VariantContext makeVariantContext(Build37ExtendedIlluminaManifestRecord record, final InfiniumGTCRecord gtcRecord,
                                          final InfiniumEGTFile egtFile, final ProgressLogger progressLogger) {
    // If the record is not flagged as errant in the manifest we include it in the VCF
    Allele A = record.getAlleleA();
    Allele B = record.getAlleleB();
    Allele ref = record.getRefAllele();

    final String chr = record.getB37Chr();
    final Integer position = record.getB37Pos();
    final Integer endPosition = position + ref.length() - 1;
    Integer egtIndex = egtFile.rsNameToIndex.get(record.getName());
    if (egtIndex == null) {
        throw new PicardException("Found no record in cluster file for manifest entry '" + record.getName() + "'");
    }

    progressLogger.record(chr, position);

    // Create list of unique alleles
    final List<Allele> assayAlleles = new ArrayList<>();
    assayAlleles.add(ref);

    if (A.equals(B)) {
        throw new PicardException("Found same allele (" + A.getDisplayString() + ") for A and B ");
    }

    if (!ref.equals(A, true)) {
        assayAlleles.add(A);
    }

    if (!ref.equals(B, true)) {
        assayAlleles.add(B);
    }

    final String sampleName = FilenameUtils.removeExtension(INPUT.getName());
    final Genotype genotype = getGenotype(sampleName, gtcRecord, record, A, B);

    final VariantContextBuilder builder = new VariantContextBuilder();

    builder.source(record.getName());
    builder.chr(chr);
    builder.start(position);
    builder.stop(endPosition);
    builder.alleles(assayAlleles);
    builder.log10PError(VariantContext.NO_LOG10_PERROR);
    builder.id(record.getName());
    builder.genotypes(genotype);

    VariantContextUtils.calculateChromosomeCounts(builder, false);

    //custom info fields
    builder.attribute(InfiniumVcfFields.ALLELE_A, record.getAlleleA());
    builder.attribute(InfiniumVcfFields.ALLELE_B, record.getAlleleB());
    builder.attribute(InfiniumVcfFields.ILLUMINA_STRAND, record.getIlmnStrand());
    builder.attribute(InfiniumVcfFields.PROBE_A, record.getAlleleAProbeSeq());
    builder.attribute(InfiniumVcfFields.PROBE_B, record.getAlleleBProbeSeq());
    builder.attribute(InfiniumVcfFields.BEADSET_ID, record.getBeadSetId());
    builder.attribute(InfiniumVcfFields.ILLUMINA_CHR, record.getChr());
    builder.attribute(InfiniumVcfFields.ILLUMINA_POS, record.getPosition());
    builder.attribute(InfiniumVcfFields.ILLUMINA_BUILD, record.getGenomeBuild());
    builder.attribute(InfiniumVcfFields.SOURCE, record.getSource().replace(' ', '_'));
    builder.attribute(InfiniumVcfFields.GC_SCORE, formatFloatForVcf(egtFile.totalScore[egtIndex]));

    for (InfiniumVcfFields.GENOTYPE_VALUES gtValue : InfiniumVcfFields.GENOTYPE_VALUES.values()) {
        final int ordinalValue = gtValue.ordinal();
        builder.attribute(InfiniumVcfFields.N[ordinalValue], egtFile.n[egtIndex][ordinalValue]);
        builder.attribute(InfiniumVcfFields.DEV_R[ordinalValue], formatFloatForVcf(egtFile.devR[egtIndex][ordinalValue]));
        builder.attribute(InfiniumVcfFields.MEAN_R[ordinalValue], formatFloatForVcf(egtFile.meanR[egtIndex][ordinalValue]));
        builder.attribute(InfiniumVcfFields.DEV_THETA[ordinalValue], formatFloatForVcf(egtFile.devTheta[egtIndex][ordinalValue]));
        builder.attribute(InfiniumVcfFields.MEAN_THETA[ordinalValue], formatFloatForVcf(egtFile.meanTheta[egtIndex][ordinalValue]));

        final EuclideanValues genotypeEuclideanValues = polarToEuclidean(egtFile.meanR[egtIndex][ordinalValue], egtFile.devR[egtIndex][ordinalValue],
                egtFile.meanTheta[egtIndex][ordinalValue], egtFile.devTheta[egtIndex][ordinalValue]);
        builder.attribute(InfiniumVcfFields.DEV_X[ordinalValue], formatFloatForVcf(genotypeEuclideanValues.devX));
        builder.attribute(InfiniumVcfFields.MEAN_X[ordinalValue], formatFloatForVcf(genotypeEuclideanValues.meanX));
        builder.attribute(InfiniumVcfFields.DEV_Y[ordinalValue], formatFloatForVcf(genotypeEuclideanValues.devY));
        builder.attribute(InfiniumVcfFields.MEAN_Y[ordinalValue], formatFloatForVcf(genotypeEuclideanValues.meanY));
    }


    final String rsid = record.getRsId();
    if (StringUtils.isNotEmpty(rsid)) {
        builder.attribute(InfiniumVcfFields.RS_ID, rsid);
    }
    if (egtFile.totalScore[egtIndex] == 0.0) {
        builder.filter(InfiniumVcfFields.ZEROED_OUT_ASSAY);
        if (genotype.isCalled()) {
            if (DO_NOT_ALLOW_CALLS_ON_ZEROED_OUT_ASSAYS) {
                throw new PicardException("Found a call (genotype: " + genotype + ") on a zeroed out Assay!!");
            } else {
                log.warn("Found a call (genotype: " + genotype + ") on a zeroed out Assay. " +
                        "This could occur if you called genotypes on a different cluster file than used here.");
            }
        }
    }
    if (record.isDupe()) {
        builder.filter(InfiniumVcfFields.DUPE);
    }
    return builder.make();
}
 
Example 13
Source File: ValidateVariants.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
@Override
public void apply(final VariantContext vc, final ReadsContext readsContext, final ReferenceContext ref, final FeatureContext featureContext) {
    if (DO_NOT_VALIDATE_FILTERED && vc.isFiltered()) {
        return;
    }
    // get the true reference allele
    final Allele reportedRefAllele = vc.getReference();
    final int refLength = reportedRefAllele.length();

    final Allele observedRefAllele = hasReference() ? Allele.create(Arrays.copyOf(ref.getBases(), refLength)) : null;

    final Set<String> rsIDs = getRSIDs(featureContext);

    if (VALIDATE_GVCF) {
        final SimpleInterval refInterval = ref.getInterval();

        validateVariantsOrder(vc);

        // GenomeLocSortedSet will automatically merge intervals that are overlapping when setting `mergeIfIntervalOverlaps`
        // to true.  In a GVCF most blocks are adjacent to each other so they wouldn't normally get merged.  We check
        // if the current record is adjacent to the previous record and "overlap" them if they are so our set is as
        // small as possible while still containing the same bases.
        final int start = (previousInterval != null && previousInterval.overlapsWithMargin(refInterval, 1)) ?
                previousInterval.getStart() : refInterval.getStart();
        final int end = (previousInterval != null && previousInterval.overlapsWithMargin(refInterval, 1)) ?
                Math.max(previousInterval.getEnd(), vc.getEnd()) : vc.getEnd();
        final GenomeLoc possiblyMergedGenomeLoc = genomeLocSortedSet.getGenomeLocParser().createGenomeLoc(refInterval.getContig(), start, end);
        genomeLocSortedSet.add(possiblyMergedGenomeLoc, true);

        previousInterval = new SimpleInterval(possiblyMergedGenomeLoc);
        previousStart = vc.getStart();
        validateGVCFVariant(vc);
    }

    for (final ValidationType t : validationTypes) {
        try{
            applyValidationType(vc, reportedRefAllele, observedRefAllele, rsIDs, t);
        } catch (TribbleException e) {
            throwOrWarn(new UserException.FailsStrictValidation(drivingVariantFile, t, e.getMessage()));
        }
    }
}
 
Example 14
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Gets the coding sequence for the allele with given start and stop positions, codon-aligned to the start of the reference sequence.
 * @param codingSequence The whole coding sequence for this transcript.  Must not be {@code null}.
 * @param alignedAlleleStart The codon-aligned position (1-based, inclusive) of the allele start.  Must not be {@code null}.
 * @param alignedAlleleStop The codon-aligned position (1-based, inclusive) of the allele stop.  Must not be {@code null}.
 * @param refAllele The reference {@link Allele}.  Must not be {@code null}.
 * @param refAlleleStart The position (1-based, inclusive) of where the reference allele starts.  Must not be {@code null}.  Must be > 0.
 * @param strand The {@link Strand} on which the alleles are found.  Must not be {@code null}.  Must not be {@link Strand#NONE}.
 * @return A {@link String} containing the reference allele coding sequence.
 */
public static String getAlignedCodingSequenceAllele(final String codingSequence,
                                                    final Integer alignedAlleleStart,
                                                    final Integer alignedAlleleStop,
                                                    final Allele  refAllele,
                                                    final Integer refAlleleStart,
                                                    final Strand strand) {
    Utils.nonNull(codingSequence);
    Utils.nonNull(alignedAlleleStart);
    ParamUtils.isPositive( alignedAlleleStart, "Genome positions must be > 0." );
    Utils.nonNull(alignedAlleleStop);
    ParamUtils.isPositive( alignedAlleleStop, "Genome positions must be > 0." );
    Utils.nonNull(refAllele);
    Utils.nonNull(refAlleleStart);
    ParamUtils.isPositive( refAlleleStart, "Genome positions must be > 0." );

    assertValidStrand( strand );

    String alignedAlleleSeq = getAlignedAlleleSequence(codingSequence, alignedAlleleStart, alignedAlleleStop, strand);

    // Check whether our reference sequence is derived from the reference or if it should be derived from the given
    // reference.
    final String expectedReferenceSequence;
    if ( strand == Strand.POSITIVE ) {
        expectedReferenceSequence = codingSequence.substring(refAlleleStart - 1, refAlleleStart - 1 + refAllele.length());
    }
    else {
        final int start = codingSequence.length() - (refAlleleStart - 1 + refAllele.length());
        final int end = codingSequence.length() - refAlleleStart;
        expectedReferenceSequence = ReadUtils.getBasesReverseComplement( codingSequence.substring(start, end).getBytes() );
    }

    // NOTE: This check appears to be redundant, but in actuality, it is required.
    //       Because we reconstruct the coding sequence allele separately from the reference allele, we need to check this
    //       again to make sure we have the right alleles given our input.
    if ( !expectedReferenceSequence.equals(refAllele.getBaseString()) ) {
        // Oh noes!
        // Ref allele is different from reference sequence!
        // Oh well, we should use the reference we were given anyways...
        final String substitutedAlignedSeq = getAlternateSequence(new StrandCorrectedReferenceBases(codingSequence, strand), refAlleleStart, refAllele, refAllele, strand);

        // We use the positive strand here because we have already reverse complemented the sequence in the call
        // above.
        final String substitutedAlignedAlleleSeq = getAlignedAlleleSequence(substitutedAlignedSeq, alignedAlleleStart, alignedAlleleStop, Strand.POSITIVE);

        // Warn the user!
        logger.warn("Reference allele differs from reference coding sequence (strand: " + strand + ") (Allele " + expectedReferenceSequence + " != " + refAllele.getBaseString() + " coding sequence)!  Substituting given allele for sequence code (" + alignedAlleleSeq + "->" + substitutedAlignedAlleleSeq + ")");

        // Set up our return value:
        alignedAlleleSeq = substitutedAlignedAlleleSeq;
    }

    return alignedAlleleSeq;
}
 
Example 15
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Get the aligned coding sequence for the given reference allele.
 * NOTE: alignedRefAlleleStart must be less than or equal to codingSequenceRefAlleleStart.
 * @param referenceSnippet {@link StrandCorrectedReferenceBases} containing a short excerpt of the reference sequence around the given reference allele.  Must not be {@code null}.
 * @param referencePadding Number of bases in {@code referenceSnippet} before the reference allele starts.  This padding exists at the end as well (plus some other bases to account for the length of the alternate allele if it is longer than the reference).  Must be >= 0.
 * @param refAllele The reference {@link Allele}.  Must not be {@code null}.
 * @param altAllele The alternate {@link Allele}.  Must not be {@code null}.
 * @param codingSequenceRefAlleleStart The position (1-based, inclusive) in the coding sequence where the {@code refAllele} starts.
 * @param alignedRefAlleleStart The in-frame position (1-based, inclusive) of the first base of the codon containing the reference allele.
 * @param strand The {@link Strand} on which the variant occurs.  Must not be {@code null}.  Must not be {@link Strand#NONE}.
 * @param variantGenomicPositionForLogging The {@link Locatable} representing the position of the variant in genomic coordinates.  Used for logging purposes only.
 * @return A {@link String} of in-frame codons that contain the entire reference allele.
 */
public static String getAlignedRefAllele(final StrandCorrectedReferenceBases referenceSnippet,
                                         final int referencePadding,
                                         final Allele refAllele,
                                         final Allele altAllele,
                                         final int codingSequenceRefAlleleStart,
                                         final int alignedRefAlleleStart,
                                         final Strand strand,
                                         final Locatable variantGenomicPositionForLogging) {

    Utils.nonNull(referenceSnippet);
    Utils.nonNull(refAllele);
    Utils.nonNull(altAllele);
    ParamUtils.isPositiveOrZero( referencePadding, "Padding must be >= 0." );
    Utils.nonNull(variantGenomicPositionForLogging);
    assertValidStrand(strand);
    Utils.validate( alignedRefAlleleStart <= codingSequenceRefAlleleStart, "The alignedRefAlleleStart must be less than or equal to codingSequenceRefAlleleStart!" );

    // Get the length of the variant.
    // Since our referenceSnippet is <referencePadding> bases on either side of the variant
    // we need to know how long the variant is (so we can adjust for - strand variants)
    final int referenceLength = refAllele.length();

    // Compute the number of bases that we must prepend to this one to get to the start of a codon:
    final int extraBasesNeededForCodonAlignment = (codingSequenceRefAlleleStart - alignedRefAlleleStart);

    // We must add bases to the start to adjust for indels because of the required preceding base in VCF format:
    final int indelAdjustment = GATKVariantContextUtils.isIndel(refAllele, altAllele) ? 1 : 0;

    // We need to adjust coordinates for deletions on the - strand in the reference snippet:
    final int negativeDeletionAdjustment = (altAllele.length() < refAllele.length()) && (strand == Strand.NEGATIVE) ? 1 : 0;

    // Get the index in our reference snippet of the aligned codon start:
    int refStartAlignedIndex;
    if ( strand == Strand.NEGATIVE ) {
        // Variants on the - strand have to be handled as a special case because of how
        // the referenceSnippet is constructed:
        refStartAlignedIndex = referencePadding - extraBasesNeededForCodonAlignment;
    }
    else {
        refStartAlignedIndex = referencePadding - extraBasesNeededForCodonAlignment - indelAdjustment;
    }

    // TODO: This should probably be an error condition:
    if ( refStartAlignedIndex < 0 ) {
        refStartAlignedIndex = 0;
    }

    // Round to the nearest multiple of AminoAcid.CODON_LENGTH to get the end position.
    // Note this is the position of the first base NOT in the REF allele
    // (because it's used for substring coordinates).
    int refEndPosExclusive = refStartAlignedIndex + (int)(Math.ceil((extraBasesNeededForCodonAlignment + refAllele.length()) / ((double)AminoAcid.CODON_LENGTH)) * AminoAcid.CODON_LENGTH);

    // Create the aligned reference:
    String alignedReferenceAllele = referenceSnippet.getBaseString().substring(refStartAlignedIndex, refEndPosExclusive);

    final String computedReferenceAlleleWithSubstitution = alignedReferenceAllele.substring(extraBasesNeededForCodonAlignment, extraBasesNeededForCodonAlignment + refAllele.length());

    // Make sure our reference is what we expect it to be with the ref allele:
    if ( !computedReferenceAlleleWithSubstitution.equals(refAllele.getBaseString()) ) {
        // Oh noes!
        // Ref allele is different from reference sequence!

        // Oh well, we should use the reference we were given anyways...
        final String substitutedReferenceSnippet = getAlternateSequence(referenceSnippet, referencePadding + 1, refAllele, refAllele, strand);
        refEndPosExclusive = refStartAlignedIndex + (int)(Math.ceil((extraBasesNeededForCodonAlignment + refAllele.length()) / ((double)AminoAcid.CODON_LENGTH)) * AminoAcid.CODON_LENGTH);

        final String substitutedAlignedAlleleSeq = substitutedReferenceSnippet.substring(refStartAlignedIndex, refEndPosExclusive);

        // Warn the user!
        final String positionString = '[' + variantGenomicPositionForLogging.getContig() + ":" + variantGenomicPositionForLogging.getStart() + ']';
        logger.warn("Reference allele is different than the reference coding sequence (strand: " + strand + ", alt = " + altAllele.getBaseString() + ", ref " + refAllele.getBaseString() + " != " + computedReferenceAlleleWithSubstitution + " reference coding seq) @" + positionString + "!  Substituting given allele for sequence code (" + alignedReferenceAllele + "->" + substitutedAlignedAlleleSeq + ")");

        // Set up our return value:
        alignedReferenceAllele = substitutedAlignedAlleleSeq;
    }

    return alignedReferenceAllele;
}
 
Example 16
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Get the overlapping exon start/stop as a {@link SimpleInterval} for the given altAllele / reference.
 * @param refAllele {@link Allele} for the given {@code altAllele}.  Must not be {@code null}.
 * @param altAllele {@link Allele} to locate on an exon.  Must not be {@code null}.
 * @param contig Contig on which the altAllele occurs.  Must not be {@code null}.
 * @param variantStart Start position (1-based, inclusive) of the given {@code altAllele}.  Must be > 0.
 * @param variantEnd End position (1-based, inclusive) of the given {@code altAllele}.  Must be > 0.
 * @param exonPositionList List of exon start / stop positions to cross-reference with the given {@code altAllele}.  Must not be {@code null}.
 * @param strand The {@link Strand} on which the {@code altAllele} is located.  Must not be {@code null}.  Must not be {@link Strand#NONE}.
 * @return A {@link SimpleInterval} containing the extents of the exon that overlaps the given altAllele.  {@code null} if no overlap occurs.
 */
public static SimpleInterval getOverlappingExonPositions(final Allele refAllele,
                                                         final Allele altAllele,
                                                         final String contig,
                                                         final int variantStart,
                                                         final int variantEnd,
                                                         final Strand strand,
                                                         final List<? extends htsjdk.samtools.util.Locatable> exonPositionList) {
    Utils.nonNull( refAllele );
    Utils.nonNull( altAllele );
    Utils.nonNull( contig );
    Utils.nonNull( exonPositionList );
    assertValidStrand( strand );

    ParamUtils.isPositive( variantStart, "Genome positions must be > 0." );
    ParamUtils.isPositive( variantEnd, "Genome positions must be > 0." );

    // Get the correct start / end positions:
    int alleleStart = variantStart;
    int alleleEnd   = variantEnd;

    if ( refAllele.length() > refAllele.length() ) {
        final int lengthAdjustment = (refAllele.length() - altAllele.length());
        if ( strand == Strand.NEGATIVE ) {
            alleleStart -= lengthAdjustment;
        }
        else {
            alleleEnd   += lengthAdjustment;
        }
    }

    // Create an interval so we can check for overlap:
    final SimpleInterval variantInterval = new SimpleInterval( contig, alleleStart, alleleEnd );

    int exonStart = -1;
    int exonStop  = -1;

    // Check for overlap:
    for ( final Locatable exon : exonPositionList ) {
        if ( variantInterval.overlaps(exon) ) {
            exonStart = exon.getStart();
            exonStop  = exon.getEnd();
        }
    }

    // Since we set the start/stop together we only need to check one of these:
    if ( exonStart == -1 ) {
        return null;
    }
    else {
        return new SimpleInterval( contig, exonStart, exonStop );
    }
}