Java Code Examples for htsjdk.samtools.util.Locatable#getContig()

The following examples show how to use htsjdk.samtools.util.Locatable#getContig() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: FeatureDataSource.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Refill our cache from disk after a cache miss. Will prefetch Features overlapping an additional
 * queryLookaheadBases bases after the end of the provided interval, in addition to those overlapping
 * the interval itself.
 * <p>
 * Calling this has the side effect of invalidating (closing) any currently-open iteration over
 * this data source.
 *
 * @param interval the query interval that produced a cache miss
 */
private void refillQueryCache(final Locatable interval) {
    // Tribble documentation states that having multiple iterators open simultaneously over the same FeatureReader
    // results in undefined behavior
    closeOpenIterationIfNecessary();

    // Expand the end of our query by the configured number of bases, in anticipation of probable future
    // queries with slightly larger start/stop positions.
    //
    // Note that it doesn't matter if we go off the end of the contig in the process, since
    // our reader's query operation is not aware of (and does not care about) contig boundaries.
    // Note: we use addExact to blow up on overflow rather than propagate negative results downstream
    final SimpleInterval queryInterval = new SimpleInterval(interval.getContig(), interval.getStart(), Math.addExact(interval.getEnd(), queryLookaheadBases));

    // Query iterator over our reader will be immediately closed after re-populating our cache
    try (final CloseableTribbleIterator<T> queryIter = featureReader.query(queryInterval.getContig(), queryInterval.getStart(), queryInterval.getEnd())) {
        queryCache.fill(queryIter, queryInterval);
    } catch (final IOException e) {
        throw new GATKException("Error querying file " + featureInput + " over interval " + interval, e);
    }
}
 
Example 2
Source File: IntervalUtilsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
@Test(dataProvider = "lexicographicalOrderComparatorData")
public void testLexicographicalOrderComparator(final List<Locatable> unsorted) {
    final List<Locatable> sorted = unsorted.stream().sorted(IntervalUtils.LEXICOGRAPHICAL_ORDER_COMPARATOR)
            .collect(Collectors.toList());
    for (int i = 0; i < sorted.size() - 1; i++) {
       final Locatable thisLoc = sorted.get(i);
       final Locatable nextLoc = sorted.get(i + 1);
       if (thisLoc.getContig() == nextLoc.getContig()) {
           Assert.assertTrue(thisLoc.getStart() <= nextLoc.getStart());
           if (thisLoc.getStart() == nextLoc.getStart()) {
               Assert.assertTrue(thisLoc.getEnd() <= nextLoc.getEnd());
           }
       } else if (thisLoc.getContig() == null) {
           Assert.fail("Null contig must go last");
       } else if (nextLoc.getContig() != null) {
           Assert.assertTrue(thisLoc.getContig().compareTo(nextLoc.getContig()) <= 0);
           if (thisLoc.getContig().equals(nextLoc.getContig())) {
               Assert.assertTrue(thisLoc.getStart() <= nextLoc.getStart());
               if (thisLoc.getStart() == nextLoc.getStart()) {
                   Assert.assertTrue(thisLoc.getEnd() <= nextLoc.getEnd());
               }
           }
       }
    }
}
 
Example 3
Source File: SegmentUtils.java    From gatk-protected with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Given an interval and collections of targets and SNPs, returns a trimmed interval produced by removing the empty
 * portions at the start and the end of the original interval that do not overlap the targets and SNPs that overlap
 * with the original interval.  If this procedure would remove the entire interval, the original interval is
 * returned instead.  Note that this method will not expand an interval to the start of the first overlapping target
 * and the end of the last overlapping target; it will only shrink the interval or leave it alone.  This is to
 * avoid overlapping segments (which would occur if a SNP breakpoint fell within a target and the interval
 * were expanded, for example).
 */
public static SimpleInterval trimInterval(final Locatable interval,
                                          final TargetCollection<? extends Locatable> targets,
                                          final TargetCollection<? extends Locatable> snps) {
    Utils.nonNull(interval, "The interval cannot be null.");
    Utils.nonNull(targets, "The collection of targets cannot be null.");
    Utils.nonNull(snps, "The collection of SNPs cannot be null.");

    final IndexRange targetRange = targets.indexRange(interval);
    final IndexRange snpRange = snps.indexRange(interval);

    final int numTargetsInInterval = targetRange.size();
    final int numSNPsInInterval = snpRange.size();

    int start = interval.getStart();
    int end = interval.getEnd();

    if (numTargetsInInterval == 0 && numSNPsInInterval > 0) {
        //if there are no targets overlapping interval, use SNPs to determine trimmed interval
        start = snps.target(snpRange.from).getStart();
        end = snps.target(snpRange.to - 1).getEnd();
    } else if (numTargetsInInterval > 0) {
        //if interval start does not fall within first target, use start of first target as start of trimmed interval
        start = Math.max(start, targets.target(targetRange.from).getStart());
        //if interval end does not fall within last target, use end of last target as end of trimmed interval
        end = Math.min(end, targets.target(targetRange.to - 1).getEnd());

        if (numSNPsInInterval > 0) {
            //if there are also SNPs within interval, check to see if they give a larger trimmed interval
            start = Math.min(start, snps.target(snpRange.from).getStart());
            end = Math.max(end, snps.target(snpRange.to - 1).getEnd());
        }
    }
    if (start < interval.getStart() || end > interval.getEnd() || end < start) {
        throw new GATKException.ShouldNeverReachHereException("Something went wrong in trimming interval.");
    }
    return new SimpleInterval(interval.getContig(), start, end);
}
 
Example 4
Source File: VariantShard.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * getVariantShardsFromInterval calculates *all* shards overlapping location.
 * @param location the range of sites determines which shards are overlapping
 * @return All overlapping VariantShards
 */
public static List<VariantShard> getVariantShardsFromInterval(final Locatable location) {
    if (location.getContig()==null) {
        // don't feed me unmapped reads!
        throw new GATKException("getVariantShardsFromInterval requires locations to be mapped");
    }
    List<VariantShard> shardList = new ArrayList<>();
    // Get all of the shard numbers that span the start and end of the interval.
    int startShard = location.getStart()/ VARIANT_SHARDSIZE;
    int endShard = location.getEnd()/ VARIANT_SHARDSIZE;
    for (int i = startShard; i <= endShard; ++i) {
        shardList.add(new VariantShard(i, location.getContig()));
    }
    return shardList;
}
 
Example 5
Source File: SimpleInterval.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Determines whether this interval comes within "margin" of overlapping the provided locatable.
 * This is the same as plain overlaps if margin=0.
 *
 * @param other interval to check
 * @param margin how many bases may be between the two interval for us to still consider them overlapping; must be non-negative
 * @return true if this interval overlaps other, otherwise false
 * @throws IllegalArgumentException if margin is negative
 */
public boolean overlapsWithMargin(final Locatable other, final int margin) {
    if ( margin < 0 ) {
        throw new IllegalArgumentException("given margin is negative: " + margin +
                "\tfor this: " + toString() + "\tand that: " + (other == null ? "other is null" : other.toString()));
    }
    if ( other == null || other.getContig() == null ) {
        return false;
    }

    return this.contig.equals(other.getContig()) && this.start <= other.getEnd() + margin && other.getStart() - margin <= this.end;
}
 
Example 6
Source File: SimpleInterval.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Determines whether this interval contains the entire region represented by other
 * (in other words, whether it covers it).
 *
 * @param other interval to check
 * @return true if this interval contains all of the bases spanned by other, otherwise false
 */
public boolean contains( final Locatable other ) {
    if ( other == null || other.getContig() == null ) {
        return false;
    }

    return this.contig.equals(other.getContig()) && this.start <= other.getStart() && this.end >= other.getEnd();
}
 
Example 7
Source File: SegmentExonUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * @param transcript Never {@code null}
 * @param segment Genomic region to determine which exons are covered in the transcript.  Never {@code null}
 * @return Instance of {@link SegmentExonOverlaps}.  A typical value will be a number appended to a "-" or a "+".
 *   The exon numbers are 0-based.  "-" if it covers upstream exons (considering coding direction -- i.e. previous exons
 *   are higher genomic coordinates on negative strand transcripts) and "+" for downstream exons.
 *   For example:
 *   - "6+" would be sixth exon and above.  This would indicate downstream exons (higher genomic
 *     coordinates on positive coding directions and lower genomic coordinates when negative coding) are covered by
 *     the segment.
 *   - "" indicates that the endpoints of the segment do not cover any transcript exons.  So either the entire
 *      transcript is covered or none of the transcript is covered.
 *
 *   Note that this will return the first exon covered, as long as the breakpoint is in the transcript.  Even if the
 *    breakpoint itself does not cover an exon.
 *    For example, this will yield a non-blank value when a segment breakpoint is in an intron.
 */
public static SegmentExonOverlaps determineSegmentExonPosition(final GencodeGtfTranscriptFeature transcript, final Locatable segment) {

    Utils.nonNull(transcript);
    Utils.nonNull(segment);

    final String NOT_IN_TRANSCRIPT = "";

    // Internally, this method assumes that the exons are sorted in genomic order.  Which is NOT how these are
    //  stored in the GencodeGtfTranscriptFeature
    final List<GencodeGtfExonFeature> exons = transcript.getGenomicStrand() == Strand.NEGATIVE ?
            Lists.reverse(transcript.getExons()) : transcript.getExons();

    if (exons.size() == 0) {
        return new SegmentExonOverlaps(NOT_IN_TRANSCRIPT, NOT_IN_TRANSCRIPT);
    }

    final SimpleInterval segmentStart = new SimpleInterval(segment.getContig(), segment.getStart(), segment.getStart());
    final SimpleInterval segmentEnd = new SimpleInterval(segment.getContig(), segment.getEnd(), segment.getEnd());

    // Find the proper index for the start.
    // If the start of the segment does not overlap the first exon, but the segment does, then the start index is -1 (no overlap)
    final int inclusiveIndexPositiveDirectionStart = findInclusiveExonIndex(transcript.getGenomicStrand(), exons, segmentStart, true);

    // If the end of the segment does not overlap the last exon, but the segment does, then the end index is -1 (no overlap)
    final int inclusiveIndexPositiveDirectionEnd = findInclusiveExonIndex(transcript.getGenomicStrand(), exons, segmentEnd, false);

    // Construct the final strings
    final String startResult = inclusiveIndexPositiveDirectionStart != NO_EXON_OVERLAP ?
            inclusiveIndexPositiveDirectionStart + determineSegmentOverlapDirection(transcript.getGenomicStrand(), true)
            : NOT_IN_TRANSCRIPT;
    final String endResult = inclusiveIndexPositiveDirectionEnd != NO_EXON_OVERLAP ?
            inclusiveIndexPositiveDirectionEnd + determineSegmentOverlapDirection(transcript.getGenomicStrand(), false)
            : NOT_IN_TRANSCRIPT;

    return new SegmentExonOverlaps(startResult, endResult);
}
 
Example 8
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
/**
 * Get the aligned coding sequence for the given reference allele.
 * NOTE: alignedRefAlleleStart must be less than or equal to codingSequenceRefAlleleStart.
 * @param referenceSnippet {@link StrandCorrectedReferenceBases} containing a short excerpt of the reference sequence around the given reference allele.  Must not be {@code null}.
 * @param referencePadding Number of bases in {@code referenceSnippet} before the reference allele starts.  This padding exists at the end as well (plus some other bases to account for the length of the alternate allele if it is longer than the reference).  Must be >= 0.
 * @param refAllele The reference {@link Allele}.  Must not be {@code null}.
 * @param altAllele The alternate {@link Allele}.  Must not be {@code null}.
 * @param codingSequenceRefAlleleStart The position (1-based, inclusive) in the coding sequence where the {@code refAllele} starts.
 * @param alignedRefAlleleStart The in-frame position (1-based, inclusive) of the first base of the codon containing the reference allele.
 * @param strand The {@link Strand} on which the variant occurs.  Must not be {@code null}.  Must not be {@link Strand#NONE}.
 * @param variantGenomicPositionForLogging The {@link Locatable} representing the position of the variant in genomic coordinates.  Used for logging purposes only.
 * @return A {@link String} of in-frame codons that contain the entire reference allele.
 */
public static String getAlignedRefAllele(final StrandCorrectedReferenceBases referenceSnippet,
                                         final int referencePadding,
                                         final Allele refAllele,
                                         final Allele altAllele,
                                         final int codingSequenceRefAlleleStart,
                                         final int alignedRefAlleleStart,
                                         final Strand strand,
                                         final Locatable variantGenomicPositionForLogging) {

    Utils.nonNull(referenceSnippet);
    Utils.nonNull(refAllele);
    Utils.nonNull(altAllele);
    ParamUtils.isPositiveOrZero( referencePadding, "Padding must be >= 0." );
    Utils.nonNull(variantGenomicPositionForLogging);
    assertValidStrand(strand);
    Utils.validate( alignedRefAlleleStart <= codingSequenceRefAlleleStart, "The alignedRefAlleleStart must be less than or equal to codingSequenceRefAlleleStart!" );

    // Get the length of the variant.
    // Since our referenceSnippet is <referencePadding> bases on either side of the variant
    // we need to know how long the variant is (so we can adjust for - strand variants)
    final int referenceLength = refAllele.length();

    // Compute the number of bases that we must prepend to this one to get to the start of a codon:
    final int extraBasesNeededForCodonAlignment = (codingSequenceRefAlleleStart - alignedRefAlleleStart);

    // We must add bases to the start to adjust for indels because of the required preceding base in VCF format:
    final int indelAdjustment = GATKVariantContextUtils.isIndel(refAllele, altAllele) ? 1 : 0;

    // We need to adjust coordinates for deletions on the - strand in the reference snippet:
    final int negativeDeletionAdjustment = (altAllele.length() < refAllele.length()) && (strand == Strand.NEGATIVE) ? 1 : 0;

    // Get the index in our reference snippet of the aligned codon start:
    int refStartAlignedIndex;
    if ( strand == Strand.NEGATIVE ) {
        // Variants on the - strand have to be handled as a special case because of how
        // the referenceSnippet is constructed:
        refStartAlignedIndex = referencePadding - extraBasesNeededForCodonAlignment;
    }
    else {
        refStartAlignedIndex = referencePadding - extraBasesNeededForCodonAlignment - indelAdjustment;
    }

    // TODO: This should probably be an error condition:
    if ( refStartAlignedIndex < 0 ) {
        refStartAlignedIndex = 0;
    }

    // Round to the nearest multiple of AminoAcid.CODON_LENGTH to get the end position.
    // Note this is the position of the first base NOT in the REF allele
    // (because it's used for substring coordinates).
    int refEndPosExclusive = refStartAlignedIndex + (int)(Math.ceil((extraBasesNeededForCodonAlignment + refAllele.length()) / ((double)AminoAcid.CODON_LENGTH)) * AminoAcid.CODON_LENGTH);

    // Create the aligned reference:
    String alignedReferenceAllele = referenceSnippet.getBaseString().substring(refStartAlignedIndex, refEndPosExclusive);

    final String computedReferenceAlleleWithSubstitution = alignedReferenceAllele.substring(extraBasesNeededForCodonAlignment, extraBasesNeededForCodonAlignment + refAllele.length());

    // Make sure our reference is what we expect it to be with the ref allele:
    if ( !computedReferenceAlleleWithSubstitution.equals(refAllele.getBaseString()) ) {
        // Oh noes!
        // Ref allele is different from reference sequence!

        // Oh well, we should use the reference we were given anyways...
        final String substitutedReferenceSnippet = getAlternateSequence(referenceSnippet, referencePadding + 1, refAllele, refAllele, strand);
        refEndPosExclusive = refStartAlignedIndex + (int)(Math.ceil((extraBasesNeededForCodonAlignment + refAllele.length()) / ((double)AminoAcid.CODON_LENGTH)) * AminoAcid.CODON_LENGTH);

        final String substitutedAlignedAlleleSeq = substitutedReferenceSnippet.substring(refStartAlignedIndex, refEndPosExclusive);

        // Warn the user!
        final String positionString = '[' + variantGenomicPositionForLogging.getContig() + ":" + variantGenomicPositionForLogging.getStart() + ']';
        logger.warn("Reference allele is different than the reference coding sequence (strand: " + strand + ", alt = " + altAllele.getBaseString() + ", ref " + refAllele.getBaseString() + " != " + computedReferenceAlleleWithSubstitution + " reference coding seq) @" + positionString + "!  Substituting given allele for sequence code (" + alignedReferenceAllele + "->" + substitutedAlignedAlleleSeq + ")");

        // Set up our return value:
        alignedReferenceAllele = substitutedAlignedAlleleSeq;
    }

    return alignedReferenceAllele;
}
 
Example 9
Source File: IntervalUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 3 votes vote down vote up
/**
 * Check whether two locatables overlap.
 * <p>
 *    Two locatables overlap if the share the same contig and they have at least one
 *    base in common based on their start and end positions.
 * </p>
 * <p>
 *    This method returns {@code false} if either input {@link Locatable} has a {@code null}
 *    contig.
 * </p>
 *
 * @param left first locatable.
 * @param right second locatable.
 * @throws IllegalArgumentException if either {@code left} or {@code right} locatable
 *  is {@code null}.
 * @return {@code true} iff there is an overlap between both locatables.
 */
public static boolean overlaps(final Locatable left, final Locatable right) {
    Utils.nonNull(left,"the left locatable is null");
    Utils.nonNull(right,"the right locatable is null");
    if (left.getContig() == null || right.getContig() == null) {
        return false;
    } else if (!left.getContig().equals(right.getContig())) {
        return false;
    } else {
        return left.getStart() <= right.getEnd() && right.getStart() <= left.getEnd();
    }
}
 
Example 10
Source File: Target.java    From gatk-protected with BSD 3-Clause "New" or "Revised" License 2 votes vote down vote up
/**
 * Creates a string for a locatable that can be used when creating dummy target names
 * @param locatable The genome region to create a unique dummy target name. Never {@code null}
 * @return never {@code null}
 */
private static String createDummyTargetName(final Locatable locatable){
    Utils.nonNull(locatable);
    return "target_" + locatable.getContig() + "_" + String.valueOf(locatable.getStart()) + "_" + String.valueOf(locatable.getEnd());
}
 
Example 11
Source File: ReferenceShard.java    From gatk with BSD 3-Clause "New" or "Revised" License 2 votes vote down vote up
/**
 * getShardNumberFromInterval returns the ReferenceShard that overlap the read's start position.
 * @param location, the start of which is used to determine the shard
 * @return the shard (contig + id)
 */
public static ReferenceShard getShardNumberFromInterval(final Locatable location) {
    return new ReferenceShard(location.getStart()/REFERENCE_SHARD_SIZE, location.getContig());
}