Java Code Examples for htsjdk.samtools.SAMSequenceDictionary#getSequence()

The following examples show how to use htsjdk.samtools.SAMSequenceDictionary#getSequence() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: AnnotateTargetsIntegrationTest.java    From gatk-protected with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
private List<SimpleInterval> createRandomIntervals(final SAMSequenceDictionary referenceDictionary, final int numberOfIntervals, final int minIntervalSize, final int maxIntervalSize, final int meanIntervalSize, final double intervalSizeStdev) {
    final List<SimpleInterval> result = new ArrayList<>(numberOfIntervals);
    final int numberOfSequences = referenceDictionary.getSequences().size();
    for (int i = 0; i < numberOfIntervals; i++) {
        final SAMSequenceRecord contig = referenceDictionary.getSequence(RANDOM.nextInt(numberOfSequences));
        final String contigName = contig.getSequenceName();
        final int intervalSize = Math.min(maxIntervalSize, (int) Math.max(minIntervalSize, Math.round(RANDOM.nextDouble() * intervalSizeStdev + meanIntervalSize)));
        final int intervalStart = 1 + RANDOM.nextInt(contig.getSequenceLength() - intervalSize);
        final int intervalEnd = intervalStart + intervalSize - 1;
        final SimpleInterval interval = new SimpleInterval(contigName, intervalStart, intervalEnd);
        result.add(interval);
    }

    final Comparator<SimpleInterval> comparator =
            Comparator.comparing(SimpleInterval::getContig,
                    (a, b) -> Integer.compare(
                            referenceDictionary.getSequenceIndex(a),
                            referenceDictionary.getSequenceIndex(b)))
            .thenComparingInt(SimpleInterval::getStart)
            .thenComparingInt(SimpleInterval::getEnd);
    Collections.sort(result, comparator);
    return result;
}
 
Example 2
Source File: SamRangeUtils.java    From rtg-tools with BSD 2-Clause "Simplified" License 6 votes vote down vote up
/**
 * Resolves an inital range (supplied by the user, and may have unbounded ends) to the available sequences.
 * If end is greater than number of sequences it sets end to number of sequences.
 * @param range the range
 * @param dictionary the dictionary with which to validate/resolve the range
 * @return the resolved range.
 * @throws NoTalkbackSlimException if the start is out of range.
 */
public static SequenceNameLocus resolveRestriction(SAMSequenceDictionary dictionary, SequenceNameLocus range) {
  final SAMSequenceRecord sequence = dictionary.getSequence(range.getSequenceName());
  if (sequence == null) {
    throw new NoTalkbackSlimException("Sequence \"" + range.getSequenceName() + "\" referenced in region was not found in the SAM sequence dictionary.");
  }
  final int start = range.getStart() == SamRegionRestriction.MISSING ? 0 : range.getStart();
  final int length = sequence.getSequenceLength();
  if (start > length || (length != 0 && start == length)) {  // Allow start == 0 if empty sequence
    throw new NoTalkbackSlimException("The start position \"" + start + "\" must be less than than length of the sequence \"" + length + "\".");
  }
  int end = range.getEnd() == LongRange.MISSING ? length : range.getEnd();
  if (end > length) {
    Diagnostic.warning("The end position \"" + range.getEnd() + "\" is outside the length of the sequence (" + length
      + "). Defaulting end to \"" + length + "\"");
    end = length;
  }
  return new SequenceNameLocusSimple(range.getSequenceName(), start, end);
}
 
Example 3
Source File: IntervalUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Determines whether the provided interval is within the bounds of its assigned contig according to the provided dictionary
 *
 * @param interval interval to check
 * @param dictionary dictionary to use to validate contig bounds
 * @return true if the interval's contig exists in the dictionary, and the interval is within its bounds, otherwise false
 */
public static boolean intervalIsOnDictionaryContig( final SimpleInterval interval, final SAMSequenceDictionary dictionary ) {
    Utils.nonNull(interval);
    Utils.nonNull(dictionary);

    final SAMSequenceRecord contigRecord = dictionary.getSequence(interval.getContig());
    if ( contigRecord == null ) {
        return false;
    }

    return interval.getEnd() <= contigRecord.getSequenceLength();
}
 
Example 4
Source File: SequenceDictionaryUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Utility function that tests whether dict1's set of contigs is a superset of dict2's
 *
 * @param dict1 first sequence dictionary
 * @param dict2 second sequence dictionary
 * @return true if dict1's set of contigs supersets dict2's
 */
private static boolean supersets( SAMSequenceDictionary dict1, SAMSequenceDictionary dict2 ) {
    // Cannot rely on SAMSequenceRecord.equals() as it's too strict (takes extended attributes into account).
    for ( final SAMSequenceRecord dict2Record : dict2.getSequences() ) {
        final SAMSequenceRecord dict1Record = dict1.getSequence(dict2Record.getSequenceName());
        if ( dict1Record == null || ! sequenceRecordsAreEquivalent(dict2Record, dict1Record) ) {
            return false;
        }
    }

    return true;
}
 
Example 5
Source File: SequenceDictionaryUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Returns a List(x,y) that contains two disequal sequence records among the common contigs in both dicts.  Returns
 * null if all common contigs are equivalent
 *
 * @param commonContigs
 * @param dict1
 * @param dict2
 * @return
 */
private static List<SAMSequenceRecord> findDisequalCommonContigs(Set<String> commonContigs, SAMSequenceDictionary dict1, SAMSequenceDictionary dict2) {
    for ( String name : commonContigs ) {
        SAMSequenceRecord elt1 = dict1.getSequence(name);
        SAMSequenceRecord elt2 = dict2.getSequence(name);
        if ( ! sequenceRecordsAreEquivalent(elt1, elt2) )
            return Arrays.asList(elt1,elt2);
    }

    return null;
}
 
Example 6
Source File: SequenceDictionaryUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Checks whether the common contigs in the given sequence dictionaries occur at the same indices
 * in both dictionaries
 *
 * @param commonContigs Set of names of the contigs that occur in both dictionaries
 * @param dict1 first sequence dictionary
 * @param dict2 second sequence dictionary
 * @return true if the contigs common to dict1 and dict2 occur at the same indices in both dictionaries,
 *         otherwise false
 */
private static boolean commonContigsAreAtSameIndices( final Set<String> commonContigs, final SAMSequenceDictionary dict1, final SAMSequenceDictionary dict2 ) {
    for ( String commonContig : commonContigs ) {
        SAMSequenceRecord dict1Record = dict1.getSequence(commonContig);
        SAMSequenceRecord dict2Record = dict2.getSequence(commonContig);

        // Each common contig must have the same index in both dictionaries
        if ( dict1Record.getSequenceIndex() != dict2Record.getSequenceIndex() ) {
            return false;
        }
    }

    return true;
}
 
Example 7
Source File: PSFilter.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Validate arguments
 */
private void validateFilterArguments() {
    final SAMSequenceDictionary dictionary = header.getSequenceDictionary();
    if (filterArgs.alignedInput) {
        final Set<String> contigsToIgnoreSet = new HashSet<>(filterArgs.alignmentContigsToIgnore);
        for (final String contig : contigsToIgnoreSet) {
            if (dictionary.getSequence(contig) == null) {
                throw new UserException.BadInput("Ignored sequence " + contig + " not found in input header.");
            }
        }
    }
}
 
Example 8
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Returns if the given {@link SAMSequenceDictionary} is for the B37 Human Genome Reference.
 * @param sequenceDictionary The {@link SAMSequenceDictionary} to check for B37 Reference.
 * @return {@code true} if {@code sequenceDictionary} is for the B37 Human Genome Reference; {@code false} otherwise.
 */
public static boolean isSequenceDictionaryUsingB37Reference(final SAMSequenceDictionary sequenceDictionary) {
    // Check to make sure all our sequences are accounted for in the given dictionary.

    if ( sequenceDictionary == null ) {
        return false;
    }

    if ( B37_SEQUENCE_DICTIONARY == null ) {
        B37_SEQUENCE_DICTIONARY = initializeB37SequenceDict();
    }

    for ( final SAMSequenceRecord b37SequenceRecord : B37_SEQUENCE_DICTIONARY.getSequences() ) {
        // Now we check the Name, Length, and MD5Sum (if present) of all records:

        final SAMSequenceRecord inputSequenceRecord = sequenceDictionary.getSequence(b37SequenceRecord.getSequenceName());
        if ( inputSequenceRecord == null ) {
            return false;
        }

        if ( inputSequenceRecord.getSequenceLength() != b37SequenceRecord.getSequenceLength() ) {
            return false;
        }

        if ( (inputSequenceRecord.getMd5() != null) && (!inputSequenceRecord.getMd5().equals(b37SequenceRecord.getMd5())) ) {
            return false;
        }
    }

    return true;
}
 
Example 9
Source File: CreateSequenceDictionaryTest.java    From picard with MIT License 4 votes vote down vote up
@Test
public void testAltNames() throws Exception {
    final File altFile = File.createTempFile("CreateSequenceDictionaryTest", ".alt");
    final PrintWriter pw = new PrintWriter(altFile);
    pw.println("chr1\t1");
    pw.println("chr1\t01");
    pw.println("chr1\tk1");
    pw.println("chrMT\tM");
    pw.flush();
    pw.close();
    altFile.deleteOnExit();
    
    final File outputDict = File.createTempFile("CreateSequenceDictionaryTest.", ".dict");
    outputDict.delete();
    outputDict.deleteOnExit();
    final String[] argv = {
            "REFERENCE=" + BASIC_FASTA,
            "AN=" + altFile,
            "OUTPUT=" + outputDict,
            "TRUNCATE_NAMES_AT_WHITESPACE=true"
    };
    Assert.assertEquals(runPicardCommandLine(argv), 0);
    final SAMSequenceDictionary dict = SAMSequenceDictionaryExtractor.extractDictionary(outputDict.toPath());
    Assert.assertNotNull(dict, "dictionary is null");
   
    // check chr1

    final SAMSequenceRecord ssr1 = dict.getSequence("chr1");
    Assert.assertNotNull(ssr1, "chr1 missing in dictionary");
    final String an1 = ssr1.getAttribute("AN");
    Assert.assertNotNull(ssr1, "AN Missing");
    Set<String> anSet = new HashSet<>(Arrays.asList(an1.split("[,]")));

    Assert.assertTrue(anSet.contains("1"));
    Assert.assertTrue(anSet.contains("01"));
    Assert.assertTrue(anSet.contains("k1"));
    Assert.assertFalse(anSet.contains("M"));
    
    // check chr2
    SAMSequenceRecord ssr2 = dict.getSequence("chr2");
    Assert.assertNotNull(ssr2, "chr2 missing in dictionary");
    final String an2 = ssr2.getAttribute("AN");
    Assert.assertNull(an2, "AN Present");
    
    // check chrM
    final SAMSequenceRecord ssrM = dict.getSequence("chrM");
    Assert.assertNull(ssrM, "chrM present in dictionary");
}
 
Example 10
Source File: SimpleInterval.java    From gatk with BSD 3-Clause "New" or "Revised" License 3 votes vote down vote up
/**
 * Returns a new SimpleInterval that represents this interval as expanded by the specified amount in both
 * directions, bounded by the contig start/stop if necessary.
 *
 * @param padding amount to expand this interval
 * @param sequenceDictionary dictionary to use to determine the length of this interval's contig
 * @return a new SimpleInterval that represents this interval as expanded by the specified amount in both
 *         directions, bounded by the contig start/stop if necessary.
 */
public SimpleInterval expandWithinContig( final int padding, final SAMSequenceDictionary sequenceDictionary ) {
    Utils.nonNull(sequenceDictionary);
    final SAMSequenceRecord contigRecord = sequenceDictionary.getSequence(contig);
    Utils.nonNull( contigRecord, () -> "Contig " + contig + " not found in provided dictionary");

    return expandWithinContig(padding, contigRecord.getSequenceLength());
}
 
Example 11
Source File: AlignmentsTags.java    From cramtools with Apache License 2.0 3 votes vote down vote up
/**
 * Convenience method. See
 * {@link net.sf.cram.AlignmentsTags#calculateMdAndNmTags(SAMRecord, byte[], int, boolean, boolean)}
 * for details.
 * 
 * @param samRecord
 * @param referenceSource
 * @param samSequenceDictionary
 * @param calculateMdTag
 * @param calculateNmTag
 * @throws IOException
 */
public static void calculateMdAndNmTags(SAMRecord samRecord, ReferenceSource referenceSource,
		SAMSequenceDictionary samSequenceDictionary, boolean calculateMdTag, boolean calculateNmTag)
		throws IOException {
	if (!samRecord.getReadUnmappedFlag() && samRecord.getReferenceIndex() != SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX) {
		SAMSequenceRecord sequence = samSequenceDictionary.getSequence(samRecord.getReferenceIndex());
		ReferenceRegion region = referenceSource.getRegion(sequence, samRecord.getAlignmentStart(),
				samRecord.getAlignmentEnd());
		calculateMdAndNmTags(samRecord, region.array, samRecord.getAlignmentStart() - 1, calculateMdTag,
				calculateNmTag);
	}
}