Java Code Examples for htsjdk.samtools.SAMFileHeader#getSequenceDictionary()

The following examples show how to use htsjdk.samtools.SAMFileHeader#getSequenceDictionary() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: VcfHeader.java    From rtg-tools with BSD 2-Clause "Simplified" License 6 votes vote down vote up
/**
 * Add contig lines corresponding to the sequences present in a SAM header.
 * @param header the SAM header.
 */
public void addContigFields(SAMFileHeader header) {
  final SAMSequenceDictionary dic =  header.getSequenceDictionary();
  for (final SAMSequenceRecord seq : dic.getSequences()) {
    final ContigField f = new ContigField(seq.getSequenceName(), seq.getSequenceLength());
    if (seq.getAttribute(SAMSequenceRecord.ASSEMBLY_TAG) != null) {
      f.put("as", seq.getAttribute(SAMSequenceRecord.ASSEMBLY_TAG));
    }
    if (seq.getAttribute(SAMSequenceRecord.MD5_TAG) != null) {
      f.put("md5", seq.getAttribute(SAMSequenceRecord.MD5_TAG));
    }
    if (seq.getAttribute(SAMSequenceRecord.SPECIES_TAG) != null) {
      f.put("species", seq.getAttribute(SAMSequenceRecord.SPECIES_TAG));
    }
    addContigField(f);
  }
}
 
Example 2
Source File: HaplotypeCallerEngine.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Create and initialize a new HaplotypeCallerEngine given a collection of HaplotypeCaller arguments, a reads header,
 * and a reference file
 *  @param hcArgs command-line arguments for the HaplotypeCaller
 * @param assemblyRegionArgs
 * @param createBamOutIndex true to create an index file for the bamout
 * @param createBamOutMD5 true to create an md5 file for the bamout
 * @param readsHeader header for the reads
 * @param referenceReader reader to provide reference data, this reference reader will be closed when {@link #shutdown()} is called
 * @param annotationEngine variantAnnotatorEngine with annotations to process already added
 */
public HaplotypeCallerEngine(final HaplotypeCallerArgumentCollection hcArgs, AssemblyRegionArgumentCollection assemblyRegionArgs, boolean createBamOutIndex,
                             boolean createBamOutMD5, final SAMFileHeader readsHeader,
                             ReferenceSequenceFile referenceReader, VariantAnnotatorEngine annotationEngine) {
    this.hcArgs = Utils.nonNull(hcArgs);
    this.readsHeader = Utils.nonNull(readsHeader);
    this.referenceReader = Utils.nonNull(referenceReader);
    this.annotationEngine = Utils.nonNull(annotationEngine);
    this.aligner = SmithWatermanAligner.getAligner(hcArgs.smithWatermanImplementation);
    trimmer = new AssemblyRegionTrimmer(assemblyRegionArgs, readsHeader.getSequenceDictionary());
    forceCallingAllelesPresent = hcArgs.alleles != null;
    initialize(createBamOutIndex, createBamOutMD5);
    if (hcArgs.assemblyStateOutput != null) {
        try {
            assemblyDebugOutStream = new PrintStream(Files.newOutputStream(IOUtils.getPath(hcArgs.assemblyStateOutput)));
        } catch (IOException e) {
            throw new UserException.CouldNotCreateOutputFile(hcArgs.assemblyStateOutput, "Provided argument for assembly debug graph location could not be created");
        }
    } else {
        assemblyDebugOutStream = null;
    }
}
 
Example 3
Source File: ReferenceUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Given an InputStream connected to a fasta dictionary, returns its sequence dictionary
 *
 * Note: does not close the InputStream it's passed
 *
 * @param fastaDictionaryStream InputStream connected to a fasta dictionary
 * @return the SAMSequenceDictionary from the fastaDictionaryStream
 */
public static SAMSequenceDictionary loadFastaDictionary( final InputStream fastaDictionaryStream ) {
    // Don't close the reader when we're done, since we don't want to close the client's InputStream for them
    final BufferedLineReader reader = new BufferedLineReader(fastaDictionaryStream);

    final SAMTextHeaderCodec codec = new SAMTextHeaderCodec();
    final SAMFileHeader header = codec.decode(reader, fastaDictionaryStream.toString());

    // Make sure we have a valid sequence dictionary before continuing:
    if (header.getSequenceDictionary() == null || header.getSequenceDictionary().isEmpty()) {
        throw new UserException.MalformedFile (
                "Could not read sequence dictionary from given fasta stream " +
                        fastaDictionaryStream
        );
    }

    return header.getSequenceDictionary();
}
 
Example 4
Source File: ActivityProfileStateIterator.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
/**
 * Constructs an AssemblyRegionIterator over a provided read shard
 *  @param readShard MultiIntervalShard containing the reads that will go into the assembly regions.
 *                  Must have a MAPPED filter set on it.
 * @param readHeader header for the reads
 * @param reference source of reference bases (may be null)
 * @param features source of arbitrary features (may be null)
 * @param evaluator evaluator used to determine whether a locus is active
 */
public ActivityProfileStateIterator(final MultiIntervalShard<GATKRead> readShard,
                                    final SAMFileHeader readHeader,
                                    final ReferenceDataSource reference,
                                    final FeatureManager features,
                                    final AssemblyRegionEvaluator evaluator) {

    Utils.nonNull(readShard);
    Utils.nonNull(readHeader);
    Utils.nonNull(evaluator);

    this.reference = reference;
    this.features = features;
    this.evaluator = evaluator;

    // We wrap our LocusIteratorByState inside an IntervalAlignmentContextIterator so that we get empty loci
    // for uncovered locations. This is critical for reproducing GATK 3.x behavior!
    LocusIteratorByState libs = new LocusIteratorByState(readShard.iterator(), DownsamplingMethod.NONE, false, ReadUtils.getSamplesFromHeader(readHeader), readHeader, true);
    final IntervalLocusIterator intervalLocusIterator = new IntervalLocusIterator(readShard.getIntervals().iterator());
    this.locusIterator = new IntervalAlignmentContextIterator(libs, intervalLocusIterator, readHeader.getSequenceDictionary());
}
 
Example 5
Source File: LeftAlignAndTrimVariantsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
@Test
public void testLeftAlignWithVaryingMaxDistances() {

    final byte[] refSequence = new byte[200];
    Arrays.fill(refSequence, 0, 100, (byte) 'C');
    Arrays.fill(refSequence, 100, 200, (byte) 'A');

    final String contig = "1";
    final SAMFileHeader header = ArtificialReadUtils.createArtificialSamHeader(1, 1, refSequence.length);
    final SimpleInterval interval = new SimpleInterval(contig, 1, refSequence.length);


    final ReferenceBases refBases = new ReferenceBases(refSequence, interval);
    final ReferenceDataSource referenceDataSource = new ReferenceMemorySource(refBases, header.getSequenceDictionary());
    final ReferenceContext referenceContext = new ReferenceContext(referenceDataSource, interval);

    final List<Allele> alleles = Arrays.asList(Allele.create("AA", true), Allele.create("A", false));
    for (int maxShift : new int[] {0, 1, 5, 10, 30}) {
        for (int start : new int[]{101, 105, 109, 110, 111, 115, 120, 130, 141}) {
            final VariantContext vc = new VariantContextBuilder("source", contig, start, start + 1, alleles).make();
            final VariantContext realignedV = LeftAlignAndTrimVariants.leftAlignAndTrim(vc, referenceContext, maxShift, true);

            Assert.assertEquals(realignedV.getStart(), Math.max(start - maxShift, 100));
        }
    }
}
 
Example 6
Source File: BamIndexReaderTest.java    From rtg-tools with BSD 2-Clause "Simplified" License 6 votes vote down vote up
public void testSomeMethod() throws IOException {
  final File dir = FileUtils.createTempDir("bamindexreader", "test");
  try {
    final File bam = new File(dir, "mated.bam");
    FileHelper.resourceToFile("com/rtg/sam/resources/mated.bam", bam);
    final File index = new File(dir, "mated.bam.bai");
    FileHelper.resourceToFile("com/rtg/sam/resources/mated.bam.bai", index);
    final SAMFileHeader header = SamUtils.getSingleHeader(bam);
    final BamIndexReader tir = new BamIndexReader(index, header.getSequenceDictionary());
    final VirtualOffsets positions = tir.getFilePointers(SamRangeUtils.createExplicitReferenceRange(header, new SamRegionRestriction("simulatedSequence", 0, 5000)));
    assertEquals(151L, positions.start(0));
    assertEquals((446375L << 16) | 49187, positions.end(0));
  } finally {
    FileHelper.deleteAll(dir);
  }
}
 
Example 7
Source File: GenomeLocParserUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
@Test
public void testContigHasColon() {
    SAMFileHeader header = new SAMFileHeader();
    header.setSortOrder(htsjdk.samtools.SAMFileHeader.SortOrder.coordinate);
    SAMSequenceDictionary dict = new SAMSequenceDictionary();
    SAMSequenceRecord rec = new SAMSequenceRecord("c:h:r1", 10);
    rec.setSequenceLength(10);
    dict.addSequence(rec);
    header.setSequenceDictionary(dict);

    final GenomeLocParser myGenomeLocParser = new GenomeLocParser(header.getSequenceDictionary());
    GenomeLoc loc = myGenomeLocParser.parseGenomeLoc("c:h:r1:4-5");
    assertEquals(0, loc.getContigIndex());
    assertEquals(loc.getStart(), 4);
    assertEquals(loc.getStop(), 5);
}
 
Example 8
Source File: TagReadWithGeneExonFunctionTest.java    From Drop-seq with MIT License 6 votes vote down vote up
@Test
public void testTagCodingUTR () {
	SamReader inputSam = SamReaderFactory.makeDefault().open(testBAMFile);
	SAMFileHeader header = inputSam.getFileHeader();
	SAMSequenceDictionary bamDict = header.getSequenceDictionary();
	final OverlapDetector<Gene> geneOverlapDetector = GeneAnnotationReader.loadAnnotationsFile(annotationsFile, bamDict);

	String recName="H575CBGXY:2:21206:5655:8103";
	SAMRecord r = getRecord(testBAMFile, recName);

	TagReadWithGeneExonFunction tagger = new TagReadWithGeneExonFunction();
	r=tagger.setAnnotations(r, geneOverlapDetector);

	String geneName = r.getStringAttribute("GE");
	String geneStrand = r.getStringAttribute("GS");

	Assert.assertEquals(geneName, "Elp2");
	Assert.assertEquals(geneStrand, "+");
	Assert.assertEquals(r.getStringAttribute("XF"), LocusFunction.UTR.name());
}
 
Example 9
Source File: SvDiscoveryInputMetaData.java    From gatk with BSD 3-Clause "New" or "Revised" License 6 votes vote down vote up
public SvDiscoveryInputMetaData(final JavaSparkContext ctx,
                                final DiscoverVariantsFromContigAlignmentsSparkArgumentCollection discoverStageArgs,
                                final String nonCanonicalChromosomeNamesFile,
                                final String outputPath,
                                final ReadMetadata readMetadata,
                                final List<SVInterval> assembledIntervals,
                                final PairedStrandedIntervalTree<EvidenceTargetLink> evidenceTargetLinks,
                                final Broadcast<SVIntervalTree<VariantContext>> cnvCallsBroadcast,
                                final SAMFileHeader headerForReads,
                                final ReferenceMultiSparkSource reference,
                                final Set<VCFHeaderLine> defaultToolVCFHeaderLines,
                                final Logger toolLogger) {

    final SAMSequenceDictionary sequenceDictionary = headerForReads.getSequenceDictionary();
    final Broadcast<Set<String>> canonicalChromosomesBroadcast =
            ctx.broadcast(SVUtils.getCanonicalChromosomes(nonCanonicalChromosomeNamesFile, sequenceDictionary));
    final String sampleId = SVUtils.getSampleId(headerForReads);

    this.referenceData = new ReferenceData(canonicalChromosomesBroadcast, ctx.broadcast(reference), ctx.broadcast(sequenceDictionary));
    this.sampleSpecificData = new SampleSpecificData(sampleId, cnvCallsBroadcast, assembledIntervals, evidenceTargetLinks, readMetadata, ctx.broadcast(headerForReads));
    this.discoverStageArgs = discoverStageArgs;
    this.outputPath = outputPath;
    this.defaultToolVCFHeaderLines = defaultToolVCFHeaderLines;
    this.toolLogger = toolLogger;
}
 
Example 10
Source File: TagReadWithGeneFunctionTest.java    From Drop-seq with MIT License 6 votes vote down vote up
@Test
public void testTagCodingUTR () {
	SamReader inputSam = SamReaderFactory.makeDefault().open(testBAMFile);
	SAMFileHeader header = inputSam.getFileHeader();
	SAMSequenceDictionary bamDict = header.getSequenceDictionary();
	final OverlapDetector<Gene> geneOverlapDetector = GeneAnnotationReader.loadAnnotationsFile(annotationsFile, bamDict);

	String recName="H575CBGXY:2:21206:5655:8103";
	SAMRecord r = getRecord(testBAMFile, recName);

	TagReadWithGeneFunction tagger = new TagReadWithGeneFunction();
	r=tagger.setAnnotations(r, geneOverlapDetector, false);

	String geneName = r.getStringAttribute("gn");
	String geneStrand = r.getStringAttribute("gs");
	String function = r.getStringAttribute("gf");

	Assert.assertEquals(geneName, "Elp2");
	Assert.assertEquals(geneStrand, "+");
	Assert.assertEquals(function, "UTR");
	Assert.assertEquals(r.getStringAttribute("XF"), LocusFunction.UTR.name());
}
 
Example 11
Source File: TagReadWithGeneFunctionTest.java    From Drop-seq with MIT License 6 votes vote down vote up
@Test
public void testTagIntronRead () {
	SamReader inputSam = SamReaderFactory.makeDefault().open(testBAMFile);
	SAMFileHeader header = inputSam.getFileHeader();
	SAMSequenceDictionary bamDict = header.getSequenceDictionary();
	final OverlapDetector<Gene> geneOverlapDetector = GeneAnnotationReader.loadAnnotationsFile(annotationsFile, bamDict);

	String recName="H575CBGXY:4:23606:1714:16102";
	SAMRecord r = getRecord(testBAMFile, recName);

	TagReadWithGeneFunction tagger = new TagReadWithGeneFunction();
	r=tagger.setAnnotations(r, geneOverlapDetector, false);

	String geneName = r.getStringAttribute("gn");
	String geneStrand = r.getStringAttribute("gs");
	String function = r.getStringAttribute("gf");

	Assert.assertEquals(geneName, "Elp2");
	Assert.assertEquals(geneStrand, "+");
	Assert.assertEquals(function, "INTRONIC");
	Assert.assertEquals(r.getStringAttribute("XF"), "INTRONIC");

}
 
Example 12
Source File: HaplotypeCallerGenotypingEngine.java    From gatk-protected with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
protected VariantContext makeAnnotatedCall(byte[] ref, SimpleInterval refLoc, FeatureContext tracker, SAMFileHeader header, VariantContext mergedVC, ReadLikelihoods<Allele> readAlleleLikelihoods, VariantContext call) {
    final SimpleInterval locus = new SimpleInterval(mergedVC.getContig(), mergedVC.getStart(), mergedVC.getEnd());
    final SimpleInterval refLocInterval= new SimpleInterval(refLoc);
    final ReferenceDataSource refData = new ReferenceMemorySource(new ReferenceBases(ref, refLocInterval), header.getSequenceDictionary());
    final ReferenceContext referenceContext = new ReferenceContext(refData, locus, refLocInterval);

    final VariantContext untrimmedResult =  annotationEngine.annotateContext(call, tracker, referenceContext, readAlleleLikelihoods, a -> true);
    return call.getAlleles().size() == mergedVC.getAlleles().size() ? untrimmedResult
            : GATKVariantContextUtils.reverseTrimAlleles(untrimmedResult);
}
 
Example 13
Source File: FindBadGenomicKmersSpark.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/** Get the list of high copy number kmers in the reference, and write them to a file. */
@Override
protected void runTool( final JavaSparkContext ctx ) {
    final SAMFileHeader hdr = getHeaderForReads();
    SAMSequenceDictionary dict = null;
    if ( hdr != null ) dict = hdr.getSequenceDictionary();
    final ReferenceMultiSparkSource referenceMultiSource = getReference();
    Collection<SVKmer> killList = findBadGenomicKmers(ctx, kSize, maxDUSTScore, referenceMultiSource, dict);
    if ( highCopyFastaFilename != null ) {
        killList = SVUtils.uniquify(killList, processFasta(kSize, maxDUSTScore, highCopyFastaFilename));
    }

    SVFileUtils.writeKmersFile(outputFile, kSize, killList);
}
 
Example 14
Source File: AssemblyRegionIterator.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Constructs an AssemblyRegionIterator over a provided read shard
 *  @param readShard MultiIntervalShard containing the reads that will go into the assembly regions.
 *                  Must have a MAPPED filter set on it.
 * @param readHeader header for the reads
 * @param reference source of reference bases (may be null)
 * @param features source of arbitrary features (may be null)
 * @param evaluator evaluator used to determine whether a locus is active
 */
public AssemblyRegionIterator(final MultiIntervalShard<GATKRead> readShard,
                              final SAMFileHeader readHeader,
                              final ReferenceDataSource reference,
                              final FeatureManager features,
                              final AssemblyRegionEvaluator evaluator,
                              final AssemblyRegionArgumentCollection assemblyRegionArgs) {

    Utils.nonNull(readShard);
    Utils.nonNull(readHeader);
    Utils.nonNull(evaluator);
    assemblyRegionArgs.validate();


    this.readShard = readShard;
    this.readHeader = readHeader;
    this.reference = reference;
    this.features = features;
    this.evaluator = evaluator;
    this.assemblyRegionArgs = assemblyRegionArgs;

    this.readyRegion = null;
    this.previousRegionReads = null;
    this.pendingRegions = new ArrayDeque<>();
    this.readCachingIterator = new ReadCachingIterator(readShard.iterator());
    this.readCache = new ArrayDeque<>();
    this.activityProfile = new BandPassActivityProfile(assemblyRegionArgs.maxProbPropagationDistance, assemblyRegionArgs.activeProbThreshold, BandPassActivityProfile.MAX_FILTER_SIZE, BandPassActivityProfile.DEFAULT_SIGMA, readHeader);

    // We wrap our LocusIteratorByState inside an IntervalAlignmentContextIterator so that we get empty loci
    // for uncovered locations. This is critical for reproducing GATK 3.x behavior!
    this.libs = new LocusIteratorByState(readCachingIterator, DownsamplingMethod.NONE, false, ReadUtils.getSamplesFromHeader(readHeader), readHeader, true);
    final IntervalLocusIterator intervalLocusIterator = new IntervalLocusIterator(readShard.getIntervals().iterator());
    this.locusIterator = new IntervalAlignmentContextIterator(libs, intervalLocusIterator, readHeader.getSequenceDictionary());

    readyRegion = loadNextAssemblyRegion();
}
 
Example 15
Source File: StructuralVariationDiscoveryPipelineSpark.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Filters out "failed" assemblies, unmapped and secondary (i.e. "XA") alignments, and
 * turn the alignments of contigs into custom {@link AlignmentInterval} format.
 * Should be generating the same output as {@link #filterAndConvertToAlignedContigDirect(Iterable, List, SAMFileHeader)};
 * and currently this assertion is tested in {@see AlignedContigGeneratorUnitTest#testConvertAlignedAssemblyOrExcuseToAlignedContigsDirectAndConcordanceWithSAMRoute()}
 */
@VisibleForTesting
public static JavaRDD<AlignedContig> filterAndConvertToAlignedContigViaSAM(final List<AlignedAssemblyOrExcuse> alignedAssemblyOrExcuseList,
                                                                           final SAMFileHeader header,
                                                                           final JavaSparkContext ctx) {

    final SAMFileHeader cleanHeader = new SAMFileHeader(header.getSequenceDictionary());
    final List<String> refNames = SequenceDictionaryUtils.getContigNamesList(header.getSequenceDictionary());

    return ctx.parallelize(alignedAssemblyOrExcuseList)
            .filter(AlignedAssemblyOrExcuse::isNotFailure)
            .flatMap(alignedAssemblyNoExcuse -> {
                        final FermiLiteAssembly assembly = alignedAssemblyNoExcuse.getAssembly();
                        final int assemblyId = alignedAssemblyNoExcuse.getAssemblyId();
                        final List<List<BwaMemAlignment>> allAlignmentsOfThisAssembly = alignedAssemblyNoExcuse.getContigAlignments();
                        final int nContigs = assembly.getNContigs();
                        return IntStream.range(0, nContigs)
                                .mapToObj(contigIdx ->
                                        BwaMemAlignmentUtils.toSAMStreamForRead(
                                                AlignedAssemblyOrExcuse.formatContigName(assemblyId, contigIdx),
                                                assembly.getContig(contigIdx).getSequence(),
                                                allAlignmentsOfThisAssembly.get(contigIdx),
                                                cleanHeader, refNames,
                                                new SAMReadGroupRecord(SVUtils.GATKSV_CONTIG_ALIGNMENTS_READ_GROUP_ID)
                                        )
                                ).iterator();
                    }
            )
            .map(forOneContig ->
                    forOneContig.filter(sam -> !sam.getReadUnmappedFlag() && !sam.isSecondaryAlignment())
                            .collect(Collectors.toList()))
            .filter(list -> !list.isEmpty())
            .map(forOneContig ->
                    SvDiscoverFromLocalAssemblyContigAlignmentsSpark.
                            SAMFormattedContigAlignmentParser.
                            parseReadsAndOptionallySplitGappedAlignments(forOneContig,
                                    DiscoverVariantsFromContigAlignmentsSparkArgumentCollection
                                            .GAPPED_ALIGNMENT_BREAK_DEFAULT_SENSITIVITY,
                                    true));
}
 
Example 16
Source File: AlignmentContextIteratorBuilder.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 *  Create the appropriate instance of an alignment context spliterator based on the input parameters.
 *
 *  Please note that this wrapper is still tied to {@link LocusIteratorByState} and some parameters are being passed directly to that class.
 *
 * @param intervalsForTraversal the intervals to generate alignment contexts over.
 * @param header SAM file header to use
 * @param readIterator iterator of sorted GATK reads
 * @param dictionary the SAMSequenceDictionary being used for this traversal.  This can be the same as the reference.  {@code null} is supported, but will often lead to invalid parameter combinations.
 * @param downsamplingInfo how to downsample (for {@link LocusIteratorByState})
 * @param isReference the dictionary specified above is a reference, {@code false} if no reference being used or it is not a reference.
 * @param emitEmptyLoci whether loci with no coverage should be emitted.  In this case, the AlignmentContext will be empty (not null).
 * @param isKeepUniqueReadListInLibs if true, we will keep the unique reads from the samIterator and make them
 *                                       available via the transferReadsFromAllPreviousPileups interface (this parameter is specific to {@link LocusIteratorByState})
 * @param isIncludeDeletions include reads with deletion on the loci in question
 * @param isIncludeNs include reads with N on the loci in question
 * @return iterator that produces AlignmentContexts ready for consumption (e.g. by a {@link org.broadinstitute.hellbender.engine.LocusWalker})
 */
private static Iterator<AlignmentContext> createAlignmentContextIterator(final List<SimpleInterval> intervalsForTraversal,
                                                                         final SAMFileHeader header,
                                                                         final Iterator<GATKRead> readIterator,
                                                                         final SAMSequenceDictionary dictionary,
                                                                         final LIBSDownsamplingInfo downsamplingInfo,
                                                                         final boolean isReference,
                                                                         boolean emitEmptyLoci,
                                                                         boolean isKeepUniqueReadListInLibs,
                                                                         boolean isIncludeDeletions,
                                                                         boolean isIncludeNs) {

    // get the samples from the read groups
    final Set<String> samples = header.getReadGroups().stream()
            .map(SAMReadGroupRecord::getSample)
            .collect(Collectors.toSet());

    // get the LIBS
    final LocusIteratorByState libs = new LocusIteratorByState(readIterator, downsamplingInfo, isKeepUniqueReadListInLibs, samples, header, isIncludeDeletions, isIncludeNs);

    List<SimpleInterval> finalIntervals = intervalsForTraversal;
    validateEmitEmptyLociParameters(emitEmptyLoci, dictionary, intervalsForTraversal, isReference);
    if (emitEmptyLoci) {

        // If no intervals were specified, then use the entire reference (or best available sequence dictionary).
        if (!areIntervalsSpecified(finalIntervals)) {
            finalIntervals = IntervalUtils.getAllIntervalsForReference(dictionary);
        }
        final IntervalLocusIterator intervalLocusIterator = new IntervalLocusIterator(finalIntervals.iterator());
        return new IntervalAlignmentContextIterator(libs, intervalLocusIterator, header.getSequenceDictionary());
    } else if (areIntervalsSpecified(finalIntervals)) {
        return new IntervalOverlappingIterator<>(libs, finalIntervals, header.getSequenceDictionary());
    } else {
        // prepare the iterator
        return libs;
    }
}
 
Example 17
Source File: Mutect2Engine.java    From gatk with BSD 3-Clause "New" or "Revised" License 5 votes vote down vote up
/**
 * Create and initialize a new HaplotypeCallerEngine given a collection of HaplotypeCaller arguments, a reads header,
 * and a reference file
 *  @param MTAC command-line arguments for the HaplotypeCaller
 * @param assemblyRegionArgs
 * @param createBamOutIndex true to create an index file for the bamout
 * @param createBamOutMD5 true to create an md5 file for the bamout
 * @param header header for the reads
 * @param referenceSpec reference specifier for the reference
 * @param annotatorEngine annotator engine built with desired annotations
 */
public Mutect2Engine(final M2ArgumentCollection MTAC, AssemblyRegionArgumentCollection assemblyRegionArgs, final boolean createBamOutIndex, final boolean createBamOutMD5, final SAMFileHeader header, final GATKPath referenceSpec, final VariantAnnotatorEngine annotatorEngine) {
    this.MTAC = Utils.nonNull(MTAC);
    this.header = Utils.nonNull(header);
    minCallableDepth = MTAC.callableDepth;
    referenceReader = AssemblyBasedCallerUtils.createReferenceReader(Utils.nonNull(referenceSpec));
    aligner = SmithWatermanAligner.getAligner(MTAC.smithWatermanImplementation);
    samplesList = new IndexedSampleList(new ArrayList<>(ReadUtils.getSamplesFromHeader(header)));

    // optimize set operations for the common cases of no normal and one normal
    if (MTAC.normalSamples.isEmpty()) {
        normalSamples = Collections.emptySet();
    } else if (MTAC.normalSamples.size() == 1) {
        normalSamples = Collections.singleton(decodeSampleNameIfNecessary(MTAC.normalSamples.iterator().next()));
    } else {
        normalSamples = MTAC.normalSamples.stream().map(this::decodeSampleNameIfNecessary).collect(Collectors.toSet());
    }
    normalSamples.forEach(this::checkSampleInBamHeader);

    forceCallingAllelesPresent = MTAC.alleles != null;

    annotationEngine = Utils.nonNull(annotatorEngine);
    assemblyEngine = MTAC.createReadThreadingAssembler();
    likelihoodCalculationEngine = AssemblyBasedCallerUtils.createLikelihoodCalculationEngine(MTAC.likelihoodArgs);
    genotypingEngine = new SomaticGenotypingEngine(MTAC, normalSamples, annotationEngine);
    haplotypeBAMWriter = AssemblyBasedCallerUtils.createBamWriter(MTAC, createBamOutIndex, createBamOutMD5, header);
    trimmer = new AssemblyRegionTrimmer(assemblyRegionArgs, header.getSequenceDictionary());
    referenceConfidenceModel = new SomaticReferenceConfidenceModel(samplesList, header, 0, MTAC.minAF);  //TODO: do something classier with the indel size arg
    final List<String> tumorSamples = ReadUtils.getSamplesFromHeader(header).stream().filter(this::isTumorSample).collect(Collectors.toList());
    f1R2CountsCollector = MTAC.f1r2TarGz == null ? Optional.empty() : Optional.of(new F1R2CountsCollector(MTAC.f1r2Args, header, MTAC.f1r2TarGz, tumorSamples));
}
 
Example 18
Source File: SamRangeUtils.java    From rtg-tools with BSD 2-Clause "Simplified" License 4 votes vote down vote up
ResolvedBedRangeLoader(SAMFileHeader header) {
  super();
  mDictionary = header.getSequenceDictionary();
}
 
Example 19
Source File: ActivityProfileStateUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License 4 votes vote down vote up
@BeforeClass
public void init() {
    // sequence
    final SAMFileHeader header = ArtificialReadUtils.createArtificialSamHeader(1, 1, 100);
    sequenceDictionary =header.getSequenceDictionary();
}
 
Example 20
Source File: AnnotationUtilsTest.java    From Drop-seq with MIT License 4 votes vote down vote up
private OverlapDetector<Gene> getGeneOverlapDetector (final File bam, final File gtf) {
	SamReader inputSam = SamReaderFactory.makeDefault().open(bam);
	SAMFileHeader header = inputSam.getFileHeader();
	SAMSequenceDictionary bamDict = header.getSequenceDictionary();
	return GeneAnnotationReader.loadAnnotationsFile(gtf, bamDict);
}