Java Code Examples for htsjdk.samtools.util.ProgressLogger#record()

The following examples show how to use htsjdk.samtools.util.ProgressLogger#record(). Each example comes from an open-source project; the source file and project are noted above each listing.
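All of the examples share the same basic pattern: construct a ProgressLogger around an htsjdk Log, call record() once per item processed, and every n records (1,000,000 by default) the logger prints a status line; record() returns true on exactly those calls. A minimal sketch of that pattern follows (the class and file names here are hypothetical, not taken from the projects below):

final Log log = Log.getInstance(MyTool.class);            // hypothetical caller class
final ProgressLogger progress = new ProgressLogger(log);  // prints a status line every 1,000,000 records
final SamReader reader = SamReaderFactory.makeDefault().open(new File("input.bam"));
for (final SAMRecord rec : reader) {
    progress.record(rec);        // counts the record and notes its reference name and position
}
progress.record("chr1", 12345);  // items without a SAMRecord can be recorded by contig name and position
log.info("Processed " + progress.getCount() + " records in total.");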
Example 1
Source File: GatherReadQualityMetrics.java    From Drop-seq with MIT License
public Map<String, ReadQualityMetrics> gatherMetrics(final File inputSamOrBamFile) {
	ProgressLogger p = new ProgressLogger(this.log);
	Map<String, ReadQualityMetrics> result = new TreeMap<>();

	ReadQualityMetrics globalMetrics = new ReadQualityMetrics(this.MINIMUM_MAPPING_QUALITY, this.GLOBAL, true);

	SamReader in = SamReaderFactory.makeDefault().open(inputSamOrBamFile);

	for (final SAMRecord r : in) {
		if (!r.getReadFailsVendorQualityCheckFlag() || INCLUDE_NON_PF_READS) {
			p.record(r);
			globalMetrics.addRead(r);
			// gather per tag metrics if required.
			result = addMetricsPerTag(r, result);
		}
	}

	CloserUtil.close(in);
	result.put(this.GLOBAL, globalMetrics);
	return result;
}
 
Example 2
Source File: SamFormatConverter.java    From picard with MIT License
/**
 * Convert a file from one of sam/bam/cram format to another based on the extension of output.
 *
 * @param input             input file in one of sam/bam/cram format
 * @param output            output to write converted file to, the conversion is based on the extension of this filename
 * @param referenceSequence the reference sequence to use, necessary when reading/writing cram
 * @param createIndex       whether or not an index should be written alongside the output file
 */
public static void convert(final File input, final File output, final File referenceSequence, final Boolean createIndex) {
    IOUtil.assertFileIsReadable(input);
    IOUtil.assertFileIsWritable(output);
    final SamReader reader = SamReaderFactory.makeDefault().referenceSequence(referenceSequence).open(input);
    final SAMFileWriter writer = new SAMFileWriterFactory().makeWriter(reader.getFileHeader(), true, output, referenceSequence);
    if (createIndex && writer.getFileHeader().getSortOrder() != SAMFileHeader.SortOrder.coordinate) {
        throw new PicardException("Can't CREATE_INDEX unless sort order is coordinate");
    }

    final ProgressLogger progress = new ProgressLogger(Log.getInstance(SamFormatConverter.class));
    for (final SAMRecord rec : reader) {
        writer.addAlignment(rec);
        progress.record(rec);
    }
    CloserUtil.close(reader);
    writer.close();
}
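A call to this converter might look like the following sketch (the file names here are hypothetical; per the javadoc above, the reference is only required when reading or writing CRAM):

SamFormatConverter.convert(new File("input.bam"), new File("output.cram"), new File("reference.fasta"), true);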
 
Example 3
Source File: DetectBeadSynthesisErrors.java    From Drop-seq with MIT License
/**
 * For a single cell barcode, gather up all the reads/UMIs to test for UMI errors.
 * @param umiCollectionList A collection of UMIs for a cell.
 * @param umiStringCache A cache to save space on UMIs - there are only 4^length possible UMIs, and length is usually 8-9, so UMIs are often repeated.
 * @param prog A progress logger.
 * @return A BeadSynthesisErrorData object for a single cell barcode across all UMIs.
 */
private BeadSynthesisErrorData buildBeadSynthesisErrorData (final List<UMICollection> umiCollectionList, final StringInterner umiStringCache, final ProgressLogger prog) {
	final String cellBarcode = umiCollectionList.get(0).getCellBarcode();
	BeadSynthesisErrorData bsed = new BeadSynthesisErrorData(cellBarcode);
	for (final UMICollection umis : umiCollectionList) {
		int transcriptCounts = umis.getDigitalExpression(1, 1, false);
		int readCounts = umis.getDigitalExpression(1, 1, true);
		Collection<String> umiCol = umis.getMolecularBarcodes();
		umiCol = getUMIsFromCache(umiCol, umiStringCache);
		bsed.addUMI(umiCol);
		bsed.incrementReads(readCounts);
		bsed.incrementTranscripts(transcriptCounts);
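		// a UMICollection has no single genomic position, so progress is recorded with a null locus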
		prog.record(null, 0);
	}
	return bsed;
}
 
Example 4
Source File: OpticalDuplicateFinder.java    From picard with MIT License
private void fillGraphFromAGroup(final List<? extends PhysicalLocation> wholeList, final List<Integer> groupList, final boolean logProgress, final ProgressLogger progressLoggerForKeeper, final int distance, final GraphUtils.Graph<Integer> opticalDistanceRelationGraph) {

        for (int i = 0; i < groupList.size(); i++) {
            final int iIndex = groupList.get(i);
            final PhysicalLocation currentLoc = wholeList.get(iIndex);
            // The main point of adding this log and if statement (also below) is to work around a bug in the JVM
            // which causes a deep exception (https://github.com/broadinstitute/picard/issues/472).
            // It seems that this is related to https://bugs.openjdk.java.net/browse/JDK-8033717 which
            // was closed due to non-reproducibility. We came across a bam file that evoked this error
            // every time we tried to duplicate-mark it. The problem seemed to be a duplicate-set of size 500,000,
            // and this loop seemed to kill the JVM for some reason. This logging statement (and the one in the
            // loop below) solved the problem.
            if (logProgress) {
                progressLoggerForKeeper.record(String.format("%d", currentLoc.getReadGroup()), currentLoc.getX());
            }

            for (int j = i + 1; j < groupList.size(); j++) {
                final int jIndex = groupList.get(j);
                final PhysicalLocation other = wholeList.get(jIndex);

                if (closeEnoughShort(currentLoc, other, distance)) {
                    opticalDistanceRelationGraph.addEdge(iIndex, jIndex);
                }
            }
        }
    }
 
Example 5
Source File: ReplaceSamHeader.java    From picard with MIT License
private void standardReheader(final SAMFileHeader replacementHeader) {
    final SamReader recordReader = SamReaderFactory.makeDefault().referenceSequence(REFERENCE_SEQUENCE).validationStringency(ValidationStringency.SILENT).open(INPUT);
    if (replacementHeader.getSortOrder() != recordReader.getFileHeader().getSortOrder()) {
        throw new PicardException("Sort orders of INPUT (" + recordReader.getFileHeader().getSortOrder().name() +
                ") and HEADER (" + replacementHeader.getSortOrder().name() + ") do not agree.");
    }
    final SAMFileWriter writer = new SAMFileWriterFactory().makeSAMOrBAMWriter(replacementHeader, true, OUTPUT);

    final ProgressLogger progress = new ProgressLogger(Log.getInstance(ReplaceSamHeader.class));
    for (final SAMRecord rec : recordReader) {
        rec.setHeader(replacementHeader);
        writer.addAlignment(rec);
        progress.record(rec);
    }
    writer.close();
    CloserUtil.close(recordReader);
}
 
Example 6
Source File: BaseDistributionAtReadPosition.java    From Drop-seq with MIT License
BaseDistributionMetricCollection gatherBaseQualities (final File input, final String tag) {
	ProgressLogger pl = new ProgressLogger(this.log);
	SamReader inputSam = SamReaderFactory.makeDefault().open(input);

	BaseDistributionMetricCollection c = new BaseDistributionMetricCollection();
	// MAIN_LOOP:
	for (final SAMRecord samRecord : inputSam) {
		pl.record(samRecord);
		if (samRecord.isSecondaryOrSupplementary()) continue;
		String b = samRecord.getStringAttribute(tag);
		if (b==null) continue;

		byte [] bases = b.getBytes();
		for (int i=0; i<bases.length; i++) {
			char base = (char) (bases[i]);
			int idx=i+1;
			c.addBase(base, idx);
		}
	}

	CloserUtil.close(inputSam);
	return c;
}
 
Example 7
Source File: BamTagHistogram.java    From Drop-seq with MIT License
public ObjectCounter<String> getBamTagCounts (final Iterator<SAMRecord> iterator, final String tag, final int readQuality, final boolean filterPCRDuplicates) {
    ProgressLogger pl = new ProgressLogger(log, 10000000);

    ObjectCounter<String> counter = new ObjectCounter<>();

    for (final SAMRecord r : new IterableAdapter<>(iterator)) {
        pl.record(r);
        if (filterPCRDuplicates && r.getDuplicateReadFlag()) continue;
        if (r.getMappingQuality()<readQuality) continue;
        if (r.isSecondaryOrSupplementary()) continue;
        String s1 = getAnyTagAsString(r, tag);
        if (s1 != null && !s1.isEmpty()) counter.increment(s1); // if the tag doesn't have a value, don't count it.
    }
    return counter;
}
 
Example 8
Source File: EnhanceGTFRecords.java    From Drop-seq with MIT License
/**
 * @param records GTFRecords, potentially from many genes
 * @return The input records, plus gene, transcript, intron and conserved_intron records.
 */
public List<GTFRecord> enhanceGTFRecords (Iterator<GTFRecord> records) {
	final ProgressLogger progressLogger = new ProgressLogger(LOG, 100, "enhancing", "genes");
	List<GTFRecord> result = new ArrayList<>();
	final GeneFromGTFBuilder geneBuilder = new GeneFromGTFBuilder(records);
	while (geneBuilder.hasNext()) {
		try {
			GeneFromGTF gene = geneBuilder.next();
			progressLogger.record(gene.getName(), 0);
			LOG.debug("Enhancing Gene [" + gene.getName() + "]");
			result.addAll(enhanceGene(gene));
		} catch (AnnotationException e) {
			LOG.info(e.getMessage() + " -- skipping");
		}
	}
	return result;
}
 
Example 9
Source File: FilterBamByTag.java    From Drop-seq with MIT License
/**
 * Filters reads in paired mode: records are iterated in query-name order, and both reads of a pair are accepted or rejected together.
 * @param in the reader over the records to filter
 * @param out the writer that accepted records are written to
 * @param values the set of tag values to filter on
 */
void processPairedMode (final SamReader in, final SAMFileWriter out, final Set<String> values, final File summaryFile, Integer mapQuality, boolean allowPartial) {
	ProgressLogger progLog = new ProgressLogger(log);
	FilteredReadsMetric m = new FilteredReadsMetric();
	
	PeekableIterator<SAMRecord> iter = new PeekableIterator<>(CustomBAMIterators.getQuerynameSortedRecords(in));
	while (iter.hasNext()) {
		SAMRecord r1 = iter.next();
		progLog.record(r1);
		boolean filterFlag1 = filterRead(r1, this.TAG, values, this.ACCEPT_TAG, mapQuality, allowPartial);
		
		SAMRecord r2 = null;
		if (iter.hasNext()) r2 = iter.peek();
		// check for r2 being null in case the last read is unpaired.
		if (r2!=null && r1.getReadName().equals(r2.getReadName())) {
			// paired read found.
			progLog.record(r2);
			r2=iter.next();
			boolean filterFlag2 = filterRead(r2, this.TAG, values, this.ACCEPT_TAG, mapQuality, allowPartial);
			// in accept-tag mode, accept the pair if either read passes the filter;
			// in reject mode, accept the pair only if both reads pass.
			if ((this.ACCEPT_TAG && (!filterFlag1 || !filterFlag2)) || (!this.ACCEPT_TAG && !filterFlag1 && !filterFlag2)) {
				out.addAlignment(r1);
				out.addAlignment(r2);
				m.READS_ACCEPTED+=2;
			} else 
				m.READS_REJECTED+=2;
		} else if (!filterFlag1) {
			out.addAlignment(r1);
			m.READS_ACCEPTED++;
		} else {
			m.READS_REJECTED++;
		}
	}
	CloserUtil.close(in);
	writeSummary(summaryFile, m);
	out.close();
	reportAndCheckFilterResults(m);
}
 
Example 10
Source File: TagReadWithInterval.java    From Drop-seq with MIT License
@Override
protected int doWork() {
	IOUtil.assertFileIsReadable(INPUT);
	IOUtil.assertFileIsWritable(OUTPUT);
	SamReader inputSam = SamReaderFactory.makeDefault().open(INPUT);
	SAMFileHeader header = inputSam.getFileHeader();
	header.setSortOrder(SAMFileHeader.SortOrder.coordinate);
	SamHeaderUtil.addPgRecord(header, this);

	SAMFileWriter writer= new SAMFileWriterFactory().makeSAMOrBAMWriter(header, true, OUTPUT);

	IntervalList loci = IntervalList.fromFile(this.INTERVALS);
	OverlapDetector<Interval> od =getOverlapDetector (loci);
	ProgressLogger processLogger = new ProgressLogger(log);

	for (SAMRecord record : inputSam) {
		processLogger.record(record);

		if (!record.getReadUnmappedFlag())
			// use alignment blocks instead of start/end to properly deal with split reads mapped over exon/exon boundaries.
			record = tagRead(record, od);
		else
			record.setAttribute(this.TAG, null);
		writer.addAlignment(record);
	}

	CloserUtil.close(inputSam);
	writer.close();

	return 0;
}
 
Example 11
Source File: UpdateVcfSequenceDictionary.java    From picard with MIT License
@Override
protected int doWork() {
    IOUtil.assertFileIsReadable(INPUT);
    IOUtil.assertFileIsReadable(SEQUENCE_DICTIONARY);
    IOUtil.assertFileIsWritable(OUTPUT);

    final SAMSequenceDictionary samSequenceDictionary = SAMSequenceDictionaryExtractor.extractDictionary(SEQUENCE_DICTIONARY.toPath());

    final VCFFileReader fileReader = new VCFFileReader(INPUT, false);
    final VCFHeader fileHeader = fileReader.getFileHeader();

    final VariantContextWriterBuilder builder = new VariantContextWriterBuilder()
            .setReferenceDictionary(samSequenceDictionary)
            .clearOptions();
    if (CREATE_INDEX)
        builder.setOption(Options.INDEX_ON_THE_FLY);

    final VariantContextWriter vcfWriter = builder.setOutputFile(OUTPUT).build();
    fileHeader.setSequenceDictionary(samSequenceDictionary);
    vcfWriter.writeHeader(fileHeader);

    final ProgressLogger progress = new ProgressLogger(log, 10000);
    final CloseableIterator<VariantContext> iterator = fileReader.iterator();
    while (iterator.hasNext()) {
        final VariantContext context = iterator.next();
        vcfWriter.add(context);
        progress.record(context.getContig(), context.getStart());
    }

    CloserUtil.close(iterator);
    CloserUtil.close(fileReader);
    vcfWriter.close();

    return 0;
}
 
Example 12
Source File: SamToFastq.java    From picard with MIT License
protected int doWork() {
    IOUtil.assertFileIsReadable(INPUT);
    final SamReader reader = SamReaderFactory.makeDefault().referenceSequence(REFERENCE_SEQUENCE).open(INPUT);
    final Map<String, SAMRecord> firstSeenMates = new HashMap<>();
    final FastqWriterFactory factory = new FastqWriterFactory();
    factory.setCreateMd5(CREATE_MD5_FILE);

    initializeAdditionalWriters();
    final Map<SAMReadGroupRecord, FastqWriters> writers = generateWriters(reader.getFileHeader().getReadGroups(),
            factory);
    final Map<SAMReadGroupRecord, List<FastqWriter>> additionalWriters = generateAdditionalWriters(reader.getFileHeader().getReadGroups(), factory);
    if (writers.isEmpty()) {
        final String msgBase = INPUT + " does not contain Read Groups";
        final String msg = OUTPUT_PER_RG ? msgBase + ", consider not using the OUTPUT_PER_RG option" : msgBase;
        throw new PicardException(msg);
    }

    final ProgressLogger progress = new ProgressLogger(log);

    for (final SAMRecord currentRecord : reader) {
        handleRecord(currentRecord, writers, additionalWriters, firstSeenMates);
        progress.record(currentRecord);
    }

    CloserUtil.close(reader);

    // Close all the fastq writers being careful to close each one only once!
    for (final FastqWriters writerMapping : new HashSet<>(writers.values())) {
        writerMapping.closeAll();
    }

    // close all `additionalWriters` only once
    final Set<FastqWriter> additionalWriterSet = new HashSet<>();
    additionalWriters.values().forEach(additionalWriterSet::addAll);
    for (final FastqWriter fastqWriter : additionalWriterSet) {
        fastqWriter.close();
    }

    if (!firstSeenMates.isEmpty()) {
        SAMUtils.processValidationError(new SAMValidationError(SAMValidationError.Type.MATE_NOT_FOUND,
                "Found " + firstSeenMates.size() + " unpaired mates", null), VALIDATION_STRINGENCY);
    }

    return 0;
}
 
Example 13
Source File: MergeVcfs.java    From picard with MIT License
@Override
protected int doWork() {
    final ProgressLogger progress = new ProgressLogger(log, 10000);
    final List<String> sampleList = new ArrayList<String>();
    INPUT = IOUtil.unrollFiles(INPUT, IOUtil.VCF_EXTENSIONS);
    final Collection<CloseableIterator<VariantContext>> iteratorCollection = new ArrayList<CloseableIterator<VariantContext>>(INPUT.size());
    final Collection<VCFHeader> headers = new HashSet<VCFHeader>(INPUT.size());
    VariantContextComparator variantContextComparator = null;
    SAMSequenceDictionary sequenceDictionary = null;

    if (SEQUENCE_DICTIONARY != null) {
        sequenceDictionary = SamReaderFactory.makeDefault().referenceSequence(REFERENCE_SEQUENCE).open(SEQUENCE_DICTIONARY).getFileHeader().getSequenceDictionary();
    }

    for (final File file : INPUT) {
        IOUtil.assertFileIsReadable(file);
        final VCFFileReader fileReader = new VCFFileReader(file, false);
        final VCFHeader fileHeader = fileReader.getFileHeader();
        if (fileHeader.getContigLines().isEmpty()) {
            if (sequenceDictionary == null) {
                throw new IllegalArgumentException(SEQ_DICT_REQUIRED);
            } else {
                fileHeader.setSequenceDictionary(sequenceDictionary);
            }
        }

        if (variantContextComparator == null) {
            variantContextComparator = fileHeader.getVCFRecordComparator();
        } else {
            if (!variantContextComparator.isCompatible(fileHeader.getContigLines())) {
                throw new IllegalArgumentException(
                        "The contig entries in input file " + file.getAbsolutePath() + " are not compatible with the others.");
            }
        }

        if (sequenceDictionary == null) sequenceDictionary = fileHeader.getSequenceDictionary();

        if (sampleList.isEmpty()) {
            sampleList.addAll(fileHeader.getSampleNamesInOrder());
        } else {
            if (!sampleList.equals(fileHeader.getSampleNamesInOrder())) {
                throw new IllegalArgumentException("Input file " + file.getAbsolutePath() + " has sample entries that don't match the other files.");
            }
        }
        
        // add comments to the first header
        if (headers.isEmpty()) {
            COMMENT.stream().forEach(C -> fileHeader.addMetaDataLine(new VCFHeaderLine("MergeVcfs.comment", C)));
        }

        headers.add(fileHeader);
        iteratorCollection.add(fileReader.iterator());
    }

    if (CREATE_INDEX && sequenceDictionary == null) {
        throw new PicardException(String.format("Index creation failed. %s", SEQ_DICT_REQUIRED));
    }

    final VariantContextWriterBuilder builder = new VariantContextWriterBuilder()
            .setOutputFile(OUTPUT)
            .setReferenceDictionary(sequenceDictionary);

    if (CREATE_INDEX) {
        builder.setOption(Options.INDEX_ON_THE_FLY);
    } else {
        builder.unsetOption(Options.INDEX_ON_THE_FLY);
    }
    final VariantContextWriter writer = builder.build();

    writer.writeHeader(new VCFHeader(VCFUtils.smartMergeHeaders(headers, false), sampleList));

    final MergingIterator<VariantContext> mergingIterator = new MergingIterator<VariantContext>(variantContextComparator, iteratorCollection);
    while (mergingIterator.hasNext()) {
        final VariantContext context = mergingIterator.next();
        writer.add(context);
        progress.record(context.getContig(), context.getStart());
    }

    CloserUtil.close(mergingIterator);
    writer.close();
    return 0;
}
 
Example 14
Source File: DGEMatrix.java    From Drop-seq with MIT License
/**
 * Merge DGEs that have different genes in each DGE.  List of cells may overlap, or may not.
 * @param other the DGEMatrix to merge with this one; its genes must not overlap this matrix's genes.
 * @return a new DGEMatrix that is the merge of this and other.
 */
private DGEMatrix mergeExpressionForCells(DGEMatrix other) {
	List<String> thisGenes = this.getGenes();
	List<String> otherGenes = other.getGenes();
	Collection<String> geneOverlap = CollectionUtils.intersection(thisGenes, otherGenes);
	if (!geneOverlap.isEmpty()) {
		throw new IllegalArgumentException("The two matrixes have overlapping genes, this should not happen:" +
				geneOverlap.toString());
	}
	List<String> allGenes = new ArrayList<> (thisGenes);
	CollectionUtils.addAll(allGenes, otherGenes);
	Collections.sort(allGenes);
	final ProgressLogger progressLogger = new ProgressLogger(log, 1000, "processed",
			String.format("genes of %d", allGenes.size()));
	// Determine which DGE has more cells.
	final DGEMatrix bigger;
	final DGEMatrix smaller;
	if (this.getCellBarcodes().size() >= other.getCellBarcodes().size()) {
		bigger = this;
		smaller = other;
	} else {
		bigger = other;
		smaller = this;
	}
	// Cells from bigger list go first, and then any that are in the smaller but not the bigger.
	final List<String> allCellBarcodes = new ArrayList<>(bigger.getCellBarcodes());
	// Create a map of the old to new index of columns in the DGE with fewer columns.
	final Map<Integer, Integer> smallerColumnMap = new HashMap<>();
	for (int column = 0; column < smaller.getCellBarcodes().size(); ++column) {
		final String cell = smaller.getCellBarcodes().get(column);
		int newColumn = allCellBarcodes.indexOf(cell);
		if (newColumn == -1) {
			newColumn = allCellBarcodes.size();
			allCellBarcodes.add(cell);
		}
		smallerColumnMap.put(column, newColumn);
	}
	CRSMatrix m = new CRSMatrix(allGenes.size(), allCellBarcodes.size());
	final Set<String> biggerGenesSet = new HashSet<>(bigger.getGenes());
	for (int i = 0; i < allGenes.size(); ++i) {
		final String gene = allGenes.get(i);
		org.la4j.Vector outVec;
		if (biggerGenesSet.contains(gene)) {
			outVec = bigger.expressionMatrix.getRow(bigger.geneMap.get(gene));
			// append zeroes as appropriate
			outVec = outVec.copyOfLength(allCellBarcodes.size());
		} else {
			final double[] primVec = new double[allCellBarcodes.size()];
			final VectorIterator it = smaller.expressionMatrix.getRow(smaller.geneMap.get(gene)).toSparseVector().nonZeroIterator();
			while (it.hasNext()) {
				final double val = it.next();
				final int newIndex = smallerColumnMap.get(it.index());
				primVec[newIndex] = val;
			}
			outVec = SparseVector.fromArray(primVec);
		}
		m.setRow(i, outVec);
		progressLogger.record(gene, i);
	}
	return (new DGEMatrix(allCellBarcodes, allGenes, m));
}
 
Example 15
Source File: BamTagOfTagCounts.java    From Drop-seq with MIT License
public TagOfTagResults<String,String> getResults (final File inputBAM, final String primaryTag, final String secondaryTag, final boolean filterPCRDuplicates, final Integer readQuality) {
	TagOfTagResults<String,String> result = new TagOfTagResults<>();
	SamReader reader = SamReaderFactory.makeDefault().open(inputBAM);
	CloseableIterator<SAMRecord> iter = CustomBAMIterators.getReadsInTagOrder(reader, primaryTag);
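	// getReadsInTagOrder (presumably) drains the reader into a sorted temporary collection,
	// which is why the reader can be closed before the iterator is consumed.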
	CloserUtil.close(reader);
	String currentTag="";
	Set<String> otherTagCollection=new HashSet<>();

	ProgressLogger progress = new ProgressLogger(log);

	while (iter.hasNext()) {
		SAMRecord r = iter.next();
		progress.record(r);
		// skip reads that don't pass filters.
		boolean discardResult=(filterPCRDuplicates && r.getDuplicateReadFlag()) || r.getMappingQuality()<readQuality || r.isSecondaryOrSupplementary();
		// short circuit if read is below the quality to be considered.
		if (discardResult) continue;

		Object d = r.getAttribute(secondaryTag);
		// short circuit if there's no tag for this read.
		if (d==null) continue;

		String data=null;

		if (d instanceof String)
			data=r.getStringAttribute(secondaryTag);
		else if (d instanceof Integer)
			data=Integer.toString(r.getIntegerAttribute(secondaryTag));


		String newTag = r.getStringAttribute(primaryTag);
		if (newTag==null)
			newTag="";

		// if you see a new tag.
		if (!currentTag.equals(newTag)) {
			// write out tag results, if any.
			if (!currentTag.equals("") && otherTagCollection.size()>0)
				result.addEntries(currentTag, otherTagCollection);
			currentTag=newTag;
			otherTagCollection.clear();
			if (!discardResult) otherTagCollection=addTagToCollection (data,otherTagCollection);

		} else // gather stats
		if (!discardResult) otherTagCollection=addTagToCollection (data,otherTagCollection);
	}
	if (otherTagCollection.size()>0)
		result.addEntries(currentTag, otherTagCollection);

	return (result);
}
 
Example 16
Source File: MarkDuplicatesWithMateCigar.java    From picard with MIT License
/**
 * Main work method.
 */
protected int doWork() {
    IOUtil.assertInputsAreValid(INPUT);
    IOUtil.assertFileIsWritable(OUTPUT);
    IOUtil.assertFileIsWritable(METRICS_FILE);

    // Open the inputs
    final SamHeaderAndIterator headerAndIterator = openInputs(true);
    final SAMFileHeader header = headerAndIterator.header;

    // Create the output header
    final SAMFileHeader outputHeader = header.clone();
    if (outputHeader.getSortOrder() != SAMFileHeader.SortOrder.coordinate) {
        throw new PicardException("This program requires inputs in coordinate SortOrder");
    }

    COMMENT.forEach(outputHeader::addComment);

    // Since this is one-pass, unlike MarkDuplicates, we cannot chain together only the program
    // group records we have seen; we have to assume all of them may be seen.  We can perhaps
    // filter out any program groups which have been referenced previously.
    setPGIdsSeen(outputHeader);
    // Key: previous PG ID on a SAM Record (or null).  Value: New PG ID to replace it.
    final Map<String, String> chainedPgIds = getChainedPgIds(outputHeader);

    // Open the output
    final SAMFileWriter out = new SAMFileWriterFactory().makeSAMOrBAMWriter(outputHeader,
            true,
            OUTPUT);

    // Create the mark duplicate iterator.  The duplicate marking is handled by the iterator, conveniently.
    final MarkDuplicatesWithMateCigarIterator iterator = new MarkDuplicatesWithMateCigarIterator(headerAndIterator.header,
            headerAndIterator.iterator,
            this.opticalDuplicateFinder,
            this.DUPLICATE_SCORING_STRATEGY,
            this.MINIMUM_DISTANCE,
            this.REMOVE_DUPLICATES,
            this.SKIP_PAIRS_WITH_NO_MATE_CIGAR,
            this.MAX_RECORDS_IN_RAM,
            this.BLOCK_SIZE,
            this.TMP_DIR);

    // progress logger!
    final ProgressLogger progress = new ProgressLogger(log, (int) 1e6, "Read");

    // Go through the records
    for (final SAMRecord record : new IterableAdapter<SAMRecord>(iterator)) {
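        // record(...) returns true each time it actually emits a progress line (once per
        // logging interval), which this loop uses as a cue to also dump memory statistics.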
        if (progress.record(record)) {
            iterator.logMemoryStats(log);
        }

        // Update the program record if necessary
        updateProgramRecord(record, chainedPgIds);

        // Write the alignment
        out.addAlignment(record);
    }

    // remember to close the inputs
    iterator.close();

    out.close();

    // For convenience to reference
    final Histogram<Short> opticalDupesByLibraryId = iterator.getOpticalDupesByLibraryId();

    // Log info
    log.info("Processed " + progress.getCount() + " records");
    log.info("Found " + iterator.getNumRecordsWithNoMateCigar() + " records with no mate cigar optional tag.");
    log.info("Marking " + iterator.getNumDuplicates() + " records as duplicates.");
    log.info("Found " + ((long) opticalDupesByLibraryId.getSumOfValues()) + " optical duplicate clusters."); // cast as long due to returning a double

    // Write out the metrics
    finalizeAndWriteMetrics(iterator.getLibraryIdGenerator(), getMetricsFile(), METRICS_FILE);

    return 0;
}
 
Example 17
Source File: DownsampleSam.java    From picard with MIT License
@Override
protected int doWork() {
    IOUtil.assertFileIsReadable(INPUT);
    IOUtil.assertFileIsWritable(OUTPUT);

    // Warn the user if they are running with P=1 or P=0 (which are legal, but odd)
    if (PROBABILITY == 1) {
        log.warn("Running DownsampleSam with PROBABILITY=1! This will likely just recreate the input file.");
    }

    if (PROBABILITY == 0) {
        log.warn("Running DownsampleSam with PROBABILITY=0! This will create an empty file.");
    }

    if (RANDOM_SEED == null) {
        RANDOM_SEED = new Random().nextInt();
        log.warn(String.format(
                "Drawing a random seed because RANDOM_SEED was not set. Set RANDOM_SEED to %s to reproduce these results in the future.", RANDOM_SEED));
    }

    final SamReader in = SamReaderFactory.makeDefault().referenceSequence(REFERENCE_SEQUENCE).open(SamInputResource.of(INPUT));
    final SAMFileWriter out = new SAMFileWriterFactory().makeSAMOrBAMWriter(in.getFileHeader(), true, OUTPUT);
    final ProgressLogger progress = new ProgressLogger(log, (int) 1e7, "Wrote");
    final DownsamplingIterator iterator = DownsamplingIteratorFactory.make(in, STRATEGY, PROBABILITY, ACCURACY, RANDOM_SEED);
    final QualityYieldMetricsCollector metricsCollector = new QualityYieldMetricsCollector(true, false, false);

    while (iterator.hasNext()) {
        final SAMRecord rec = iterator.next();
        out.addAlignment(rec);
        if (METRICS_FILE != null) metricsCollector.acceptRecord(rec, null);
        progress.record(rec);
    }

    out.close();
    CloserUtil.close(in);
    final NumberFormat fmt = new DecimalFormat("0.00%");
    log.info("Finished downsampling.");
    log.info("Kept ", iterator.getAcceptedCount(), " out of ", iterator.getSeenCount(), " reads (", fmt.format(iterator.getAcceptedFraction()), ").");

    if (METRICS_FILE != null) {
        final MetricsFile<QualityYieldMetrics, Integer> metricsFile = getMetricsFile();
        metricsCollector.finish();
        metricsCollector.addMetricsToFile(metricsFile);
        metricsFile.write(METRICS_FILE);
    }

    return 0;
}
 
Example 18
Source File: AddOrReplaceReadGroups.java    From picard with MIT License
protected int doWork() {
    IOUtil.assertInputIsValid(INPUT);
    IOUtil.assertFileIsWritable(OUTPUT);

    final SamReader in = SamReaderFactory.makeDefault()
        .referenceSequence(REFERENCE_SEQUENCE)
        .open(SamInputResource.of(INPUT));

    // create the read-group we'll be using
    final SAMReadGroupRecord rg = new SAMReadGroupRecord(RGID);
    rg.setLibrary(RGLB);
    rg.setPlatform(RGPL);
    rg.setSample(RGSM);
    rg.setPlatformUnit(RGPU);
    if (RGCN != null) rg.setSequencingCenter(RGCN);
    if (RGDS != null) rg.setDescription(RGDS);
    if (RGDT != null) rg.setRunDate(RGDT);
    if (RGPI != null) rg.setPredictedMedianInsertSize(RGPI);
    if (RGPG != null) rg.setProgramGroup(RGPG);
    if (RGPM != null) rg.setPlatformModel(RGPM);
    if (RGKS != null) rg.setKeySequence(RGKS);
    if (RGFO != null) rg.setFlowOrder(RGFO);

    log.info(String.format("Created read-group ID=%s PL=%s LB=%s SM=%s%n", rg.getId(), rg.getPlatform(), rg.getLibrary(), rg.getSample()));

    // create the new header and output file
    final SAMFileHeader inHeader = in.getFileHeader();
    final SAMFileHeader outHeader = inHeader.clone();
    outHeader.setReadGroups(Collections.singletonList(rg));
    if (SORT_ORDER != null) outHeader.setSortOrder(SORT_ORDER);

    final SAMFileWriter outWriter = new SAMFileWriterFactory().makeSAMOrBAMWriter(outHeader,
            outHeader.getSortOrder() == inHeader.getSortOrder(),
            OUTPUT);

    final ProgressLogger progress = new ProgressLogger(log);
    for (final SAMRecord read : in) {
        read.setAttribute(SAMTag.RG.name(), RGID);
        outWriter.addAlignment(read);
        progress.record(read);
    }

    // cleanup
    CloserUtil.close(in);
    outWriter.close();
    return 0;
}
 
Example 19
Source File: MergeSamFiles.java    From picard with MIT License
/** Combines multiple SAM/BAM files into one. */
@Override
protected int doWork() {
    boolean matchedSortOrders = true;
    
    // read interval list if it is defined
    final List<Interval> intervalList = (INTERVALS == null ? null : IntervalList.fromFile(INTERVALS).uniqued().getIntervals() );
    // map reader->iterator used if INTERVALS is defined
    final Map<SamReader, CloseableIterator<SAMRecord> > samReaderToIterator = new HashMap<>(INPUT.size());

    final List<Path> inputPaths = IOUtil.getPaths(INPUT);

    // Open the files for reading and writing
    final List<SamReader> readers = new ArrayList<>();
    final List<SAMFileHeader> headers = new ArrayList<>();
    {
        SAMSequenceDictionary dict = null; // Used to try and reduce redundant SDs in memory

        for (final Path inFile : inputPaths) {
            IOUtil.assertFileIsReadable(inFile);
            final SamReader in = SamReaderFactory.makeDefault().referenceSequence(REFERENCE_SEQUENCE).open(inFile);
            if (INTERVALS != null) {
                if (!in.hasIndex()) {
                    throw new PicardException("Merging with interval but BAM file is not indexed: " + inFile);
                }
                final CloseableIterator<SAMRecord> samIterator = new SamRecordIntervalIteratorFactory().makeSamRecordIntervalIterator(in, intervalList, true);
                samReaderToIterator.put(in, samIterator);
            }

            readers.add(in);
            headers.add(in.getFileHeader());

            // A slightly hackish attempt to keep memory consumption down when merging multiple files with
            // large sequence dictionaries (10,000s of sequences). If the dictionaries are identical, then
            // replace the duplicate copies with a single dictionary to reduce the memory footprint. 
            if (dict == null) {
                dict = in.getFileHeader().getSequenceDictionary();
            } else if (dict.equals(in.getFileHeader().getSequenceDictionary())) {
                in.getFileHeader().setSequenceDictionary(dict);
            }

            matchedSortOrders = matchedSortOrders && in.getFileHeader().getSortOrder() == SORT_ORDER;
        }
    }

    // If all the input sort orders match the output sort order then just merge them and
    // write on the fly, otherwise setup to merge and sort before writing out the final file
    IOUtil.assertFileIsWritable(OUTPUT);
    final boolean presorted;
    final SAMFileHeader.SortOrder headerMergerSortOrder;
    final boolean mergingSamRecordIteratorAssumeSorted;

    if (matchedSortOrders || SORT_ORDER == SAMFileHeader.SortOrder.unsorted || ASSUME_SORTED || INTERVALS != null ) {
        log.info("Input files are in same order as output so sorting to temp directory is not needed.");
        headerMergerSortOrder = SORT_ORDER;
        mergingSamRecordIteratorAssumeSorted = ASSUME_SORTED;
        presorted = true;
    } else {
        log.info("Sorting input files using temp directory " + TMP_DIR);
        headerMergerSortOrder = SAMFileHeader.SortOrder.unsorted;
        mergingSamRecordIteratorAssumeSorted = false;
        presorted = false;
    }
    final SamFileHeaderMerger headerMerger = new SamFileHeaderMerger(headerMergerSortOrder, headers, MERGE_SEQUENCE_DICTIONARIES);
    final MergingSamRecordIterator iterator;
    // no interval defined, get an iterator for the whole bam
    if (intervalList == null) {
        iterator = new MergingSamRecordIterator(headerMerger, readers, mergingSamRecordIteratorAssumeSorted);
    } else {
        // show warning related to https://github.com/broadinstitute/picard/pull/314/files
        log.info("Warning: merged bams from different interval lists may contain the same read in both files");
        iterator = new MergingSamRecordIterator(headerMerger, samReaderToIterator, true);
    }
    final SAMFileHeader header = headerMerger.getMergedHeader();
    for (final String comment : COMMENT) {
        header.addComment(comment);
    }
    header.setSortOrder(SORT_ORDER);
    final SAMFileWriterFactory samFileWriterFactory = new SAMFileWriterFactory();
    if (USE_THREADING) {
        samFileWriterFactory.setUseAsyncIo(true);
    }
    final SAMFileWriter out = samFileWriterFactory.makeSAMOrBAMWriter(header, presorted, OUTPUT);

    // Lastly loop through and write out the records
    final ProgressLogger progress = new ProgressLogger(log, PROGRESS_INTERVAL);
    while (iterator.hasNext()) {
        final SAMRecord record = iterator.next();
        out.addAlignment(record);
        progress.record(record);
    }

    log.info("Finished reading inputs.");
    for(final CloseableIterator<SAMRecord> iter : samReaderToIterator.values())  CloserUtil.close(iter);
    CloserUtil.close(readers);
    out.close();
    return 0;
}
 
Example 20
Source File: OpticalDuplicateFinder.java    From picard with MIT License
/**
 * Compute the optical duplicates correctly in the case where the duplicate group could end up with transitive optical duplicates
 */
private boolean[] getOpticalDuplicatesFlagWithGraph(List<? extends PhysicalLocation> list, PhysicalLocation keeper, boolean[] opticalDuplicateFlags, Log log, ProgressLogger progressLoggerForKeeper, ProgressLogger progressLoggerForRest, boolean logProgress) {
    // Make a graph where the edges are reads that lie within the optical duplicate pixel distance from each other,
    // we will then use the union-find algorithm to cluster the graph and find optical duplicate groups
    final GraphUtils.Graph<Integer> opticalDistanceRelationGraph = new GraphUtils.Graph<>();
    if (logProgress) {
        log.debug("Building adjacency graph for duplicate group");
    }

    final Map<Integer, List<Integer>> tileRGmap = new HashMap<>();

    int keeperIndex = -1;
    for (int i = 0; i < list.size(); i++) {
        PhysicalLocation currentLoc = list.get(i);
        if (currentLoc == keeper) {
            keeperIndex = i;
        }
        if (currentLoc.hasLocation()) {
            final int key = ((int) currentLoc.getReadGroup() << 16) + currentLoc.getTile();

            if (tileRGmap.containsKey(key)) {
                tileRGmap.get(key).add(i);
            } else {
                final List<Integer> pLocation = new ArrayList<>();
                pLocation.add(i);
                tileRGmap.put(key, pLocation);
            }
        }
        opticalDistanceRelationGraph.addNode(i);
    }

    // Because finding adjacent optical duplicates is an O(n^2) operation, we subdivide the input by
    // read group and tile in order to reduce the number of redundant checks across read groups.
    for (List<Integer> tileGroup : tileRGmap.values()) {
        if (tileGroup.size() > 1) {
            fillGraphFromAGroup(list, tileGroup, logProgress, progressLoggerForKeeper,  this.opticalDuplicatePixelDistance, opticalDistanceRelationGraph);
        }
    }

    if (logProgress) {
        log.debug("Finished building adjacency graph for duplicate group, moving onto clustering");
    }

    // Keep a map of the reads and their cluster assignments
    final Map<Integer, Integer> opticalDuplicateClusterMap = opticalDistanceRelationGraph.cluster();
    final Map<Integer, Integer> clusterToRepresentativeRead = new HashMap<>();
    Integer keeperCluster = null;

    // Specially mark the keeper as specifically not a duplicate if it exists
    if (keeperIndex >= 0) {
        clusterToRepresentativeRead.put(opticalDuplicateClusterMap.get(keeperIndex), keeperIndex);
        keeperCluster = opticalDuplicateClusterMap.get(keeperIndex);
    }

    for (final Map.Entry<Integer, Integer> entry : opticalDuplicateClusterMap.entrySet()) {
        // logging here for same reason as above
        final int recordIndex = entry.getKey();
        final int recordAssignedCluster = entry.getValue();
        if (logProgress) {
            progressLoggerForRest.record(String.format("%d", list.get(recordIndex).getReadGroup()), list.get(recordIndex).getX());
        }

        // If it's not the first read we've seen for this cluster, mark it as an optical duplicate
        if (clusterToRepresentativeRead.containsKey(recordAssignedCluster) && recordIndex != keeperIndex) {
            final PhysicalLocation representativeLoc = list.get(clusterToRepresentativeRead.get(recordAssignedCluster));
            final PhysicalLocation currentRecordLoc = list.get(recordIndex);

            // If not in the keeper cluster, then keep the minX -> minY valued duplicate (note the tile must be equal for reads to cluster together)
            if (!(keeperIndex >= 0 && recordAssignedCluster == keeperCluster) && // checking we don't accidentally set the keeper as an optical duplicate
                    (currentRecordLoc.getX() < representativeLoc.getX() || (currentRecordLoc.getX() == representativeLoc.getX() && currentRecordLoc.getY() < representativeLoc.getY()))) {
                // Mark the old min as an optical duplicate, and save the new min
                opticalDuplicateFlags[clusterToRepresentativeRead.get(recordAssignedCluster)] = true;
                clusterToRepresentativeRead.put(recordAssignedCluster, recordIndex);
            } else {
                // If a smaller read has already been visited, mark the test read as an optical duplicate
                opticalDuplicateFlags[recordIndex] = true;
            }
        } else {
            clusterToRepresentativeRead.put(recordAssignedCluster, recordIndex);
        }
    }

    return opticalDuplicateFlags;
}