Java Code Examples for htsjdk.samtools.util.SortingCollection#Codec

The following examples show how to use htsjdk.samtools.util.SortingCollection#Codec.
Example 1
Source File: SortingIteratorFactory.java    From Drop-seq with MIT License
/**
 *
 * @param componentType Required because of Java generic syntax limitations.
 * @param underlyingIterator All records are pulled from this iterator, which is then closed if closeable.
 * @param comparator Defines sort order.
 * @param codec For spilling to temp files
 * @param maxRecordsInRam Maximum number of records held in memory before spilling to temp files.
 * @param progressLogger Pass null if not interested in logging.
 * @return An iterator in the order defined by comparator, that will produce all the records from underlyingIterator.
 */
public static <T> CloseableIterator<T> create(final Class<T> componentType,
                                              final Iterator<T> underlyingIterator,
                                              final Comparator<T> comparator,
                                              final SortingCollection.Codec<T> codec,
                                              final int maxRecordsInRam,
                                              final ProgressCallback progressLogger) {

    SortingCollection<T> sortingCollection =
            SortingCollection.newInstance(componentType, codec, comparator, maxRecordsInRam);

    while (underlyingIterator.hasNext()) {
        final T rec = underlyingIterator.next();
        if (progressLogger != null) {
            progressLogger.logProgress(rec);
        }
        sortingCollection.add(rec);
    }
    CloseableIterator<T> ret = sortingCollection.iterator();
    CloserUtil.close(underlyingIterator);
    return ret;
}
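
The codec parameter above is what allows records to be written to temp files and read back. The following is a minimal sketch of such a codec, not code from Drop-seq or htsjdk. To compile without htsjdk on the classpath it declares a local stand-in interface, RecordCodec, with the same method shape as SortingCollection.Codec; the names RecordCodec, Utf8StringCodec and CodecRoundTrip are hypothetical, and String is used as the record type only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Local stand-in (assumption) with the same method shape as SortingCollection.Codec. */
interface RecordCodec<T> extends Cloneable {
    void setOutputStream(OutputStream os);
    void setInputStream(InputStream is);
    void encode(T val);
    T decode();              // returns null when the stream is exhausted
    RecordCodec<T> clone();
}

/** Hypothetical codec that stores each String record in DataOutputStream's UTF format. */
class Utf8StringCodec implements RecordCodec<String> {
    private DataOutputStream out;
    private DataInputStream in;

    @Override
    public void setOutputStream(final OutputStream os) { this.out = new DataOutputStream(os); }

    @Override
    public void setInputStream(final InputStream is) { this.in = new DataInputStream(is); }

    @Override
    public void encode(final String val) {
        try {
            out.writeUTF(val);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String decode() {
        try {
            return in.readUTF();
        } catch (EOFException e) {
            return null;     // end of stream: no more records
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Utf8StringCodec clone() { return new Utf8StringCodec(); }
}

class CodecRoundTrip {
    /** Encodes all records into an in-memory buffer, then decodes them back in the same order. */
    static List<String> roundTrip(final List<String> records) {
        final RecordCodec<String> codec = new Utf8StringCodec();
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        codec.setOutputStream(buffer);
        for (final String rec : records) {
            codec.encode(rec);
        }
        codec.setInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        final List<String> decoded = new ArrayList<>();
        for (String rec = codec.decode(); rec != null; rec = codec.decode()) {
            decoded.add(rec);
        }
        return decoded;
    }
}
```

With htsjdk available, a class of this shape would implement SortingCollection.Codec<String> directly and be passed as the codec argument of create(...).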
 
Example 2
Source File: IlluminaBasecallsConverter.java    From picard with MIT License
/**
 * @param basecallsDir             Where to read basecalls from.
 * @param lane                     What lane to process.
 * @param readStructure            How to interpret each cluster.
 * @param barcodeRecordWriterMap   Map from barcode to CLUSTER_OUTPUT_RECORD writer.  If demultiplex is false, must contain
 *                                 one writer stored with key=null.
 * @param demultiplex              If true, output is split by barcode, otherwise all are written to the same output stream.
 * @param maxReadsInRamPerTile     Configures number of reads each tile will store in RAM before spilling to disk.
 * @param tmpDirs                  For SortingCollection spilling.
 * @param numProcessors            Controls number of threads.  If 0, all available cores are used; if negative,
 *                                 the number of threads allocated is available cores + numProcessors.
 * @param forceGc                  Force explicit GC periodically.  This is good for causing memory maps to be released.
 * @param firstTile                (For debugging) If non-null, start processing at this tile.
 * @param tileLimit                (For debugging) If non-null, process no more than this many tiles.
 * @param outputRecordComparator   For sorting output records within a single tile.
 * @param codecPrototype           For spilling output records to disk.
 * @param outputRecordClass        Inconveniently needed to create SortingCollections.
 * @param includeNonPfReads        If true, will include ALL reads (including those which do not have PF set)
 * @param ignoreUnexpectedBarcodes If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap,
 *                                 otherwise will throw an exception
 */
public IlluminaBasecallsConverter(final File basecallsDir, final int lane, final ReadStructure readStructure,
                                  final Map<String, ? extends ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                                  final boolean demultiplex,
                                  final int maxReadsInRamPerTile,
                                  final List<File> tmpDirs,
                                  final int numProcessors, final boolean forceGc,
                                  final Integer firstTile, final Integer tileLimit,
                                  final Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator,
                                  final SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype,
                                  final Class<CLUSTER_OUTPUT_RECORD> outputRecordClass,
                                  final BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                                  final boolean applyEamssFiltering,
                                  final boolean includeNonPfReads,
                                  final boolean ignoreUnexpectedBarcodes
) {
    this(basecallsDir, null, lane, readStructure,
            barcodeRecordWriterMap, demultiplex, maxReadsInRamPerTile,
            tmpDirs, numProcessors, forceGc, firstTile, tileLimit,
            outputRecordComparator, codecPrototype, outputRecordClass,
            bclQualityEvaluationStrategy, applyEamssFiltering,
            includeNonPfReads, ignoreUnexpectedBarcodes);
}
 
Example 3
Source File: BasecallsConverter.java    From picard with MIT License
/**
 * @param barcodeRecordWriterMap   Map from barcode to CLUSTER_OUTPUT_RECORD writer.  If demultiplex is false, must contain
 *                                 one writer stored with key=null.
 * @param demultiplex              If true, output is split by barcode, otherwise all are written to the same output stream.
 * @param maxReadsInRamPerTile     Configures number of reads each tile will store in RAM before spilling to disk.
 * @param tmpDirs                  For SortingCollection spilling.
 * @param numProcessors            Controls number of threads.  If 0, all available cores are used; if negative,
 *                                 the number of threads allocated is available cores + numProcessors.
 * @param outputRecordComparator   For sorting output records within a single tile.
 * @param codecPrototype           For spilling output records to disk.
 * @param outputRecordClass        Inconveniently needed to create SortingCollections.
 * @param ignoreUnexpectedBarcodes If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap,
 *                                 otherwise will throw an exception
 */
BasecallsConverter(final Map<String, ? extends ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                   final int maxReadsInRamPerTile,
                   final List<File> tmpDirs,
                   final SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype,
                   final boolean ignoreUnexpectedBarcodes,
                   final boolean demultiplex,
                   final Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator,
                   final BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                   final Class<CLUSTER_OUTPUT_RECORD> outputRecordClass,
                   final int numProcessors,
                   final IlluminaDataProviderFactory factory) {

    this.barcodeRecordWriterMap = barcodeRecordWriterMap;
    this.maxReadsInRamPerTile = maxReadsInRamPerTile;
    this.tmpDirs = tmpDirs;
    this.codecPrototype = codecPrototype;
    this.ignoreUnexpectedBarcodes = ignoreUnexpectedBarcodes;
    this.demultiplex = demultiplex;
    this.outputRecordComparator = outputRecordComparator;
    this.bclQualityEvaluationStrategy = bclQualityEvaluationStrategy;
    this.outputRecordClass = outputRecordClass;
    this.factory = factory;


    if (numProcessors == 0) {
        this.numThreads = Runtime.getRuntime().availableProcessors();
    } else if (numProcessors < 0) {
        this.numThreads = Runtime.getRuntime().availableProcessors() + numProcessors;
    } else {
        this.numThreads = numProcessors;
    }
}
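
The thread-count rule in the constructor above can be isolated as a small pure function. This is only a sketch: ThreadCountResolver and its method names are hypothetical, and the available core count is passed in as a parameter so the behaviour does not depend on the machine. The clamp step mirrors the Math.max/Math.min line shown in Example 4.

```java
/** Hypothetical helper reproducing the numProcessors rule shown above. */
class ThreadCountResolver {

    /**
     * @param numProcessors  0 means use all cores; a negative value leaves that many cores free;
     *                       a positive value is used as-is.
     * @param availableCores normally Runtime.getRuntime().availableProcessors()
     */
    static int resolve(final int numProcessors, final int availableCores) {
        if (numProcessors == 0) {
            return availableCores;
        } else if (numProcessors < 0) {
            // numProcessors is negative here, so this subtracts from the core count.
            return availableCores + numProcessors;
        } else {
            return numProcessors;
        }
    }

    /** Keeps the result between 1 and the number of tiles, as Example 4 does. */
    static int clamp(final int numThreads, final int tileCount) {
        return Math.max(1, Math.min(numThreads, tileCount));
    }
}
```

Note that resolve alone can return zero or a negative number when the absolute value of numProcessors is at least the core count, which is why a clamp of this kind is useful before threads are created.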
 
Example 4
Source File: IlluminaBasecallsConverter.java    From picard with MIT License
/**
 * @param basecallsDir             Where to read basecalls from.
 * @param barcodesDir              Where to read barcodes from (optional; use basecallsDir if not specified).
 * @param lane                     What lane to process.
 * @param readStructure            How to interpret each cluster.
 * @param barcodeRecordWriterMap   Map from barcode to CLUSTER_OUTPUT_RECORD writer.  If demultiplex is false, must contain
 *                                 one writer stored with key=null.
 * @param demultiplex              If true, output is split by barcode, otherwise all are written to the same output stream.
 * @param maxReadsInRamPerTile     Configures number of reads each tile will store in RAM before spilling to disk.
 * @param tmpDirs                  For SortingCollection spilling.
 * @param numProcessors            Controls number of threads.  If 0, all available cores are used; if negative,
 *                                 the number of threads allocated is available cores + numProcessors.
 * @param forceGc                  Force explicit GC periodically.  This is good for causing memory maps to be released.
 * @param firstTile                (For debugging) If non-null, start processing at this tile.
 * @param tileLimit                (For debugging) If non-null, process no more than this many tiles.
 * @param outputRecordComparator   For sorting output records within a single tile.
 * @param codecPrototype           For spilling output records to disk.
 * @param outputRecordClass        Inconveniently needed to create SortingCollections.
 * @param includeNonPfReads        If true, will include ALL reads (including those which do not have PF set)
 * @param ignoreUnexpectedBarcodes If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap,
 *                                 otherwise will throw an exception
 */
public IlluminaBasecallsConverter(final File basecallsDir, final File barcodesDir, final int lane,
                                  final ReadStructure readStructure,
                                  final Map<String, ? extends ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                                  final boolean demultiplex,
                                  final int maxReadsInRamPerTile,
                                  final List<File> tmpDirs, final int numProcessors,
                                  final boolean forceGc, final Integer firstTile,
                                  final Integer tileLimit,
                                  final Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator,
                                  final SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype,
                                  final Class<CLUSTER_OUTPUT_RECORD> outputRecordClass,
                                  final BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                                  final boolean applyEamssFiltering, final boolean includeNonPfReads,
                                  final boolean ignoreUnexpectedBarcodes
) {
    super(barcodeRecordWriterMap, maxReadsInRamPerTile, tmpDirs, codecPrototype, ignoreUnexpectedBarcodes,
            demultiplex, outputRecordComparator, bclQualityEvaluationStrategy, outputRecordClass,
            numProcessors,
            new IlluminaDataProviderFactory(basecallsDir, barcodesDir, lane, readStructure,
                    bclQualityEvaluationStrategy, getDataTypesFromReadStructure(readStructure, demultiplex)));
    this.includeNonPfReads = includeNonPfReads;
    this.tiles = factory.getAvailableTiles();
    // Since the first non-fixed part of the read name is the tile number, without preceding zeroes,
    // and the output is sorted by read name, process the tiles in this order.
    tiles.sort(TILE_NUMBER_COMPARATOR);
    setTileLimits(firstTile, tileLimit);

    this.numThreads = Math.max(1, Math.min(this.numThreads, tiles.size()));
    // If we're forcing garbage collection, collect every 5 minutes in a daemon thread.
    if (forceGc) {
        final Timer gcTimer = new Timer(true);
        final long delay = 5 * 1000 * 60;
        gcTimerTask = new TimerTask() {
            @Override
            public void run() {
                log.info("Before explicit GC, Runtime.totalMemory()=" + Runtime.getRuntime().totalMemory());
                System.gc();
                System.runFinalization();
                log.info("After explicit GC, Runtime.totalMemory()=" + Runtime.getRuntime().totalMemory());
            }
        };
        gcTimer.scheduleAtFixedRate(gcTimerTask, delay, delay);
    } else {
        gcTimerTask = null;
    }

    this.factory.setApplyEamssFiltering(applyEamssFiltering);

}
 
Example 5
Source File: IlluminaBasecallsToSam.java    From picard with MIT License
@Override
public SortingCollection.Codec<SAMRecordsForCluster> clone() {
    return new Codec(numRecords, bamCodec.clone());
}
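
A clone() like the one above returns a fresh codec instead of sharing one because a codec holds per-instance stream state, and a merge over several spill files needs one independent reader per file. The sketch below illustrates that idea with in-memory byte arrays standing in for temp files. It is an assumption-laden illustration, not htsjdk's implementation: IntRecordCodec, CloneMergeSketch and Head are hypothetical names, and Integer is used as the record type for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical decoder for int records; each instance owns exactly one input stream. */
class IntRecordCodec implements Cloneable {
    private DataInputStream in;

    void setInputStream(final InputStream is) { this.in = new DataInputStream(is); }

    /** Returns the next record, or null at end of stream. */
    Integer decode() {
        try {
            return in.readInt();
        } catch (EOFException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A fresh instance with no stream attached, so clones never share read position. */
    @Override
    public IntRecordCodec clone() { return new IntRecordCodec(); }
}

class CloneMergeSketch {
    /** The current smallest unread record of one run, plus the codec reading that run. */
    private static final class Head {
        final int value;
        final IntRecordCodec source;
        Head(final int value, final IntRecordCodec source) { this.value = value; this.source = source; }
    }

    /** Serializes an already-sorted run; the byte array stands in for a temp file. */
    static byte[] encodeRun(final int... sorted) {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (final int v : sorted) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** K-way merge: one codec clone per run, smallest head record emitted first. */
    static List<Integer> merge(final IntRecordCodec prototype, final List<byte[]> sortedRuns) {
        final PriorityQueue<Head> heads = new PriorityQueue<>(Comparator.comparingInt((Head h) -> h.value));
        for (final byte[] run : sortedRuns) {
            final IntRecordCodec reader = prototype.clone();
            reader.setInputStream(new ByteArrayInputStream(run));
            final Integer first = reader.decode();
            if (first != null) {
                heads.add(new Head(first, reader));
            }
        }
        final List<Integer> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            final Head head = heads.poll();
            merged.add(head.value);
            final Integer next = head.source.decode();
            if (next != null) {
                heads.add(new Head(next, head.source));
            }
        }
        return merged;
    }
}
```

If the runs shared a single codec, attaching the second stream would replace the first and the merge could not read from both runs at once.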
 
Example 6
Source File: IlluminaBasecallsToFastq.java    From picard with MIT License
@Override
public SortingCollection.Codec<FastqRecordsForCluster> clone() {
    return new FastqRecordsForClusterCodec(numTemplates, numSampleBarcodes, numMolecularBarcodes);
}
 
Example 7
Source File: EstimateLibraryComplexity.java    From picard with MIT License
public static SortingCollection.Codec<PairedReadSequence> getCodec() {
    return new PairedReadCodec();
}
 
Example 8
Source File: EstimateLibraryComplexity.java    From picard with MIT License
@Override
public SortingCollection.Codec<PairedReadSequence> clone() { return new PairedReadCodec(); }
 
Example 9
Source File: EstimateLibraryComplexity.java    From picard with MIT License
@Override
public SortingCollection.Codec<PairedReadSequence> clone() { return new PairedReadWithBarcodesCodec(); }
 
Example 10
Source File: ReadEndsForMarkDuplicatesWithBarcodesCodec.java    From picard with MIT License
@Override
public SortingCollection.Codec<ReadEndsForMarkDuplicates> clone() {
    return new ReadEndsForMarkDuplicatesWithBarcodesCodec();
}
 
Example 11
Source File: ReadEndsForMarkDuplicatesCodec.java    From picard with MIT License
@Override
public SortingCollection.Codec<ReadEndsForMarkDuplicates> clone() {
    return new ReadEndsForMarkDuplicatesCodec();
}
 
Example 12
Source File: RepresentativeReadIndexerCodec.java    From picard with MIT License
@Override
public SortingCollection.Codec<RepresentativeReadIndexer> clone() {
    return new RepresentativeReadIndexerCodec();
}