This repository contains GATK4 development of the previously license-protected part of the toolkit. The contents of this repo will be merged into broadinstitute/gatk in the near future.
This README is aimed at developers. For user information, please see the GATK 4 forum.
Please refer to the GATK 4 public repo README for general guidelines and for how to set up your development environment.
* R 3.1.3 (see the additional R package requirements below)
* (Developers) Gradle 2.13 is needed for building the GATK. We recommend using the `./gradlew` script, which will download and use an appropriate Gradle version automatically.
* (Developers) git lfs 1.1.0 (or greater) is needed for testing GATK-Protected builds, as it is used to download the large files required by the complete test suite. Run `git lfs install` after downloading, followed by `git lfs pull` to download the large files. The download is ~500 MB.
R packages can be installed using the `install_R_packages.R` script inside the `scripts` directory.
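A sketch of that install step, assuming `Rscript` is on your PATH and you are in the repository root (the script path is taken from the text above):

```shell
# Install the R packages required by GATK-Protected tools.
Rscript scripts/install_R_packages.R
```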
To do a fast build that lets you run GATK tools from within a git clone locally (but not on a cluster), run:
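The command itself is missing here; the GATK4 public repo uses the `installDist` Gradle task for this fast local build, so assuming the same build setup applies in this repo:

```shell
# Fast build: runnable from a git clone locally, but not on a cluster.
./gradlew installDist
```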
To do a slower build that lets you run GATK tools from within a git clone both locally and on a cluster, run:
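The command is missing here as well; assuming this repo mirrors the GATK4 public build, the slower, cluster-capable build is likely:

```shell
# Slower build: runnable from a git clone both locally and on a cluster.
./gradlew installAll
```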
To build a fully-packaged GATK jar that can be distributed and includes all dependencies needed for running tools locally, run:
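The command is missing here; in the GATK4 public repo the fully-packaged local jar is built with the `localJar` task, so assuming the same setup:

```shell
# Build a single fully-packaged jar with all dependencies for local runs.
./gradlew localJar
```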
The jar will be created in the `build/libs` directory.
To build a fully-packaged GATK jar that can be distributed and includes all dependencies needed for running spark tools on a cluster, run:
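The command is missing here; the GATK4 public repo uses the `sparkJar` task for the Spark-cluster jar, so assuming the same setup:

```shell
# Build a single fully-packaged jar for running Spark tools on a cluster.
./gradlew sparkJar
```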
The jar will be created in the `build/libs` directory.
To remove previous builds, run:
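The command is missing here; with Gradle this is the standard `clean` task:

```shell
# Remove artifacts from previous builds.
./gradlew clean
```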
The standard way to run GATK4 tools is via the `gatk-launch` wrapper script located in the root directory of a clone of this repository.

`gatk-launch` can be run:

* with the `gatk-launch` script placed within the same directory as the fully-packaged GATK jars produced by the build, or
* with environment variables (such as `GATK_SPARK_JAR`) defined to contain the paths to the fully-packaged GATK jars produced by the build.
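As an illustration of the environment-variable option, a hypothetical session (the jar path is a placeholder, not a real artifact name):

```shell
# Hypothetical: point gatk-launch at a prebuilt fully-packaged Spark jar.
export GATK_SPARK_JAR=/path/to/your/gatk-spark.jar
# gatk-launch can then find that jar when dispatching Spark tools.
./gatk-launch PrintReadsSpark -I input.bam -O output.bam
```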
* For help on using `gatk-launch` itself, run `./gatk-launch --help`.
* To print a list of available tools, run `./gatk-launch --list`.
* To print help for a particular tool, run `./gatk-launch ToolName --help`.
To run a non-Spark tool, or to run a Spark tool locally, the syntax is:

```
./gatk-launch ToolName toolArguments
```

Examples:

```
./gatk-launch PrintReads -I input.bam -O output.bam
./gatk-launch PrintReadsSpark -I input.bam -O output.bam
```
To run a Spark tool on a Spark cluster, the syntax is:

```
./gatk-launch ToolName toolArguments -- --sparkRunner SPARK --sparkMaster <master_url> additionalSparkArguments
```

Example:

```
./gatk-launch PrintReadsSpark -I hdfs://path/to/input.bam -O hdfs://path/to/output.bam \
    -- \
    --sparkRunner SPARK --sparkMaster <master_url> \
    --num-executors 5 --executor-cores 2 --executor-memory 4g \
    --conf spark.yarn.executor.memoryOverhead=600
```
To run a Spark tool on Google Cloud Dataproc, the syntax is:

```
./gatk-launch ToolName toolArguments -- --sparkRunner GCS --cluster myGCSCluster additionalSparkArguments
```

Example:

```
./gatk-launch PrintReadsSpark \
    -I gs://my-gcs-bucket/path/to/input.bam \
    -O gs://my-gcs-bucket/path/to/output.bam \
    -- \
    --sparkRunner GCS --cluster myGCSCluster \
    --num-executors 5 --executor-cores 2 --executor-memory 4g \
    --conf spark.yarn.executor.memoryOverhead=600
```
See the GATK4 public README for full instructions on using `gatk-launch` to run tools on a Spark/Dataproc cluster.
To run all tests, run `./gradlew test`. Note that `git lfs` must be installed and set up as described in the "Requirements" section above in order for all tests to pass.
To run a subset of tests, use Gradle's test filtering (see the Gradle documentation), e.g.:

```
./gradlew test -Dtest.single=SomeSpecificTestClass
./gradlew test --tests *SomeSpecificTestClass
./gradlew test --tests all.in.specific.package*
./gradlew test --tests *SomeTest.someSpecificTestMethod
```
See the GATK4 public README for further information on running tests.