GATK4 development of the previously license-protected part of the toolkit. The contents of this repo will be merged into broadinstitute/gatk in the near future.
This README is aimed at developers. For user information, please see the GATK 4 forum
Please refer to the GATK 4 public repo README for general guidelines and how to setup your development environment.
R 3.1.3 see additional requirements below: R package requirements
Java 8
(Developers) Gradle 2.13 is needed for building the GATK. We recommend using the ./gradlew
script which will
download and use an appropriate gradle version automatically.
(Developers) git lfs 1.1.0 (or greater) is needed for testing GATK-Protected builds. It is needed to download large files for the complete test suite. Run git lfs install
after downloading, followed by git lfs pull
to download the large files. The download is ~500 MB.
R packages can be installed using the install_R_packages.R script inside the scripts directory.
To do a fast build that lets you run GATK tools from within a git clone locally (but not on a cluster), run:
./gradlew installDist
To do a slower build that lets you run GATK tools from within a git clone both locally and on a cluster, run:
./gradlew installAll
To build a fully-packaged GATK jar that can be distributed and includes all dependencies needed for running tools locally, run:
./gradlew localJar
build/libs
with a name like gatk-protected-package-VERSION-local.jar
To build a fully-packaged GATK jar that can be distributed and includes all dependencies needed for running spark tools on a cluster, run:
./gradlew sparkJar
build/libs
with a name like gatk-protected-package-VERSION-spark.jar
To remove previous builds, run:
./gradlew clean
The standard way to run GATK4 tools is via the gatk-launch
wrapper script located in the root directory of a clone of this repository.
gatk-launch
can be run:
gatk-launch
script within the same directory as fully-packaged GATK jars produced by ./gradlew localJar
and ./gradlew sparkJar
GATK_LOCAL_JAR
and GATK_SPARK_JAR
can be defined, and contain the paths to the fully-packaged GATK jars produced by ./gradlew localJar
and ./gradlew sparkJar
For help on using gatk-launch
itself, run ./gatk-launch --help
To print a list of available tools, run ./gatk-launch --list
.
To print help for a particular tool, run ./gatk-launch ToolName --help
.
To run a non-Spark tool, or to run a Spark tool locally, the syntax is:
./gatk-launch ToolName toolArguments
Examples:
./gatk-launch PrintReads -I input.bam -O output.bam
./gatk-launch PrintReadsSpark -I input.bam -O output.bam
To run a Spark tool on a Spark cluster, the syntax is:
./gatk-launch ToolName toolArguments -- --sparkRunner SPARK --sparkMaster <master_url> additionalSparkArguments
Example:
./gatk-launch PrintReadsSpark -I hdfs://path/to/input.bam -O hdfs://path/to/output.bam \
-- \
--sparkRunner SPARK --sparkMaster <master_url> \
--num-executors 5 --executor-cores 2 --executor-memory 4g \
--conf spark.yarn.executor.memoryOverhead=600
To run a Spark tool on Google Cloud Dataproc, the syntax is:
./gatk-launch ToolName toolArguments -- --sparkRunner GCS --cluster myGCSCluster additionalSparkArguments
Example:
./gatk-launch PrintReadsSpark \
-I gs://my-gcs-bucket/path/to/input.bam \
-O gs://my-gcs-bucket/path/to/output.bam \
-- \
--sparkRunner GCS --cluster myGCSCluster \
--num-executors 5 --executor-cores 2 --executor-memory 4g \
--conf spark.yarn.executor.memoryOverhead=600
See the GATK4 public README for full instructions on using gatk-launch
to run tools on a Spark/Dataproc cluster.
To run the tests, run ./gradlew test
.
build/reports/tests/test/index.html
. git lfs
must be installed and set up as described in the "Requirements" section above
in order for all tests to pass.To run a subset of tests, use gradle's test filtering (see gradle doc), e.g.,
./gradlew test -Dtest.single=SomeSpecificTestClass
./gradlew test --tests *SomeSpecificTestClass
./gradlew test --tests all.in.specific.package*
./gradlew test --tests *SomeTest.someSpecificTestMethod
See the GATK4 public README for further information on running tests.
This can be found here