Phen2Gene is a phenotype-driven gene prioritization tool that takes HPO (Human Phenotype Ontology) IDs as input, then searches for and prioritizes candidate causal disease genes. It is distributed under the MIT License by Wang Genomics Lab. Additionally, we provide a web server and an associated RESTful API service for running Phen2Gene. Finally, a mobile app for Phen2Gene and several other genetic diagnostic tools from our lab is being tested and will be available soon.
If you do not wish to use Anaconda, simply install the packages listed in the file environment.yml using pip. If you use conda, some packages may not install properly unless you first update conda using conda update conda.
First, install Miniconda, a minimal installation of Anaconda that is much smaller and installs faster. Note that the script below is for Linux; macOS and Windows use different scripts:
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Go through all the prompts (installation in $HOME is recommended).
After Miniconda is installed successfully, simply run:
git clone https://github.com/WGLab/Phen2Gene.git
cd Phen2Gene
conda env create -f environment.yml
conda activate phen2gene
bash setup.sh
This software can be used in one of three scenarios:
Input files to Phen2Gene should contain HPO IDs, separated by UNIX-recognized newline characters (i.e., \n).
Alternatively, you can pass a space-separated list of HPO IDs on the command line.
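As an illustration of the input format described above, the following is a minimal Python sketch that writes a list of HPO IDs to a file, one per line, with \n line endings. The helper name write_hpo_file and the ID pattern (HP: followed by seven digits) are assumptions made for this example; they are not part of Phen2Gene.

```python
import re

# Assumption: HPO IDs look like "HP:" followed by seven digits (e.g. HP:0000021).
HPO_ID_PATTERN = re.compile(r"^HP:\d{7}$")


def write_hpo_file(hpo_ids, path):
    """Validate HPO IDs and write them one per line with Unix line endings."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [h.strip() for h in hpo_ids if h.strip()]
    invalid = [h for h in cleaned if not HPO_ID_PATTERN.match(h)]
    if invalid:
        raise ValueError(f"Not valid HPO IDs: {invalid}")
    # newline="\n" stops Python from translating line endings on Windows.
    with open(path, "w", newline="\n") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return cleaned
```

The resulting file can then be passed to phen2gene.py with the -f option.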
With the provided HPO_sample.txt file:
python phen2gene.py -f example/HPO_sample.txt -out out/prioritizedgenelist
python phen2gene.py -f example/HPO_sample.txt -out out/prioritizedgenelist -l example/1000genetest.txt
Use Skewness and Information Content
-w sk uses a skewness-based weighting of genes for each HPO term (default, and recommended)
-w w and -w ic do not use skew, but utilize information content in the tree structure (slightly worse performance)
-w u is unweighted
python phen2gene.py -f example/HPO_sample.txt -w sk -out out/prioritizedgenelist
python phen2gene.py -f example/HPO_sample.txt -v -out out/prioritizedgenelist
python phen2gene.py -m HP:0000021 HP:0000027 HP:0030905 HP:0010628 -out out/prioritizedgenelist
python phen2gene.py -f example/HPO_sample.txt -d full_path_to_H2GKB.zip_extraction_folder -out out/prioritizedgenelist
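For scripted use, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of Phen2Gene: the function build_phen2gene_command and its parameter names are hypothetical, and only the flags shown in the examples above (-f, -m, -w, -l, -v, -d, -out) are used. The examples do not say whether -f and -m can be combined, so this sketch accepts exactly one of them.

```python
# The four weighting options listed above.
VALID_WEIGHTS = {"sk", "w", "ic", "u"}


def build_phen2gene_command(out_dir, hpo_file=None, hpo_ids=None, weight="sk",
                            gene_list=None, verbose=False, kb_dir=None):
    """Return an argument list mirroring the example phen2gene.py commands."""
    # Assumption: accept exactly one input source (file or command-line IDs).
    if (hpo_file is None) == (hpo_ids is None):
        raise ValueError("Provide exactly one of hpo_file or hpo_ids")
    if weight not in VALID_WEIGHTS:
        raise ValueError(f"weight must be one of {sorted(VALID_WEIGHTS)}")

    cmd = ["python", "phen2gene.py"]
    if hpo_file is not None:
        cmd += ["-f", hpo_file]
    else:
        cmd += ["-m", *hpo_ids]      # space-separated HPO IDs
    cmd += ["-w", weight]
    if gene_list is not None:
        cmd += ["-l", gene_list]     # candidate gene list file
    if verbose:
        cmd.append("-v")
    if kb_dir is not None:
        cmd += ["-d", kb_dir]        # H2GKB extraction folder
    cmd += ["-out", out_dir]
    return cmd


cmd = build_phen2gene_command("out/prioritizedgenelist",
                              hpo_ids=["HP:0000021", "HP:0000027"])
print(" ".join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) from inside the cloned Phen2Gene directory.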
Examples of how to use the Web Server and the RESTful API can be found in the Docs.
Please use the Phen2Gene issues page if you have any questions!
In order, run:
bash setup.sh # You can skip this if you already ran it during installation.
bash runtest.sh
If you only want the benchmark data and nothing else:
bash getbenchmark.sh /directory/to/download/to
The figures are in the folder figures.
After editing example/ANKRD11example.sh so that the ANNOVAR database is built where you would like it, simply run:
bash example/ANKRD11example.sh
Going through the code in example/ANKRD11example.sh, we first download a list of candidate variants from the article referenced in the manuscript, in which the patient has KBG syndrome.
Then, we annotate with ANNOVAR to retrieve gene annotations for these variants, functional consequence information (exonic, intronic, nonsynonymous), amino acid change information, and population frequency.
We next filter out common variants (>1% in gnomAD 2.1.1) and use Phen2Gene to rank the candidate genes based on HPO terms.
Combining this information with the variants, we can re-rank Phen2Gene's candidate list as in the script filterbyannovar.py and discover that the causal gene ANKRD11 is now ranked number 1, after being ranked number 2 by HPO terms alone. The number 1 ranked gene by HPO, VPS13B, is filtered out because its only candidate variant (8-100133706-T-G) has an extremely high allele frequency in gnomAD (74%).
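The re-ranking idea can be illustrated with a short Python sketch. This is not the code in filterbyannovar.py; the function name, the data structures, and the 0.0 allele frequency given to ANKRD11 are assumptions made for this example. Only the 1% gnomAD cutoff, the HPO-based order of the two genes, and the 74% frequency of the VPS13B variant come from the text above.

```python
def rerank_by_variant_frequency(ranked_genes, variant_afs, max_af=0.01):
    """Keep genes with at least one rare candidate variant, preserving rank order.

    ranked_genes: gene symbols ordered by HPO-based rank (best first).
    variant_afs:  gene symbol -> list of allele frequencies of the patient's
                  candidate variants in that gene.
    max_af:       variants above this frequency are treated as common.
    """
    kept = []
    for gene in ranked_genes:
        afs = variant_afs.get(gene, [])
        # A gene survives only if some candidate variant is not common.
        if any(af <= max_af for af in afs):
            kept.append(gene)
    return kept


# HPO-based order from the example: VPS13B first, ANKRD11 second.
hpo_ranking = ["VPS13B", "ANKRD11"]
variant_afs = {
    "VPS13B": [0.74],   # 8-100133706-T-G, 74% in gnomAD
    "ANKRD11": [0.0],   # hypothetical placeholder for a rare variant
}

reranked = rerank_by_variant_frequency(hpo_ranking, variant_afs)
print(reranked)  # ['ANKRD11']
```

Filtering on frequency before ranking by phenotype, or after it as done here, gives the same surviving gene order; the sketch filters afterwards so that the HPO-based ranking stays visible.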