YaYaGen - Yet Another Yara Rule Generator




YaYaGen is in Las Vegas!

"Looking for the perfect signature: an automatic YARA rules generation algorithm in the AI-era"

Contact Andrea for some free stickers of your favourite grandma.

Getting Started

YaYaGen is an automatic procedure that starts from a set of Koodous reports, grouped either as a malware family or by any other means, and produces a signature in the form of a YARA rule that can be used directly in Koodous. YaYaGen analyzes the reports of the target applications, extracts the analysis attributes, and identifies optimal attribute subsets that match all the targets. Moreover, thanks to a heuristic measure, the generated signature has a limited risk of producing false positives, yet it is general enough to catch future threats.

The algorithm was originally described in "Countering Android Malware: a Scalable Semi-Supervised Approach for Family-Signature Generation" (DOI: 10.1109/ACCESS.2018.2874502).


YaYaGen requires Python 3.4 or greater.

Clone the repository

git clone https://github.com/jimmy-sonny/YaYaGen


Installing PyJq requires automake and libtool

For OSX users:

brew install automake
brew install libtool

Python packages

To install the required python3 packages:

pip3 install -r requirements.txt

Set your VirusTotal API key (optional)

export VTAPI=your_vt_api

How to use YaYaGen

__  __   __  __     _____
\ \/ /__ \ \/ /__ _/ ___/__ ___   YaYaGen -- Yet Another Yara Rule Generator
 \  / _ `/\  / _ `/ (_ / -_) _ \  (!) v0.5_summer18
 /_/\_,_/ /_/\_,_/\___/\__/_//_/  by Andrea Marcelli & Giovanni Squillero

usage: YaYaGen [-h] [-d] [-ndb] [-dry] [-a ALGORITHM] [-opt OPTIMIZER]
               [-u URL] [-dir DIRECTORY] [-f FILTER] [-o OUTPUTDIR]
               [-name RULENAME]
               [sha256 [sha256 ...]]

Yet another YARA rule Generator

positional arguments:
  sha256                sha256 APK list

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           log level debug
  -ndb, --no-db         disable DB
  -dry, --dryrun        parse inputs and exit
  -a ALGORITHM, --algorithm ALGORITHM
                        [greedy, clot]
  -opt OPTIMIZER, --optimizer OPTIMIZER
                        [basic, evo]
  -u URL, --url URL     koodous URL
  -dir DIRECTORY, --directory DIRECTORY
                        directory with Koodous reports
  -f FILTER, --filter FILTER
                        filter reports in input (one sha256 or filename per
  -o OUTPUTDIR, --outputdir OUTPUTDIR
                        save generated rules to outputdir
  -name RULENAME, --rulename RULENAME
                        YARA rule name

YaYaGen accepts Koodous JSON reports from a local directory, specified through the -dir option. Alternatively, it can download them directly using a Koodous APK search URL (e.g., https://koodous.com/apks?search=tag:bankbot%20AND%20date:%3E2018-06-10) passed through the -u option. Internally, reports are stored in an intermediate representation (a set of Python tuples) and cached in a SQLite DB (reports.sqlite3) created locally. To disable the DB and bypass the cache, use the -ndb option.
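As a rough illustration of the caching idea described above, the sketch below flattens a JSON report into a set of tuples and stores it in SQLite. The function names, the table layout, and the (path, value) tuple shape are assumptions made for this example; they are not taken from the YaYaGen source.

```python
import json
import sqlite3


def report_to_tuples(report):
    """Flatten a nested JSON report into a set of (path, value) tuples.

    The (path, value) shape is a hypothetical intermediate representation.
    """
    literals = set()

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + (key,))
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        else:
            literals.add((".".join(path), str(node)))

    walk(report, ())
    return literals


def open_cache(path):
    """Open (or create) the cache DB; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(sha256 TEXT PRIMARY KEY, literals TEXT)"
    )
    return conn


def cache_report(conn, sha256, literals):
    """Store the tuples for one sample, replacing any previous entry."""
    conn.execute(
        "INSERT OR REPLACE INTO reports VALUES (?, ?)",
        (sha256, json.dumps(sorted(literals))),
    )
    conn.commit()


def load_report(conn, sha256):
    """Return the cached set of tuples, or None on a cache miss."""
    row = conn.execute(
        "SELECT literals FROM reports WHERE sha256 = ?", (sha256,)
    ).fetchone()
    if row is None:
        return None
    return {tuple(item) for item in json.loads(row[0])}
```

With a cache like this, a report is parsed only once; later runs on the same sha256 read the tuples back from the DB instead of parsing the JSON again.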

YaYaGen builds each YARA rule by selecting a suitable set of clauses, then picks a variable-size subset of them to build an optimal family signature. The current implementation provides two algorithms, greedy and clot (selected with the -a option), and two rule optimization strategies, basic and evo (selected with the -opt option).
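To give an intuition for how a set of clauses can be chosen so that every target is matched, here is a minimal sketch of the classic greedy set-cover heuristic. It is not the YaYaGen greedy implementation, whose details are not given here; the function name and the clause-to-targets mapping are assumptions for illustration only.

```python
def greedy_cover(candidates, targets):
    """Pick clauses until every target is matched by at least one clause.

    candidates: dict mapping a clause name to the set of targets it matches
                (hypothetical input format).
    targets:    iterable of sample identifiers that must all be covered.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Take the clause that matches the most still-uncovered targets.
        # Iterating in sorted order makes ties deterministic.
        best = max(
            sorted(candidates),
            key=lambda clause: len(candidates[clause] & uncovered),
            default=None,
        )
        gained = candidates[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError("some targets cannot be covered by any clause")
        chosen.append(best)
        uncovered -= gained
    return chosen


clauses = {
    "c1": {"apk_a", "apk_b"},
    "c2": {"apk_b", "apk_c"},
    "c3": {"apk_c"},
}
selected = greedy_cover(clauses, ["apk_a", "apk_b", "apk_c"])
```

The heuristic is fast but not guaranteed to return the smallest possible cover, which is one reason a separate optimization step can be useful.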

YaYaGen uses several configuration files to set various options. configuration.json lets you enable Cuckoo support and specify the Permission and Intent filter lists, as well as the keywords and values files. keywords.json is used when preprocessing the Koodous JSON reports to select which literals to consider during rule generation, while values.json specifies the weight of each literal.

Since URLs and IP addresses are very effective in detecting malicious samples, they are filtered and checked for maliciousness before being included in the set of literals used for rule generation. The module in charge of preprocessing them is url_checker.py, which first filters out common URLs using the Alexa Top 1 Million list, and then uses the VirusTotal API to check the domain for malicious traffic. Results are cached in a SQLite DB (detections.sqlite3) created locally.
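The two-stage filtering described above could be sketched as follows. This is not the code of url_checker.py: the function names, the table schema, and the check_domain callable (a stand-in for the VirusTotal lookup, so that the example makes no network calls) are all assumptions.

```python
import sqlite3
from urllib.parse import urlparse


def domain_of(url):
    """Extract the lowercase host from a URL, bare domain, or IP address."""
    # urlparse only fills in the host when a scheme or "//" is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    return (parsed.hostname or "").lower()


def select_malicious(urls, popular_domains, check_domain, conn):
    """Keep only URLs whose domain is unpopular and flagged as malicious.

    popular_domains: set of well-known domains (e.g. loaded from a top-sites list).
    check_domain:    hypothetical callable, domain -> bool, standing in for
                     a reputation service lookup.
    conn:            sqlite3 connection used to cache verdicts per domain.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections "
        "(domain TEXT PRIMARY KEY, malicious INTEGER)"
    )
    kept = []
    for url in urls:
        domain = domain_of(url)
        if not domain or domain in popular_domains:
            continue  # stage 1: drop common domains without any lookup
        row = conn.execute(
            "SELECT malicious FROM detections WHERE domain = ?", (domain,)
        ).fetchone()
        if row is None:
            # stage 2: ask the reputation service once, then cache the verdict
            malicious = bool(check_domain(domain))
            conn.execute(
                "INSERT INTO detections VALUES (?, ?)", (domain, int(malicious))
            )
            conn.commit()
        else:
            malicious = bool(row[0])
        if malicious:
            kept.append(url)
    return kept
```

Filtering popular domains first keeps the number of remote lookups low, and the per-domain cache means repeated URLs on the same host cost a single query.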


Parse the reports in the _sample_analysis_json directory and exit without generating rules (dry run):

./yyg.py -d -dry -dir _sample_analysis_json

Generate a YARA rule to cover all the input samples, using the clot algorithm and the basic optimizer:

./yyg.py -d -a clot -opt basic -dir _sample_analysis_json

Generate a YARA rule to match all the samples from the Koodous search query: "tag:bankbot AND date:>2018-06-10"

./yyg.py -d -name bankbot -o bankbot_rule --url "https://koodous.com/apks?search=tag:bankbot%20AND%20date:%3E2018-06-10"

Generate a YARA rule to match the two input applications (the APK reports are downloaded from Koodous):

./yyg.py accd05c00951ef568594efebd5c30bdce2e63cee9b2cdd88cb705776e0a4ca70 e6aba7629608a525b020f4e76e4694d6d478dd9561d934813004b6903d66e44c

Rule Quality

The score of a rule is inversely related to its generality. It is defined as the minimum weight among its clauses (i.e., the weight of the most generic clause), where the weight of a clause is the sum of the weights of its literals. The higher the score, the more specific the rule and the less likely it is to generate false positives. The lower the score, the better the rule generalizes, but the more prone it is to unwanted detections. To balance the two cases and build effective rules, we introduce a double threshold, Tmin and Tmax: Tmin is the minimum score a rule needs to be valid, and Tmax is used in the optimization process to avoid producing overly specific rulesets.
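The scoring definition can be written down in a few lines. In the sketch below, the literal names, the weights, and the threshold values are made up for illustration, and treating any score above Tmax as "overly specific" is one plausible reading of how the upper threshold is used, not a statement about the actual optimizer.

```python
def clause_weight(clause, weights):
    """Weight of a clause: the sum of the weights of its literals."""
    return sum(weights[literal] for literal in clause)


def rule_score(rule, weights):
    """Score of a rule: the weight of its lightest (most generic) clause."""
    return min(clause_weight(clause, weights) for clause in rule)


def assess(rule, weights, t_min, t_max):
    """Compare a rule's score with the two thresholds (assumed semantics)."""
    score = rule_score(rule, weights)
    if score < t_min:
        return "too generic"
    if score > t_max:
        return "overly specific"
    return "balanced"


# Hypothetical literal weights and thresholds, not taken from values.json.
weights = {
    "permission:SEND_SMS": 5,
    "activity:com.example.Main": 20,
    "url:evil.example": 40,
}
rule = [
    ["permission:SEND_SMS", "activity:com.example.Main"],  # weight 25
    ["url:evil.example"],                                   # weight 40
]
score = rule_score(rule, weights)              # 25, set by the first clause
verdict = assess(rule, weights, t_min=20, t_max=100)
```

Because the score is a minimum, a single weak clause drags the whole rule down: adding a clause made of one low-weight permission would make the rule as generic as that clause.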

For more information, please refer to the article "Countering Android Malware: a Scalable Semi-Supervised Approach for Family-Signature Generation" (DOI: 10.1109/ACCESS.2018.2874502).


To experiment with rule generation under different settings, the weight of each literal and the two thresholds can be configured in values.json.

Next steps

YaYaGen is still under active development. Several extensions will be available soon.


Feel free to contact Andrea Marcelli for any ideas, improvements, and questions.


twitter: @S0nn1


Copyright © 2017 Andrea Marcelli & Giovanni Squillero.

Thanks to the whole Hispasec team, @plutec_net, @entdark_, and @plusvic for their support and insightful comments.

YaYaGen is licensed under the 2-Clause BSD License (BSD-2-Clause).