Jakstab is an Abstract Interpretation-based, integrated disassembly and static analysis framework for designing analyses on executables and recovering reliable control flow graphs. It is designed to be adaptable to multiple hardware platforms using customized instruction decoding and processor specifications. It is written in Java, and in its current state supports x86 processors and 32-bit Windows PE or Linux ELF executables.
Jakstab translates machine code to a low level intermediate language on the fly as it performs data flow analysis on the growing control flow graph. Data flow information is used to resolve branch targets and discover new code locations. Other analyses can either be implemented in Jakstab to run together with the main control flow reconstruction to improve precision of the disassembly, or they can work on the resulting preprocessed control flow graph.
The most detailed description of the entire system so far is contained in Johannes Kinder`s dissertation:
Jakstab is invoked via the command line, it comes with both a Windows and a Unix shell script for setting the correct classpath. The package contains a set of examples for unit testing, you can try it out on those by running
jakstab -m input/helloworld.exe
for a default Bounded Address Tracking run on the helloworld executable, or by running
jakstab -m input/jumptable.exe --cpa xfi
for analyzing a jumptable example with Bounded Address Tracking, forward expression substitution, and interval analysis. It is still a research prototype, so all interfaces are likely to change with new versions without further notice. Documentation is still sparse, but will hopefully improve over time.
After finishing analysis, Jakstab creates the following files:
filename_jak.asm- A disassembly with all reachable instructions
filename_cfa.dot- A CFG in the intermediate language, instruction by instruction
filename_asmcfg.dot- A CFG of assembly instructions, in basic blocks
The analyses (CPAs) that should be working correctly are:
The following publications, sorted chronologically, describe specific aspects of Jakstab, or applications and extensions of it.
The CAV 2008 tool paper describes an early implementation of Jakstab, which was based on iterative constant propagation and branch resolution:
Our VMCAI 2009 paper introduces a generic framework for disassembly and control flow reconstruction guided by data flow analysis and defines the theoretical background for Jakstab. The framework is not fixed in its choice of domain, but allows to combine control flow reconstruction with any data flow analysis that provides abstract evaluation of expressions:
In FMCAD 2010, we give an overview on the Jakstab architecture and describe Bounded Address Tracking, a practical abstract domain used for control flow reconstruction and verification of API usage specifications on device driver binaries:
In our paper at VMCAI 2012, we give a reformulation of control flow reconstruction using parameterized semantics, and show how it can be extended to accomodate under-approximations derived from concrete execution traces. A prototype implementation shows that under-approximations allow to reconstruct useful CFGs when the over-approximation would have to conservatively over-approximate indirect jump targets.
The WCRE 2012 paper proposes a method for using Jakstab to analyze binaries that have been protected using virtualization-obfuscation.