Automatic Speech Recognition (ASR) - DeepSpeech German

_This is the project for the paper "German End-to-end Speech Recognition based on DeepSpeech", published at KONVENS 2019._

This project aims to develop a working speech-to-text module using Mozilla DeepSpeech that can be used in any audio processing pipeline. Mozilla DeepSpeech is a state-of-the-art open-source automatic speech recognition (ASR) toolkit. It uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper. The DeepSpeech project uses Google's TensorFlow to make the implementation easier.

This README is written for DeepSpeech v0.5.0. Refer to Mozilla DeepSpeech for the latest updates.


  1. Requirements
  2. Speech Corpus
  3. Language Model
  4. Training
  5. Hyper-Parameter Optimization
  6. Results
  7. Trained Models
  8. Acknowledgments
  9. References


Requirements

Installing Python bindings

virtualenv -p python3 deepspeech-german
source deepspeech-german/bin/activate
pip3 install -r python_requirements.txt

Installing Linux dependencies

The necessary Linux dependencies are listed in linux_requirements.txt.

xargs -a linux_requirements.txt sudo apt-get install

Mozilla DeepSpeech

$ wget 
$ tar -xzvf v0.5.0.tar.gz
$ mv DeepSpeech-0.5.0 DeepSpeech

Speech Corpus

1. Tuda-De

$ mkdir tuda
$ cd tuda
$ wget
$ tar -xzvf german-speechdata-package-v2.tar.gz

2. Mozilla

$ cd ..
$ mkdir mozilla
$ cd mozilla
$ wget

3. Voxforge

$ cd ..
$ mkdir voxforge
$ cd voxforge
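
Then, in a Python interpreter, download the German Voxforge corpus with audiomate:

from audiomate.corpus import io
dl = io.VoxforgeDownloader(lang='de')
# voxforge_corpus_path: target directory as a Python string, e.g. the voxforge directory created above
dl.download(voxforge_corpus_path)

Back in the shell, prepare the data:
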
$ cd ..
$ ##Tuda-De
$ git clone
$ deepspeech-german/pre-processing/ --tuda $tuda_corpus_path  $export_path_data_tuda

$ ##Voxforge
$ deepspeech-german/pre-processing/
$ python3 deepspeech-german/ --voxforge $voxforge_corpus_path $export_path_data_voxforge

$ ##Mozilla Common Voice
$ python3 DeepSpeech/bin/ --filter_alphabet deepspeech-german/data/alphabet.txt $export_path_data_mozilla

_NOTE: Change the path accordingly in
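
The `--filter_alphabet` option above restricts the Common Voice transcripts to the characters listed in alphabet.txt. The idea can be sketched roughly as follows. This is a minimal illustration, not the DeepSpeech importer itself: the function names `load_alphabet` and `filter_transcripts` are hypothetical, and the assumed file format is one character per line, with `#` starting a comment line and `\#` standing for a literal hash.

```python
def load_alphabet(lines):
    """Collect the allowed characters from alphabet.txt-style lines.

    Assumed format: one character per line, '#' starts a comment line,
    and '\\#' stands for a literal hash character.
    """
    alphabet = set()
    for raw in lines:
        line = raw.rstrip("\r\n")  # keep a lone space, it is a valid entry
        if line.startswith("\\#"):
            alphabet.add("#")
        elif line.startswith("#") or line == "":
            continue  # comment or empty line
        else:
            alphabet.add(line[0])
    return alphabet


def filter_transcripts(transcripts, alphabet):
    """Keep only non-empty transcripts made up entirely of allowed characters."""
    return [t for t in transcripts if t and set(t) <= alphabet]


if __name__ == "__main__":
    # Toy alphabet for illustration; a real run would read alphabet.txt.
    toy_alphabet = load_alphabet(["# toy alphabet\n", " \n", "a\n", "b\n", "ä\n"])
    print(filter_transcripts(["ab bä", "abc", ""], toy_alphabet))
```

In this sketch a transcript is dropped entirely as soon as it contains one character outside the alphabet, which keeps the training labels consistent with the model's output layer.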

Language Model

We used the KenLM toolkit to train a 3-gram language model. KenLM is language model inference code by Kenneth Heafield.

$ git clone
$ cd kenlm
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j `nproc`

We trained the language model on an open-source German text corpus released by the University of Hamburg.

  1. Download the data
$ wget
$ gzip -d German_sentences_8mil_filtered_maryfied.txt.gz
  2. Pre-process the data
$ deepspeech-german/pre-processing/ $text_corpus_path $exp_path/clean_vocab.txt
  3. Build the Language Model
    $kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/ --o 3
    $kenlm/build/bin/build_binary -T -s $exp_path/ $exp_path/lm.binary
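
To give an idea of what the pre-processing step in the list above has to achieve, here is a minimal sketch of text normalization for language model training. It is not the repository's script. It assumes the target character set is lowercase a-z plus ä, ö, ü and ß, and the function names `clean_sentence` and `clean_corpus` are hypothetical.

```python
import re
import unicodedata

# Assumption: everything outside lowercase a-z, German umlauts and ß is dropped.
_NOT_ALLOWED = re.compile(r"[^a-zäöüß]")


def clean_sentence(sentence):
    """Lowercase a sentence and keep only the assumed alphabet, space separated."""
    # NFC makes sure umlauts are single code points before matching.
    text = unicodedata.normalize("NFC", sentence).lower()
    # Replace every disallowed character (digits, punctuation, whitespace) by a space.
    text = _NOT_ALLOWED.sub(" ", text)
    # Collapse runs of spaces and trim the ends.
    return " ".join(text.split())


def clean_corpus(lines):
    """Yield cleaned sentences, skipping lines that end up empty."""
    for line in lines:
        cleaned = clean_sentence(line)
        if cleaned:
            yield cleaned


if __name__ == "__main__":
    for sentence in clean_corpus(["Guten  Tag, Welt!\n", "123\n", "Über 3 Äpfel"]):
        print(sentence)
```

Whatever normalization is used, it should produce only characters that also appear in alphabet.txt, so that the language model and the acoustic model share one vocabulary of symbols.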

NOTE: If a malloc exception occurs, limit the memory usage with -S <memory use in %>, for example:


$kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/ --o 3 -S 50%
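
The retry with a memory limit can also be scripted. The following is a rough sketch under stated assumptions: the helper names, the file paths passed to them, and the list of memory limits are hypothetical, and retrying on any non-zero exit code is a simplification. It uses lmplz's `--text`, `--arpa`, `-o` (n-gram order) and `-S` (memory) options.

```python
import subprocess


def lmplz_command(lmplz, text_path, arpa_path, order=3, memory=None):
    """Build the lmplz argument list; memory is a string such as '50%' or None."""
    cmd = [lmplz, "--text", text_path, "--arpa", arpa_path, "-o", str(order)]
    if memory is not None:
        cmd += ["-S", memory]
    return cmd


def train_with_fallback(lmplz, text_path, arpa_path,
                        memory_limits=(None, "50%", "25%"),
                        run=subprocess.run):
    """Try lmplz without a limit first, then with tighter memory limits.

    Returns the memory setting that succeeded. The limits tried here are
    an assumption for illustration, not values recommended by KenLM.
    """
    for memory in memory_limits:
        result = run(lmplz_command(lmplz, text_path, arpa_path, memory=memory))
        if result.returncode == 0:
            return memory
    raise RuntimeError("lmplz failed for every memory limit tried")


if __name__ == "__main__":
    # Only prints the command; paths are placeholders.
    print(" ".join(lmplz_command("kenlm/build/bin/lmplz",
                                 "clean_vocab.txt", "model.arpa", memory="50%")))
```

The `run` parameter exists only so the retry logic can be exercised without a KenLM build at hand.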


To build the trie for the language model trained above:


  1. Build Native Client.
# The DeepSpeech tools are used to create the trie
$ git clone
$ cd tensorflow
$ git checkout origin/r1.13
$ ./configure
$ ln -s ../DeepSpeech/native_client ./
$ bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden // //native_client:generate_trie --config=cuda


Flags used to configure TensorFlow

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with ROCm support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Do you want to use clang as CUDA compiler? [y/N]: N
Do you wish to build TensorFlow with MPI support? [y/N]: N
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N

_Refer to Mozilla's documentation for updates. We used Bazel (build label 0.19.2) with DeepSpeech v0.5.0._

  2. Build Trie
    $ DeepSpeech/native_client/generate_trie $path/alphabet.txt $path/lm.binary $exp_path/trie
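
For readers unfamiliar with the data structure, a trie stores a vocabulary so that it is cheap to check whether a character sequence is a prefix of some known word. The sketch below is purely conceptual: it is a plain in-memory prefix tree with hypothetical class names, and it does not reproduce the binary file that generate_trie writes.

```python
class TrieNode:
    """One node per character; is_word marks the end of a complete word."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def has_prefix(self, prefix):
        return self._find(prefix) is not None

    def has_word(self, word):
        node = self._find(word)
        return node is not None and node.is_word


if __name__ == "__main__":
    vocab = Trie(["haus", "hase", "hallo"])
    print(vocab.has_prefix("ha"), vocab.has_word("ha"), vocab.has_word("haus"))
```

A decoder that emits one character at a time can use such prefix lookups to discard candidate transcriptions that cannot grow into any word of the language model's vocabulary.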


Training

Define the path of the corpus and the hyperparameters in _deepspeech-german/ file.

$ nohup deepspeech-german/ &

Hyper-Parameter Optimization

Define the path of the corpus and the hyperparameters in _deepspeech-german/ file.

$ nohup deepspeech-german/ &


Results

Some results from our findings.

NOTE: Refer to our paper for more information.

Trained Models (Language Model, Trie, Speech Model and Checkpoints)

The DeepSpeech model can be directly re-trained on a new dataset. The required dependencies are available at:

1. v0.5.0

This model is trained on DeepSpeech v0.5.0 with _Mozilla_v3+Voxforge+Tuda-De_ (please refer to the paper for more details)

2. v0.6.0

This model is trained on DeepSpeech v0.6.0 with _Mozilla_v4+Voxforge+Tuda-De+MAILABS(454+57+184+233h=928h)_
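
The hours in the v0.6.0 description can be checked with a few lines. The mapping of each figure to a corpus is an assumption based on the order in which the corpora and the hours are listed above.

```python
# Hours per corpus, assuming the figures follow the order of the corpus names.
corpus_hours = {
    "Mozilla_v4": 454,
    "Voxforge": 57,
    "Tuda-De": 184,
    "MAILABS": 233,
}

total_hours = sum(corpus_hours.values())
print(total_hours)  # 928
```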

Why SHY to STAR the repository, if you use the resources? :D

Transfer Learning

1. German to German

$ nohup deepspeech-german/ & 

2. English to German

$ nohup deepspeech-german/ & 

NOTE: To perform transfer learning, the checkpoints must come from the same DeepSpeech version as the one used for training.




If you use our findings/scripts in your academic work, please cite:

    author = "Aashish Agarwal and Torsten Zesch",
    title = "German End-to-end Speech Recognition based on DeepSpeech",
    booktitle = "Preliminary proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers",
    year = "2019",
    address = "Erlangen, Germany",
    publisher = "German Society for Computational Linguistics \& Language Technology",
    pages = "111--119"