Download Large Open Source Java Code Corpus

If you want to dump GitHub repositories or download a large number of open source Java projects, here is a script that does it.

Two files are required to download the corpus.

download-java-open-source-projects-corpus

The first is “repo-list.txt”. It contains the names of 39,020 Java projects. The second file is “script-zip.sh”. It is the shell script that runs git to download those projects.
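
To give a rough idea of how a list file and a git-based script can work together, here is a minimal sketch. It is not the contents of “script-zip.sh”. It assumes that each line of “repo-list.txt” has the form `owner/name` and that the projects are hosted on github.com; both are assumptions, as are the helper names `repo_url`, `repo_dir`, and `download_corpus` and the output directory `corpus`.

```shell
#!/usr/bin/env bash
# Sketch only: the real script-zip.sh may work differently.
# Assumption: each line of the list file is "owner/name" on GitHub.

# Build the clone URL for one "owner/name" entry.
repo_url() {
  printf 'https://github.com/%s.git\n' "$1"
}

# Turn "owner/name" into a flat directory name, e.g. "owner__name".
repo_dir() {
  printf '%s\n' "${1//\//__}"
}

# Clone every repository listed in file $1 into directory $2 (default: corpus).
download_corpus() {
  local list="$1" dest="${2:-corpus}" repo target
  mkdir -p "$dest"
  while IFS= read -r repo || [ -n "$repo" ]; do
    # Skip blank lines and comment lines.
    case "$repo" in ''|'#'*) continue ;; esac
    target="$dest/$(repo_dir "$repo")"
    # Skip projects that are already on disk, so an interrupted run can resume.
    [ -d "$target" ] && continue
    # Shallow clone: fetch only the latest snapshot to save time and disk space.
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of asking for credentials
    # when a repository is missing or private.
    GIT_TERMINAL_PROMPT=0 git clone --depth 1 --quiet \
      "$(repo_url "$repo")" "$target" </dev/null \
      || echo "failed: $repo" >&2
  done < "$list"
}

# Run only when a list file is passed on the command line.
if [ "$#" -gt 0 ]; then
  download_corpus "$@"
fi
```

Saved under a hypothetical name such as `download.sh`, it would be started with `bash download.sh repo-list.txt`. With tens of thousands of projects the run can take many hours and a lot of disk space, which is why the sketch uses shallow clones and skips directories that already exist.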
