Deep Transfer Learning for Crop Yield Prediction with Remote Sensing Data

This project implements the deep learning architectures from You et al. 2017 and applies them to developing countries with significant agricultural productivity (Argentina, Brazil, India).

We also examine the efficacy of transferring yield-forecasting insights between adjoining countries; some results were published in the proceedings of COMPASS 2018. Our paper can be viewed here.

Contributors: Anna X Wang, Caelin Tran, Nikhil Desai, Professor David Lobell, Professor Stefano Ermon

Requirements

Instructions

For any of these scripts, running python <script>.py -h will print a CLI usage string explaining each parameter.
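The scripts follow a standard argparse-style command-line interface. As a rough sketch of that pattern (the flag names below are illustrative, not the scripts' actual parameters):

```python
import argparse

# Illustrative sketch of an argparse-based CLI like the ones these
# scripts expose; the argument names here are hypothetical.
parser = argparse.ArgumentParser(
    description="Example: pull imagery for a country")
parser.add_argument("country", help="country whose regions to process")
parser.add_argument("--imagery-type", default="sat",
                    help="one of sat, temp, or cover")

# Passing -h instead would print the auto-generated usage string.
args = parser.parse_args(["brazil", "--imagery-type", "temp"])
```

Running a script with -h prints the usage string that argparse generates from these declarations.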

Instructions for creating a dataset

Steps marked with (#) should be done if the country of interest is not the US, India, Brazil, Argentina, or Ethiopia. See commit 9f7f43 in this repository for an example of adding a new country (Ethiopia).

  1. (#) Create a Google Earth Engine table from a shapefile of your new country's level-2 administrative boundaries.
  2. (#) Add your country to the pull_modis.py configuration - you will need the identifier of the shapefile table in GEE, and you will also need to add instructions on how to extract relevant metadata (e.g. a human-readable name) from a feature in the shapefile. More details are in the comments in pull_modis.py.
  3. Run pull_modis.py with a country and imagery type specified to download imagery to a Google Cloud bucket.
  4. Put satellite imagery into a "sat" folder, temperature images into a "temp" folder, and land-cover images into a "cover" folder.
  5. Run histograms.py with the sat, temp, and cover folders specified as arguments - outputs to a "histograms" folder.
  6. (#) Save a CSV containing yields for relevant set of regions, harvest years, and crop types. (TODO: more explanation of the yields CSV format)
  7. Run make_datasets.py with the "histograms" folder and yields CSV, along with relevant parameters for use in ML (train/test split, years to ignore or to use, etc.) - creates a "dataset" folder containing numpy arrays that will be used by the training/testing architecture.
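The resulting "dataset" folder holds plain numpy arrays, so it can be inspected directly. A minimal loading sketch - the file names below are assumptions; consult make_datasets.py for the names it actually writes:

```python
import os
import numpy as np

def load_dataset(dataset_dir):
    """Load whichever train/test arrays exist in a dataset folder.

    The file names checked here are illustrative, not guaranteed to
    match what make_datasets.py produces.
    """
    arrays = {}
    for name in ("train_hists", "train_yields", "test_hists", "test_yields"):
        path = os.path.join(dataset_dir, name + ".npy")
        if os.path.exists(path):
            arrays[name] = np.load(path)
    return arrays
```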

Instructions for training the model

  1. Run train_NN.py with a dataset name and neural net architecture type (CNN or LSTM). Generates a "log" folder containing the model weights, model predictions, and logs tracking model error.
  2. Look inside the log folder for model results.
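One common way to use the error logs is to find the epoch with the lowest validation error. A sketch, assuming the log is a CSV with "epoch" and "val_error" columns (the actual format written by train_NN.py may differ):

```python
import csv

def best_epoch(log_csv_path):
    """Return (epoch, error) for the lowest validation error.

    Assumes a CSV with 'epoch' and 'val_error' columns; adjust the
    column names to match the logs train_NN.py actually writes.
    """
    best = None
    with open(log_csv_path) as f:
        for row in csv.DictReader(f):
            err = float(row["val_error"])
            if best is None or err < best[1]:
                best = (int(row["epoch"]), err)
    return best
```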

Instructions for fine-tuning a trained model on a different dataset

  1. Run train_NN.py on a dataset folder "X" as above. Generates a "log" folder.
  2. Run test_NN.py and pass in as arguments the "log" folder from training, along with a new dataset folder "Y" on which to fine-tune the model.
  3. The result is a new folder "log2" containing the new model weights, predictions, and error logs for performance on dataset "Y".
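To judge whether fine-tuning helped, compare an error metric on dataset "Y" before and after. A small sketch using RMSE (the variable names for the prediction arrays are hypothetical):

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error between predicted and true yields."""
    assert len(predictions) == len(targets)
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets))
        / len(predictions))

# Hypothetical comparison, using predictions saved in "log" (before
# fine-tuning) and "log2" (after fine-tuning) on dataset "Y":
# base_err  = rmse(base_preds, true_yields)
# tuned_err = rmse(tuned_preds, true_yields)
```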