DUQ: A Machine Learning Approach for Weather Forecasting
Sequential deep uncertainty quantification (DUQ) produces more accurate weather forecasts by combining historical observations with NWP (numerical weather prediction) output. Our second-place online finish (team CCIT007) in the Global AI Challenger-Weather Forecasting competition (https://challenger.ai/competition/wf2018) indicates that deep learning holds great promise for large-scale meteorological data modeling and forecasting!
A practical loss function for sequence-to-sequence uncertainty quantification is proposed.
Important experimental phenomena are reported and analyzed, which may be noteworthy for future deep learning research on spatio-temporal data and time series forecasting.
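The loss referred to above (used by the `loss_mve` models listed later) follows the mean-variance estimation idea: the network predicts both a mean and a variance, trained with a Gaussian negative log-likelihood. A minimal NumPy sketch of that loss (not the paper's exact implementation):

```python
import numpy as np

def gaussian_nll(y_true, mu, log_var):
    """Negative log-likelihood of y_true under N(mu, exp(log_var)).

    Predicting log-variance keeps the variance positive and the
    optimization stable. Returns the mean NLL over all points,
    dropping the constant 0.5*log(2*pi) term.
    """
    var = np.exp(log_var)
    return np.mean(0.5 * log_var + 0.5 * (y_true - mu) ** 2 / var)

# A perfect mean prediction with small predicted variance scores
# better (lower loss) than the same mean with a large variance:
y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.0, 2.0, 3.0])
assert gaussian_nll(y, mu, np.log(np.full(3, 0.1))) < \
       gaussian_nll(y, mu, np.log(np.full(3, 10.0)))
```

The model is thus penalized both for inaccurate means and for over- or under-confident variances, which is what makes the predicted variance a usable uncertainty estimate.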
License: Apache
Paper: http://urban-computing.com/pdf/kdd19-BinWang.pdf
3-minute promotional video: https://www.youtube.com/watch?v=3WPkXWZm89w&list=PLhzEeQSx1uAFVhR8m631pY5TNiP1hkZCn&index=68&t=0s
If you find it helpful, please cite our paper:
@inproceedings{Wang:2019:DUQ:3292500.3330704,
author = {Wang, Bin and Lu, Jie and Yan, Zheng and Luo, Huaishao and Li, Tianrui and Zheng, Yu and Zhang, Guangquan},
title = {Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting},
booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
series = {KDD '19},
year = {2019},
isbn = {978-1-4503-6201-6},
location = {Anchorage, AK, USA},
pages = {2087--2095},
numpages = {9},
url = {http://doi.acm.org/10.1145/3292500.3330704},
doi = {10.1145/3292500.3330704},
acmid = {3330704},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {deep learning, uncertainty quantification, urban computing, weather forecasting},
}
Tested on macOS and Ubuntu with Python 3.6. Required packages (Keras, TensorFlow, etc.) are listed in requirements.txt. Run the command below to install them:
pip install -r requirements.txt
Pipeline for quick start.
The competition site has since closed, but competitors have uploaded the raw dataset.
Training set: ai_challenger_wf2018_trainingset_20150301-20180531.nc
Validation set: ai_challenger_wf2018_validation_20180601-20180828_20180905.nc
Test set (taking the one-day test data for 2018-10-28 as an example): ai_challenger_wf2018_testb1_20180829-20181028.nc
After downloading, place the three original datasets in the folder ./data/raw/. (For the quick start we use ai_challenger_wf2018_testb1_20180829-20181028 as the test example; you can easily rename the related arguments to apply the pipeline to other test sets, as described later in the section How to change test dataset for other days?)
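Before running the pipeline, you can verify that all three raw files landed in the right place. A small helper sketch (the file names are those listed above; the check itself is not part of the repo):

```python
from pathlib import Path

RAW_DIR = Path("data/raw")
EXPECTED = [
    "ai_challenger_wf2018_trainingset_20150301-20180531.nc",
    "ai_challenger_wf2018_validation_20180601-20180828_20180905.nc",
    "ai_challenger_wf2018_testb1_20180829-20181028.nc",
]

def missing_files(raw_dir=RAW_DIR, expected=EXPECTED):
    """Return the expected raw files that are not yet in raw_dir."""
    return [name for name in expected if not (Path(raw_dir) / name).exists()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing raw files:", *missing, sep="\n  ")
    else:
        print("All raw files present; ready to run the make targets.")
```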
Ensure you are in the project root directory and run the commands strictly in the following order:
make make_train_data (Prepare training data)
make make_val_data (Prepare validation data)
make make_TestOneDayOnlineData_for_submit (Prepare one-day test data)
make train_from_scratch (Train)
make load_single_model_and_predict (Test)
make evaluate_1028_demo
make load_multi_models_pred
python ensemble.py
make evaluate_1028_demo_ensemble
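The ensemble step above combines the per-model predictions produced by load_multi_models_pred. A minimal sketch of simple prediction averaging (the actual ensemble.py may organize files and weights differently; the toy arrays here are illustrative):

```python
import numpy as np

def average_predictions(pred_list):
    """Ensemble by averaging: stack each model's prediction array
    (identical shapes, e.g. stations x forecast steps) and take
    the element-wise mean across models."""
    stacked = np.stack(pred_list, axis=0)
    return stacked.mean(axis=0)

# Two toy "model outputs" for a 2-station, 3-step forecast:
p1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
p2 = np.array([[3.0, 2.0, 1.0], [6.0, 5.0, 4.0]])
print(average_predictions([p1, p2]))  # element-wise mean of the two
```

Averaging tends to cancel out the idiosyncratic errors of individual seq2seq variants, which is why combining shallow and deep models helps.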
During the competition, we had three submission opportunities each day, so we used three models:
1. Ensemble model.
During the early competition days, the ensemble included the following models:
MODEL_LIST=seq2seq_subnet_50_swish_dropout \
seq2seq_subnet_30_30_best \
seq2seq_subnet_200_200 \
seq2seq_model_100 \
seq2seq_subnet_50_50_dropout\
seq2seq_model_250_250\
seq2seq_subnet_100_swish_dropout
During the final competition days, we switched to:
MODEL_LIST_NEW=seq2seq_model_best4937\
seq2seq_subnet_200_200 \
seq2seq_subnet_100_swish_dropout
2. Single model 1: Seq2Seq_MVE_layers_222_222_loss_mve_dropout0 (at ./models/model_for_official).
3. Single model 2: Seq2Seq_MVE_layers_50_50_loss_mae_dropout0 (at ./models/model_for_official).
Download the raw online test data to the path ./data/raw/.
Edit the file './src/data/make_TestOnlineData_from_nc.py'. In the snippet below, set file_name to the name of the downloaded file:
elif process_phase == 'OnlineEveryDay':
    file_name = 'ai_challenger_wf2018_testb1_20180829-20181028.nc'
Modify the Makefile rule 'make_TestOneDayOnlineData_for_submit':
Modify the Makefile rule 'load_multi_models_pred' (taking datetime=2018xxxx as an example):
Modify the Makefile rule 'load_single_model_and_predict':
Go to './src/weather_forecasting2018_eval/pred_result_csv/':
Sanity check by visualization (not tried).
You can dive into './src/models/parameter_config_class.py'. Because the deep model has too many parameters to search exhaustively, we mainly vary 'self.layers' and use an ensemble to combine shallow and deep seq2seq models.
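A hedged sketch of what such a configuration class might look like (parameter_config_class.py in the repo is the authoritative version; every field here other than self.layers is illustrative):

```python
class ParameterConfig:
    """Hyper-parameter container. `layers` lists the hidden sizes of
    the stacked seq2seq RNN, so [50, 50] means two layers of 50 units
    (cf. model names like seq2seq_subnet_50_50 above)."""

    def __init__(self, layers=(50, 50), dropout=0.0, loss="mve"):
        self.layers = list(layers)   # hidden sizes, one entry per layer
        self.dropout = dropout       # dropout rate between layers
        self.loss = loss             # "mve" or "mae", as in the model names

# Shallow and deep variants, later combined in the ensemble:
shallow = ParameterConfig(layers=[50, 50])
deep = ParameterConfig(layers=[222, 222, 222])
```

Varying only self.layers yields a family of shallow-to-deep models whose predictions the ensemble then averages.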
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── src <- Source code for use in this project.
Project based on the cookiecutter data science project template. #cookiecutterdatascience