predictatops

Code for stratigraphic pick prediction via supervised machine-learning



License: MIT

Status: Runs and is ready for others to try, but is not yet stable. Updated to v0.0.4-alpha on October 26th, 2019. NOTE: Running in a standard Google Colab notebook will fail during model training because of excessive memory requirements.

The current best RMSE on the Top McMurray surface is 6.6 meters.

Related Content

The docs provide information beyond what is in this README.

This code is the subject of an abstract submitted to the AAPG ACE convention in 2019.

The slides I presented at AAPG ACE 2019 are available in PDF form. They give an introduction to the theory and thought process behind Predictatops.

Development began in the MannvilleGroup_Strat_Hackathon repo but is moving here as the code is cleaned up and modularized. This project is under active development. A few portions of the code still exist only in the MannvilleGroup_Strat_Hackathon repo. This is a nights-and-weekends side project, but the main developer will continue to develop it.

A description of the work aimed at non-coders can be found in this blog post.

Philosophy

In human-generated stratigraphic correlations, there is often talk of lithostratigraphy vs. chronostratigraphy. We propose a weak analogy between these two approaches and the different methods of computer-assisted stratigraphy. Some past efforts, which work very well under certain circumstances, are similar to lithostratigraphy in what they accomplish: they match curve patterns between neighboring wells and rely on the assumption that changes in lithology (and therefore in curve shape) are equivalent to stratigraphy.

Other papers attempt to use code to correlate well logs, assuming there is a mathematical or pattern basis for stratigraphic surfaces that can be teased out of individual logs. Although some recent papers seem to do better with this type of approach, they did not release code. The earlier papers seem to have had problems that were at least partly related to their assumption that stratigraphic changes have a similar expression across large spatial areas.

In contrast to lithostratigraphy, chronostratigraphy assumes that lithology reflects facies belts that can shift gradually in space over time, so lithology is not correlated with time. Two wells with similar lithology patterns can lie in different time packages. When not otherwise constrained by biostratigraphy, chemostratigraphy, or radiometric dating, traditional chronostratigraphy relies on models of how facies belts should change in space.

Instead of relying on stratigraphic models, this project proposes that known picks can define the spatial distribution and variance of well log curve patterns, which are then used to predict picks in new wells. The project focuses on creating programmatic features and operations that mimic the low-level observations of a human geologist, then progressively builds them into the higher-order clustering of patterns across many wells that a human geologist would otherwise perform.
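One simple way to picture "known picks define the spatial distribution and variance" is that the picks in nearby wells bound where a new well's pick is likely to fall. The sketch below is a hypothetical illustration, not Predictatops's implementation: `Well` and `expected_pick_window` are invented names, and the mean ± n standard deviations window is an assumption.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Well:
    """Hypothetical minimal record: a well ID and its known pick depth, if any."""
    uwi: str
    pick_depth_m: Optional[float]  # None when the well has no human pick


def expected_pick_window(
    neighbors: Sequence[Well], n_std: float = 2.0
) -> Tuple[float, float]:
    """Hypothetical helper: return a (top, bottom) depth window for a new well.

    The window is centered on the mean of the neighbors' known picks and
    extends n_std sample standard deviations above and below that mean.
    """
    depths = [w.pick_depth_m for w in neighbors if w.pick_depth_m is not None]
    if len(depths) < 2:
        raise ValueError("need at least two neighbors with known picks")
    center = statistics.mean(depths)
    spread = statistics.stdev(depths)  # sample standard deviation
    return center - n_std * spread, center + n_std * spread


# Made-up picks for illustration only (not from the dataset).
neighbors = [
    Well("A", 300.0),
    Well("B", 310.0),
    Well("C", 305.0),
    Well("D", None),  # no pick, so it is ignored
]
print(expected_pick_window(neighbors))  # (295.0, 315.0)
```

A window like this only narrows where to look; features computed from the curves inside it would still be needed to choose the pick.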

Datasets

The default demo dataset is a collection of over 2000 wells made public by the Alberta Energy Regulator's Alberta Geological Survey. To quote their webpage, "In 1986, Alberta Geological Survey began a project to map the McMurray Formation and the overlying Wabiskaw Member of the Clearwater Formation in the Athabasca Oil Sands Area. The data that accompany this report are one of the most significant products of the project and will hopefully facilitate future development of the oil sands." It includes well log curves as LAS files and tops as TXT and XLS files. A Word document and a text file describe the files and associated metadata.

Wynne, D.A., Attalla, M., Berezniuk, T., Brulotte, M., Cotterill, D.K., Strobl, R. and Wightman, D. (1995): Athabasca Oil Sands data McMurray/Wabiskaw oil sands deposit - electronic data; Alberta Research Council, ARC/AGS Special Report 6.

Please go to the links below for more information and the dataset:

Report for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/document/OFR/OFR_1994_14.PDF

Electronic data for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/publications/SPE_006.html

The data is also in the SPE_006_originalData folder of this project's original repo, MannvilleGroup_Strat_Hackathon.

In the metadata file SPE_006.txt the dataset is described as Access Constraints: Public and Use Constraints: Credit to originator/source required. Commercial reproduction not allowed.

The latitude and longitude of the wells are not in the original dataset. @dalide used the Alberta Geological Survey's UWI conversion tool to find the latitude and longitude for each well UWI. A CSV with the coordinates of each well's location can be found here. These coordinates were then used to find each well's nearest neighbors.
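The README does not say how nearest neighbors were computed from the coordinates. As a hedged sketch, assuming great-circle (haversine) distance on a spherical Earth, a brute-force k-nearest search could look like this; `haversine_km` and `nearest_neighbors` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighbors(
    wells: Dict[str, Tuple[float, float]], well_id: str, k: int
) -> List[str]:
    """Hypothetical helper: IDs of the k wells closest to well_id, nearest first."""
    lat, lon = wells[well_id]
    distances = (
        (haversine_km(lat, lon, other_lat, other_lon), other_id)
        for other_id, (other_lat, other_lon) in wells.items()
        if other_id != well_id
    )
    return [other_id for _, other_id in heapq.nsmallest(k, distances)]


# Made-up coordinates for illustration only (not wells from the dataset).
wells = {
    "A": (56.00, -111.0),
    "B": (56.01, -111.0),
    "C": (56.10, -111.0),
    "D": (57.00, -111.0),
}
print(nearest_neighbors(wells, "A", 2))  # ['B', 'C']
```

The brute-force search is O(n) per well, which is fine for a few thousand wells; a spatial index would be the usual choice for much larger datasets.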

Please note that there are a few malformed .LAS files in the full dataset; the code in this repository skips them.
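The README does not define "malformed" or say how those files are detected. The sketch below assumes the LAS 2.0 layout (~V, ~W, ~C, and ~A sections, with ~A last) and applies a hypothetical structural check: required sections present, ~A last, and a data value count that divides evenly by the number of curves. The function names are invented, and a full LAS reader such as lasio handles far more of the specification.

```python
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical minimum: version, well, curve, and ASCII data sections.
REQUIRED_SECTIONS = ("~V", "~W", "~C", "~A")


def looks_well_formed(text: str) -> bool:
    """Hypothetical minimal structural check for a LAS 2.0 file."""
    sections: List[str] = []
    n_curves = 0
    n_values = 0
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if line.startswith("~"):
            current = line[:2].upper()  # e.g. "~Curve Information" -> "~C"
            sections.append(current)
        elif current == "~C":
            n_curves += 1  # one curve definition per line
        elif current == "~A":
            n_values += len(line.split())
    if any(s not in sections for s in REQUIRED_SECTIONS):
        return False
    if sections[-1] != "~A":
        return False  # the ASCII data section must come last
    # Every depth step must supply one value per curve (wrapped or not).
    return n_curves > 0 and n_values > 0 and n_values % n_curves == 0


def load_well_formed_las(folder: str) -> Tuple[Dict[str, str], List[str]]:
    """Read every .las/.LAS file in folder; return (good texts by name, skipped names)."""
    good: Dict[str, str] = {}
    skipped: List[str] = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".las"):
            continue
        path = os.path.join(folder, name)
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        if looks_well_formed(text):
            good[name] = text
        else:
            skipped.append(name)
    return good, skipped
```

Returning the skipped file names, rather than silently dropping them, makes it easy to report which wells were excluded.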

If for some reason the well data is not found at the links above, you should be able to find it here.

Architecture and Abstraction

Please refer to the Architecture and Abstraction section of the docs for information on code architecture, tasks, and folder organization.

Getting Started

See the Usage and the Installation sections of the docs.

Credits

There's a theme here. Check the docs.


Status

The root mean squared error (RMSE) for the Top McMurray surface is down to ~7 meters, with a handful of wells (roughly 8%, depending on settings) identified as too difficult to predict.
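For reference, RMSE is the square root of the mean squared difference between predicted and human-generated pick depths. The sketch below computes it over all wells and over only the wells not flagged as too difficult; the flagging itself and the function names are hypothetical, and the numbers are toy values, not project results.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and human-picked depths (meters)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_excluding(
    predicted: Sequence[float],
    actual: Sequence[float],
    too_difficult: Sequence[bool],
) -> float:
    """Hypothetical helper: RMSE over only the wells not flagged as too difficult."""
    kept = [(p, a) for p, a, flag in zip(predicted, actual, too_difficult) if not flag]
    return rmse([p for p, _ in kept], [a for _, a in kept])


# Toy numbers for illustration only, not project results.
predicted = [302.0, 298.0, 310.0, 350.0]
actual = [300.0, 300.0, 306.0, 300.0]
too_difficult = [False, False, False, True]
print(round(rmse(predicted, actual), 2))  # 25.12
print(round(rmse_excluding(predicted, actual, too_difficult), 2))  # 2.83
```

Because errors are squared before averaging, a single badly predicted well dominates the result, which is why excluding a few hard wells can move the RMSE so much.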

Distribution of Absolute Error in Test Portion of Dataset for Top McMurray Surface in Meters.

The y-axis is the number of picks in each bin, and the x-axis is the distance in meters between the predicted pick and the human-generated pick. <img src="docs/images/Histogram_Error_predictatops_6.6_vA.png" alt="image of current_errors_TopMcMr_20190517" style="float: left; margin-right: 25px;" />
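How the histogram was produced is not documented here. As a hedged sketch, absolute pick errors could be counted into fixed-width bins like this; the 1 m bin width and the `abs_error_histogram` name are assumptions, bins are half-open [lower, upper), and empty bins are omitted.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def abs_error_histogram(
    predicted: Sequence[float], actual: Sequence[float], bin_width_m: float = 1.0
) -> Dict[Tuple[float, float], int]:
    """Hypothetical helper: count picks per absolute-error bin (edges in meters)."""
    counts = Counter(
        math.floor(abs(p - a) / bin_width_m) for p, a in zip(predicted, actual)
    )
    return {
        (i * bin_width_m, (i + 1) * bin_width_m): counts[i] for i in sorted(counts)
    }


# Toy numbers for illustration only, not project results.
print(abs_error_histogram([302.0, 298.5, 310.0, 301.0], [300.0, 300.0, 306.0, 300.5]))
# {(0.0, 1.0): 1, (1.0, 2.0): 1, (2.0, 3.0): 1, (4.0, 5.0): 1}
```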

The current algorithm is XGBoost.