Overview

[CODE] [DOCUMENTATION] [PAPER] [BLOG POST] [GOOGLE GROUP] [COLAB]

DeepArchitect: Architecture search so easy you'll think it's magic!

Check the Colab to play around with it and run examples.

Introduction

DeepArchitect is a framework for automatically searching over computational graphs in arbitrary domains, designed with a focus on modularity, ease of use, reusability, and extensibility. DeepArchitect has the following main components:

For researchers, DeepArchitect aims to make architecture search research more reusable and reproducible by providing them with a modular framework that they can use to implement new search algorithms and new search spaces while reusing code. For practitioners, DeepArchitect aims to augment their workflow by providing them with a tool to easily write search spaces encoding a large number of design choices and use search algorithms to automatically find good architectures.

Installation

We recommend playing with the code on Colab first.

For a local installation, run the following code snippet:

git clone git@github.com:negrinho/deep_architect.git deep_architect
cd deep_architect
conda create --name deep_architect python=3.6
conda activate deep_architect
pip install -e .

Run one of the examples to check for correctness, e.g., python examples/framework_starters/main_keras.py or python examples/mnist_with_logging/main.py --config_filepath examples/mnist_with_logging/configs/debug.json.
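Before running the examples, it can help to confirm that the editable install is visible from the active environment. The snippet below is a small sketch that uses only the Python standard library; the helper name `is_installed` is ours and is not part of DeepArchitect.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "deep_architect"
    if is_installed(name):
        print(name, "is importable from this environment")
    else:
        print(name, "was not found; is the conda environment active?")
```

If the package is not found, the most likely causes are that the conda environment is not active or that pip install -e . was run from a different directory.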

We have included utils.sh with useful development functionality, e.g., to build documentation, extract code snippets from documentation, and build Singularity containers.

A minimal DeepArchitect example with Keras

We adapt this Keras example by defining a search space of models and sampling a random model from it. The original example has a single fixed three-layer neural network with ReLU activations in the hidden layers and dropout with rate equal to 0.2. We construct a search space by relaxing the number of layers that the network can have, choosing between sigmoid and ReLU activations, and the number of units for each dense layer. Check this search space below:

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Dropout, Input
from keras.optimizers import RMSprop

import deep_architect.helpers.keras_support as hke
import deep_architect.modules as mo
import deep_architect.hyperparameters as hp
import deep_architect.core as co
import deep_architect.visualization as vi
from deep_architect.searchers.common import random_specify

batch_size = 128
num_classes = 10
epochs = 20

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# model = Sequential()
# model.add(Dense(512, activation='relu', input_shape=(784,)))
# model.add(Dropout(0.2))
# model.add(Dense(512, activation='relu'))
# model.add(Dropout(0.2))
# model.add(Dense(num_classes, activation='softmax'))

D = hp.Discrete

def dense(h_units, h_activation):
    return hke.siso_keras_module_from_keras_layer_fn(Dense, {
        'units': h_units,
        'activation': h_activation
    })

def dropout(h_rate):
    return hke.siso_keras_module_from_keras_layer_fn(Dropout, {'rate': h_rate})

def cell(h_units, h_activation, h_rate, h_opt_drop):
    return mo.siso_sequential([
        dense(h_units, h_activation),
        mo.siso_optional(lambda: dropout(h_rate), h_opt_drop)
    ])

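def model_search_space():
    # created once, so every cell shares the same activation choice
    h_activation = D(['relu', 'sigmoid'])
    h_num_repeats = D([1, 2, 4])
    return mo.siso_sequential([
        mo.siso_repeat(
            # the lambda runs once per repetition, so the hyperparameters
            # created inside it are chosen independently for each cell
            lambda: cell(
                D([256, 512, 1024]), h_activation, D([0.2, 0.5, 0.7]), D([0, 1])
            ), h_num_repeats),
        dense(D([num_classes]), D(['softmax']))
    ])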

(inputs, outputs) = mo.SearchSpaceFactory(model_search_space).get_search_space()
random_specify(outputs)
inputs_val = Input((784,))
co.forward({inputs["in"]: inputs_val})
outputs_val = outputs["out"].val
vi.draw_graph(outputs, draw_module_hyperparameter_info=False)
model = Model(inputs=inputs_val, outputs=outputs_val)
model.summary()

model.compile(
    loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

This example shows how to add minimal architecture search capabilities to an existing Keras example. The search space encodes a network composed of a sequence of 1, 2, or 4 cells, followed by a final dense module that outputs probabilities over classes. Each cell is a sub-search space, which underscores the modularity and composability of DeepArchitect. The activation type for the dense layer is shared among all cells in the sequence. All other cell hyperparameters (number of units, whether to use dropout, and the dropout rate) are chosen independently for each occurrence of the cell.
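To get a feel for how many architectures this small search space encodes, here is a back-of-the-envelope count in plain Python. It does not use the DeepArchitect API. It assumes that a cell without dropout counts as a single option regardless of the dropout rate, and that the final dense module contributes no choices; the variable and function names are ours.

```python
# Choices copied from model_search_space above.
activations = ['relu', 'sigmoid']  # shared by all cells
units = [256, 512, 1024]           # chosen per cell
rates = [0.2, 0.5, 0.7]            # chosen per cell, only matters with dropout
num_repeats = [1, 2, 4]            # number of cells in the sequence

# Assumption: a cell has either no dropout (1 option) or dropout with a rate.
dropout_options = 1 + len(rates)
per_cell = len(units) * dropout_options


def count_architectures():
    """Count distinct architectures under the assumptions stated above."""
    total = 0
    for n in num_repeats:
        # Each of the n cells is chosen independently of the others.
        total += per_cell ** n
    # The activation is chosen once and shared by every cell.
    return len(activations) * total


if __name__ == "__main__":
    print(count_architectures())
```

Under these assumptions there are 12 options per cell and 2 * (12 + 12^2 + 12^4) = 41,784 architectures in total. Had the activation been chosen per cell instead of shared, the count would be larger still.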

The original single Keras model is commented out in the code above to emphasize how little code is needed to support a nontrivial search space. We encourage the reader to think about supporting the same search space using existing hyperparameter optimization tools or in an ad-hoc manner (e.g. how much code would be necessary to encode the search space and sample a random architecture from it).
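As a starting point for that comparison, an ad-hoc sampler for this particular search space might look like the sketch below. It is plain Python, not DeepArchitect code. The function name and the tuple-based layer descriptions are hypothetical, and it assumes that a value of 1 for the optional choice means dropout is kept.

```python
import random


def sample_architecture(rng):
    """Sample one architecture description from the search space, by hand."""
    # Chosen once: every cell shares the same activation.
    activation = rng.choice(['relu', 'sigmoid'])
    layers = []
    for _ in range(rng.choice([1, 2, 4])):
        # Chosen independently for each cell.
        layers.append(('dense', rng.choice([256, 512, 1024]), activation))
        if rng.choice([0, 1]) == 1:
            layers.append(('dropout', rng.choice([0.2, 0.5, 0.7])))
    # Final classification layer with a single fixed configuration.
    layers.append(('dense', 10, 'softmax'))
    return layers


if __name__ == "__main__":
    for layer in sample_architecture(random.Random(0)):
        print(layer)
```

Note what this sketch leaves out. It produces only a description that still has to be translated into Keras layers, the sharing pattern is hard-wired into the control flow, and the cell cannot be reused in another search space without rewriting the sampler.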

The tutorials and examples cover additional aspects of DeepArchitect not shown in the code above. This is a slightly more complex example using searchers and logging. These are minimal architecture search examples in DeepArchitect across deep learning frameworks. They should be straightforward to adapt for your use cases.

Framework components

The main concepts in DeepArchitect are:

Main folder structure

The most important source files live in the deep_architect folder. The tutorials cover much of the information needed to extend the framework. See below for a high-level tour of the repo.

There are also a few folders in the deep_architect folder.

Roadmap for the future

The community will have a fundamental role in extending DeepArchitect. For example, authors of existing architecture search algorithms can reimplement them in DeepArchitect, allowing the community to use them widely. This alone will allow progress on architecture search to be measured more reliably. New search spaces for new tasks can be implemented, allowing users to use them (either directly or in the construction of new search spaces) in their experiments. New evaluators and visualizations can be implemented.

Willing contributors should reach out and check the contributing guide. We expect to continue extending and maintaining DeepArchitect and to use it for our research.

Reaching out

You can reach me at negrinho@cs.cmu.edu or at @rmpnegrinho. If you tweet about DeepArchitect, please use the tag #DeepArchitect and/or mention me (@rmpnegrinho) in the tweet. For bug reports, questions, and suggestions, use GitHub issues. Use the Google group for more casual usage questions.

License

DeepArchitect is licensed under the MIT license as found here. Contributors agree to license their contributions under the MIT license.

Contributors and acknowledgments

The lead researcher for DeepArchitect is Renato Negrinho. Daniel Ferreira played an important initial role in designing APIs through discussions and contributions. This work benefited immensely from the involvement and contributions of talented CMU undergraduate students (Darshan Patil, Max Le, Kirielle Singajarah, Zejie Ai, Yiming Zhao, Emilio Arroyo-Fang). This work benefited greatly from discussions with faculty (Geoff Gordon, Matt Gormley, Graham Neubig, Carolyn Rose, Ruslan Salakhutdinov, Eric Xing, and Xue Liu), and fellow PhD students (Zhiting Hu, Willie Neiswanger, Christoph Dann, and Matt Barnes). This work was partially done while Renato Negrinho was a research scientist at Petuum. This work was partially supported by NSF grant IIS 1822831. We are grateful for a generous GCP grant for both CPU and TPU compute.

References

If you use this work, please cite:

@article{negrinho2017deeparchitect,
  title={Deeparchitect: Automatically designing and training deep architectures},
  author={Negrinho, Renato and Gordon, Geoff},
  journal={arXiv preprint arXiv:1704.08792},
  year={2017}
}

@article{negrinho2019towards,
  title={Towards modular and programmable architecture search},
  author={Negrinho, Renato and Patil, Darshan and Le, Nghia and Ferreira, Daniel and Gormley, Matthew and Gordon, Geoffrey},
  journal={Neural Information Processing Systems},
  year={2019}
}

The code for negrinho2017deeparchitect can be found here. The ideas and implementation of negrinho2017deeparchitect evolved into the work of negrinho2019towards, found in this repo. See the paper, documentation, and blog post. The code for the experiments reported in negrinho2019towards can be found here, but it will not be actively maintained. For your work, please build on top of the deep_architect repo instead.