pipelines-api-examples

This repository contains examples for the [Google Genomics Pipelines API] (https://cloud.google.com/genomics/reference/rest/v1alpha2/pipelines).

Alpha
This is an Alpha release of Google Genomics API. This feature might be changed in backward-incompatible ways and is not recommended for production use. It is not subject to any SLA or deprecation policy.

The API provides an easy way to create, run, and monitor command-line tools on Google Compute Engine running in a Docker container. You can use it like you would a job scheduler.

The most common use case is to run an off-the-shelf tool or custom script that reads and writes files. You may want to run such a tool over files in Google Cloud Storage. You may want to run this independently over hundreds or thousands of files.

The typical flow for a pipeline is:

  1. Create a Compute Engine virtual machine
  2. Copy one or more files from Cloud Storage to a disk
  3. Run the tool on the file(s)
  4. Copy the output to Cloud Storage
  5. Destroy the Compute Engine virtual machine

You can submit batch operations from your laptop, and have them run in the cloud. You can do the packaging to Docker yourself, or use existing Docker images.

Prerequisites

  1. Clone or fork this repository.
  2. If you plan to create your own Docker images, then install docker: https://docs.docker.com/engine/installation/#installation
  3. Follow the Google Genomics getting started instructions to set up your Google Cloud Project. The Pipelines API requires that the following are enabled in your project:
    1. Genomics API
    2. Cloud Storage API
    3. Compute Engine API
  4. Follow the Google Genomics getting started instructions to install and authorize the Google Cloud SDK.
  5. Install or update the python client via pip install --upgrade google-api-python-client. For more detail see https://cloud.google.com/genomics/v1/libraries.

Examples

See Also