Pegasus

Pegasus Workflow Management System

Pegasus WMS is a configurable system for mapping and executing scientific workflows over a wide range of computational infrastructures including laptops, campus clusters, supercomputers, grids, and commercial and academic clouds. Pegasus has been used to run workflows with up to 1 million tasks that process tens of terabytes of data at a time.

Pegasus WMS bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It locates the input data and computational resources that a workflow needs, and plans the data transfer and job submission operations required to execute it. Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the particulars of the low-level specifications required by the middleware (Condor, Globus, Amazon EC2, etc.). In the process, Pegasus can plan and optimize the workflow to enable efficient, high-performance execution of large workflows on complex, distributed infrastructures.

Pegasus has a number of features that contribute to its usability and effectiveness:

Getting Started

You can find more information about Pegasus on the Pegasus website.

Pegasus has an extensive User Guide that documents how to create, plan, and monitor workflows.

We recommend you start by completing the Pegasus Tutorial from Chapter 2 of the Pegasus User Guide.

The easiest way to install Pegasus is to use one of the binary packages available on the Pegasus downloads page. Consult Chapter 3 of the Pegasus User Guide for more information about installing Pegasus from binary packages.

The Pegasus website has documentation for the Python, Java, and Perl DAX generator APIs.

There are several examples of how to construct workflows on the Pegasus website and in the Pegasus Git repository.

There are also examples of how to configure Pegasus for different execution environments in the Pegasus User Guide.

If you need help using Pegasus, please contact us. See the [contact page](http://pegasus.isi.edu/contact) on the Pegasus website for more information.

Building from Source

Pegasus can be compiled on any recent Linux or Mac OS X system.

Source Dependencies

In order to build Pegasus from source, make sure you have the following packages installed:

Debian systems (Debian, Ubuntu, etc.)

Install the following packages using apt-get:

Red Hat systems (RHEL, CentOS, Scientific Linux, Fedora, etc.)

Install the following packages using yum:

In addition, RHEL 5 systems require Python 2.6, which can be installed from EPEL. You will also need the matching setuptools for Python 2.6, which can be installed from the Python Package Index using:

$ wget http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c9-py2.6.egg#md5=ca37b1ff16fa2ede6e19383e7b59245a
$ sudo /bin/sh setuptools-0.6c9-py2.6.egg

or, if you don't have root access:

$ /bin/sh setuptools-0.6c9-py2.6.egg -d ~/.local/lib/python2.6/site-packages

Mac OS X

Install Xcode and the Xcode command-line tools.

Install homebrew and the following homebrew packages:

SUSE (openSUSE, SLES)

Install the following packages:

Other packages may be required to run the unit tests and to build the MPI tools.

Compiling

Ant is used to compile Pegasus.

To get a list of build targets run:

$ ant -p

The targets whose names begin with "dist" are the ones you will normally want to use.
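If the target list is long, it can help to filter it down to the "dist" targets. The following is a minimal sketch, not part of Pegasus: the function name `dist_targets` is hypothetical, and it assumes that `ant -p` prints one target per line, optionally indented, with the target name first.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Pegasus.
# Reads a target listing on stdin and keeps only the lines whose
# first word starts with "dist" (leading whitespace is allowed).
dist_targets() {
    grep -E '^[[:space:]]*dist'
}

# Example usage, run from the top of the source tree:
#   ant -p | dist_targets
```

Anchoring the pattern at the start of the line is a deliberate choice: it avoids matching lines that merely mention "dist" somewhere in a description.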

To build a basic binary tarball (excluding documentation), run:

$ ant dist

To build the release tarball (including documentation), run:

$ ant dist-release

The resulting packages will be created in the dist subdirectory.
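The build steps above can be wrapped in a small shell helper. This is only a sketch under stated assumptions: the function names `require_tool` and `build_pegasus` are hypothetical, and it assumes you run it from the top of the Pegasus source tree with `ant` on your `PATH`.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with Pegasus.

# Return 0 if the named command is on PATH; otherwise print a
# message to stderr and return 1.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required tool: $1" >&2
    return 1
}

# Build the Ant target given as $1 (default: dist), then list the
# contents of the dist subdirectory where the packages are created.
build_pegasus() {
    target="${1:-dist}"
    require_tool ant || return 1
    ant "$target" || return 1
    ls -l dist
}
```

After sourcing the file, `build_pegasus` builds the basic binary tarball and `build_pegasus dist-release` builds the release tarball. Checking for `ant` first gives a clearer error than a failed build on a machine where the dependencies are not installed.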