Introduction

This document gives an overview of the Telekom workflow engine. For more details, please read the wiki pages: https://github.com/zutnop/telekom-workflow-engine/wiki.

Workflow engine

The telekom-workflow-engine is a custom built embeddable technology that provides a runtime environment for long-lived business processes. The main features, and the meaning of life, of the engine are:

Why yet another workflow engine?

The available open-source offerings were mostly based on drag and drop visual design, lots of XML files and the BPMN standard, which is great if you are building demos with less than 10 elements in the workflow or writing Gartner recommendations, but it quickly becomes a huge PITA if you are actually implementing and running complex real-world working software on those platforms and integrating this with your existing services layer.

The previous 10 years we were using BEA/Oracle Weblogic Integration platform versions 7.x, 8.x, 9.x (end-of-life, no support, black box, mystical problems, running instances dissapearing etc., stuck with Java 5 and other ancient technologies, workflow development is awfully slow and complex) and the last 3 years we were trying to find a new platform for our workflows. After having a brief look at Activiti, Bonita Open Solution and JBoss jBPM, we decided to try the jBPM 5.4. After several proof-of-concepts on the jBPM and after adding/changing/enhancing/implementing/identifying all the missing parts (clustering, timers, WS-HumanTasks, java tasks, admin console, REST interface) it became clear that this was NOT the way forward. The last drop to the glass came when we actually tried to implement a few of our simpler workflows, the developer experience was nowhere near good enough to be able to commit to this for the next 10 years (it was buggy, slow, complex, managed to break the definition in a way that required to start from the scratch because there was no way to undo or fix it in the IDE, etc.)!

It was decided to build a lightweight embeddable workflow engine that meets our requirements and plays nicely together with our technology stack to achieve:

Technical vision

Engine overview

The engine implementation can be divided into three main parts:

An empty engine itself does not provide any value for the end users. To do something useful with the engine, you need to write your workflow definitions (via using the API). The engine implementation and the workflow definitions are to be kept in separate repositories to provide a clear distinction between the platform and the actual workflow business logic code. The engine, together with those workflow definitions, packaged as a *.war archive, will be deployed to Tomcat web server(s). When the web server is started, the engine spools up, reads the previous state from its DB and continues to execute the ongoing instances. The engine will keep executing the started workflow instances, persisting the new state when a wait state is reached.

Error handling

The execution of the long-lived workflows is a delicate process (must survive technical problems) and thus a lot of effort has gone into making the workflow engine very robust and bulletproof. The workflows are executed as transactions from one wait state to the next wait state. If the next wait state is successfully reached, then the new state is persisted and the transaction is commited. If an exception occurs, the transaction is rolled back, the execution is stored and the workflow will go into a frozen state, waiting for a human decision. After correcting the problem, the workflow can be continued from the last good wait state. This achieves:

Performance

The engine is built to provide high throughput and supports scalability in both dimensions:

Monitoring/management

The workflow engine publishes a web console, REST services and JMX interface for monitoring and interacting with the running engine. These services provide the following functionality:

Testing

The workflow engine is fully covered with unit and integration tests. And the workflow definitions can also be easily tested with JUnit tests for rapid development and debuging.

Workflow implementations

To implement an automatic workflow, you need to implement the workflow process/rules in Java using the telekom-workflow-api interfaces. Each workflow is described in its own Java class as one or more steps.

Workflow definition example:

factory
    .start()

    // validate input
    .validateInputVariable( 0, "customerId", String.class)

    // load data into environment
    .variable( "customerBalance" ).call( 1, "customerService", "getCustomerBalance", "${customerId}" )
    // calculate the suspend time after the warning message has been sent
    .variable( "suspendTime" ).call( 2, "customerService", "getSuspendTimeAfterWarning" )

    .whileDo( 3, "customerBalance < 0 && suspendTime.time > System.currentTimeMillis()" )
        .split( 4 )
            .branch()
                // wait until suspendTime
                .waitUntilDate( 6, "${suspendTime}" )
            .branch()
                // and at the same time monitor balance changes
                .waitSignal( 7, "PAYMENT" )
        .joinFirst()
        .variable( "customerBalance" ).call( 8, "customerService", "getCustomerBalance", "${customerId}" )
    .whileDo()

    // only continue with the proceedings if the customer balance is still negative
    .if_( 9, "customerBalance < 0" )
        // create a suspend order
        .variable("suspendOrderId").callAsync( 10, "customerService", "suspendCustomer", "${customerId}" )

        // wait until the order is beeing processed
        .doWhile()
            .waitTimer( 11, "1000" )
            .variable( "suspendOrderStatus" ).call( 12, "customerService", "getOrderStatus", "${suspendOrderId}" )
        .doWhile( 13, "suspendOrderStatus == 'PROCESSING'" )

        // if the order fails, create a manual task
        .if_( 14, "suspendOrderStatus != 'COMPLETED'" )
            .humanTask( 15, "ROLE_CUSTOMER_SUPPORT", null ).withAttribute( "customerId", "${customerId}" ).withAttribute( "taskType", "MANUAL_SUSPEND" ).done()
        .endIf()

        // find out the next step 
        .variable( "nextStep" ).call( 16, "exampleStepSelector", "findNextStep", "${customerId}", "02" )
        //  and start it, passing along the customerId attribute
        .createInstance( 17, "${nextStep}", null, "${customerId}", null ).withAttribute( "customerId", "${customerId}" ).done()
    .endIf()

    .end();