Roll out your own BPM with some help from Cassandra (Part 1)


BPM stands for Business Process Management. In simple terms it’s a framework for managing long running business transaction. A long running business transaction manages a business process e.g., order processing in a eCommerce web site. A business process is generally asynchronous in nature and modeled by a state machine. State transition is triggered  by external events. A process transitions from one state to another in response to external events. For an order processing business process, one such event might be the response to a inventory reservation request received from a warehouse. The event could arrive as a message from the warehouse through web services or FTP transferred  file

BPM is a business logic processing  framework and follows the so called Hollywood principle i.e., “Don’t call me, I will call you”. With BPM, business logic gets executed as a callback during state transition.

It is very similar in nature to other frameworks e.g., web UI frameworks like Struts or JSF.  Custom business logic is encapsulated in the callback handler. With a BPM based design, the states, events and state transition related logic is brought to the forefront and the application specific business logic is pushed back to be executed as callbacks.

Another advantage of BPM based design, is that it enables the business analysts to define the high level process logic, sometimes graphically if such tools are available.

Few years  ago while  working on a eCommerce site , I wanted to  streamline the ad hoc  order processing logic with a BPM framework. License costs for commercial products were too high and I started thinking about implementing my own. After some brainstorming, I realized it was only going to take about half a dozen data base tables and several hundred lines of java code to implement a basic  framework without all the bells and whistles. I never understood why the commercial products were so expensive. Was it the fancy graphical process flow builder, which I was not considering in my implementation.

Data Model

Let’s consider the data model first. Here is the  list of entities  and their attributes


  ProcessSchema
    id : int
    name : string
    description : string

  ProcessState
    id : int
    name : string
    description : string
    type : int

 ProcessEvent
    id : int
    name : string
    description : string

  StateTransition
    eventId : int
    fromStateId : int
    toStateId: int
    invocationClass : string
    invocationMethod : string

  Process
    id : int
    processSchemaId : int
    stateId : int
    status : int

  ProcessCorrelation
    correlatedItem : string
    correlatedItemId : int
    processId : int

  ProcessContext
    processId : int
    paramName : string
    paramvalue : string


For brevity, I am considering a minimal data model definition.

The entities ProcessSchema, ProcessStateProcessEvent and StateTransition constitute the definition of the process schema.

I am assuming that the implementation is in java or some other object oriented language. In the StateTransition table, invocationClass and invocationMethod define the callback for a particular state transition. If there was graphical process flow builder tool, such tool would have populated these tables related to process schema.

The entity Process, ProcessCorrelation and ProcessContext are used by the BPM run time. The entity Process represents all active processes.

There is one record for each active process. In contains the current state, status.  Status indicates whether the process is currently active. It also has a reference to the process schema that this is an instance of.

The ProcessCorrelation needs bit of explaining.  Whenever a new process is created, The BPM  generates an unique Id for the process. However, the internal Id is of not much of use to the external services that the BPM integrates with. Here is an example to drive home the point I am trying to make. Our order processing BPM sends an inventory request with orderId, processId and other pertinent information. The inventory service responds with a requestId. The requestId is like a cookie that provides the context for any further communication with the inventory service. Our BPM needs to save the mapping between the requestId and the processId in  ProcessCorrelation, so that later when we receive a message from the inventory system with requestId in it, we can use it to look up the corresponding processId.

There could be multiple such external Id from the different external services that the BPM interacts with. For example, when the order is shipped, the shipping service might generate a shipmentTrackingId, which again needs to be mapped with processId and stored in ProcessCorreleation.  We might call the shipment  service later to check whether the shipment has been delivered and while doing so we will need to provide the  shipmentTrackingId. We may have daemon process that scans all the processes that are in shipped state and depending on how much time has elapsed since shipment  and the kind of shipment (e.g., overnight, two day etc.) we query the shipment service for delivery status. For each such process, we look up the shipmentTrackingId from the processId. Then we use the shipmentTrackingId to query the inventory service.

ProcessContext contains all the contextual data as name value pairs. They get consumed by the business logic that executes during state transitions. The business logic may also generate additional context data or modify existing context data.  After the state transition the new set of context data for the process needs to be saved in ProcessContext. The signature for the callback handler that executes business logic during state transition is as follows. The context does not necessarily contain data all the time. It could contain refrence to other data, e.g. primary key of a record in some table.


  class ProceesHandler {
    ........

    public Map<String,String> handle(Map<String,String> context) {
      //business logic here

      //return new process context
    }
    .......

  }

This method does whatever is necessary to execute the business logic e.g., do database transaction etc. Let’s consider the BPM driver next

Process Manager

The BPM driver responds to new events. An event could be a message arriving from an external system. For example, an inventory reservation response received from a warehouse.

The response could be arriving as a message in a web service or an FTP transferred file. An event could also be generated from the UI, e.g, a buyer doing a checkout in an eCommerce site. In this case the BPM driver will create a new instance of a process and put the process in a start state.

At all such touch points with external systems, BPM client code runs and  generates BPM events, there will be  BPM client code that will scan the all the data related to the event and identity the correlationItem and correlationItemId. The  correlationItemId in case of the inventory response message, will be the inventory requestId , which is likely to be  embedded somewhere in the message. It will also glean the message  and collect all the related  data in a map to create the event context. Then it will call the ProcessManager method processEvent() which works as follows


class ProcessManager {
  .......
  public void processEvent(String correlationItemType, String correlationItemId,
    String event, Map<String, String> eventContext){
    //find processId from correlationItemType and correlationItemId
    int processId = ...

    //find current state from Process table for this process
    int state = ...

    //find next state invocationClass and invocationMethod from StateTransition
    //based on current state and event
    int nextState = ...
    String invocationClass = ...
    String invocationMethod = ...

    //get map of current context from ProcessContext table for this process

    //merge current context with event context to create a new context map
    Map<String, String> context = ...

    //create business logic callback handler and invoke it
    Map<String, String> newContext = handler.handle(context)

    //set state in Process with nextState for this process
    ...

    //save in newContext in ProcessContext for this process
    ....

  }
}

The process manager simply deals with states and event and is completely decoupled from the business logic associated with the process state transition. It is important that that the process manager and the callback handler executing application business logic are scoped within  the same transaction.

One way to do that will be to use container (e.g., JEE or Spring) managed transaction and annotate both methods with “transaction required” attribute.

Concluding Thoughts

I have considered a simple model for process flow. In more complex scenario, you could have parallel flows and synchronized join.  For example, in order processing you could have inventory reservation and payment processing in parallel flows and having a synchronized join which will trigger only when both flows are completed.

Parallel flows could be handled by having multiple tokens associated with a process. Each token flows through a different branch when there are multiple parallel flows. We could introduce  entity called ProcessToken which will have many to one association with Process. A process token can be thought of as something that flows through the process flow graph and moves from one state to another as state transition happens.

In the next post, I will design a data model for our BPM using NOSQL database Cassandra. We want a highly available storage system for our BPM based order processing system and Cassandra is a good choice. When a customer places an order, we never want to be in the situation of not being able to take the order, just because our database system happens to be down. Cassandra can help us here.

In an RDBMS, the entities listed earlier correspond to tables in the RDBMS. It’s not necessarily so in Cassandra. The schema in Cassandra or other NOSQL system is not so much driven by the structural similarity of the objects. It’s driven more by the data usage pattern in the application. Objects that are accessed together tend to be in the same column family (think table in RDBMS) in Cassandra, even if they are not structurally identical.

Advertisements

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solution.
This entry was posted in BPM, Cassandra, eCommerce, NOSQL. Bookmark the permalink.

One Response to Roll out your own BPM with some help from Cassandra (Part 1)

  1. Pingback: Roll out your own BPM with some help from Cassandra (Part 2) | Mawazo

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s