I have always found it difficult to talk about what reliability is and how to achieve it; sometimes, I have to admit, I was not aware of certain problems or I wasn’t sure how to address them.
That is why I thought it was a good idea to collect the solutions MuleSoft offers to the various reliability problems in one place, giving a single view of the different options.
In this article, I do not intend to give low-level technical details of each solution; it is better to link the related MuleSoft documentation for that. However, I am happy to expand on any of these if required.
Overview
Reliability is defined in Wikipedia as “the quality of being trustworthy or of performing consistently well”.
Reliability is one of the most important non-functional requirements in IT; however, it is often left out of requirements discussions because it is considered:
- not important for the customer (until everything goes to hell)
- not worth discussing, because why should things go wrong?
- too technical, and therefore difficult for business people to understand
- or, sometimes, because the Integration Architect does not emphasize the importance of understanding it
Unfortunately, if these requirements are not agreed upon up front, when issues come (and they always will) it will be difficult to identify them, and it will not be easy for the client team to convince the business to invest money in fixing them without a clear business value.
Reliability Approaches
Requirements
Reliability means no message data loss during processing, whether due to errors or to a stop or crash of the server(s) processing the requests.
Various reliability patterns can be implemented to achieve reliability goals for synchronous and asynchronous flows.
Ways to achieve a Reliable API
Reliability in Mule applications can be achieved using:
Reconnection strategies, see here
- When connecting to a system such as a DB or SFTP server where a connection pool is used, Mule opens these connections at application startup and reuses them as needed
- If for any reason (remote system down or connectivity issues) this pool is not properly populated, or the connections go down at any time, by default Mule keeps running and all the flows using these connections will keep failing
- By using this feature, it is possible to instruct Mule to reconnect and repopulate the pool at defined intervals.
- I always configure a Reconnection strategy, for instance with the (S)FTP or DB connectors (a minimal sketch follows below)
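A minimal sketch of what this could look like on a DB connection; the connection type, host, and property names are placeholders, not from a real project:

```xml
<!-- Hypothetical DB config: retry the connection every 5 seconds, up to 10 times -->
<db:config name="Database_Config">
  <db:my-sql-connection host="db.example.com" port="3306"
                        user="${db.user}" password="${db.password}" database="orders">
    <reconnection>
      <reconnect frequency="5000" count="10"/>
    </reconnection>
  </db:my-sql-connection>
</db:config>
```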
Until Successful scope, see here
- It executes a sequence of Mule processors, retrying up to a defined number of times until everything succeeds.
- It is very useful for HTTP requests where the connection is unstable (see the sketch below).
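A minimal sketch, assuming a hypothetical HTTP request configuration and path:

```xml
<!-- Retry the wrapped request up to 5 times, 2 seconds apart -->
<until-successful maxRetries="5" millisBetweenRetries="2000">
  <http:request method="GET" config-ref="HTTP_Request_config" path="/status"/>
</until-successful>
```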
Redelivery policy, see here
- The Redelivery policy in Mule 4 is similar to the Until Successful scope, but it is always applied to the source of the flow.
- For the developer it is just configuration; in the background, Mule saves the received message in a default cache and increments a counter each time the message is resubmitted after an error occurs
- When using this policy, it is a good practice to implement an error handler for the REDELIVERY_EXHAUSTED exception, as in the sketch below
- The Mule 4 Redelivery policy can be applied to any flow source, but a better practice is to rely on the external system's redelivery when supported (such as for JMS consume operations).
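A minimal sketch, assuming a hypothetical VM configuration, queue name, and processing sub-flow:

```xml
<flow name="process-order-flow">
  <!-- retry the same message up to 5 times before giving up -->
  <vm:listener config-ref="VM_Config" queueName="ordersQueue">
    <redelivery-policy maxRedeliveryCount="5"/>
  </vm:listener>
  <flow-ref name="process-order"/>
  <error-handler>
    <!-- best practice: handle the exhausted case explicitly -->
    <on-error-propagate type="REDELIVERY_EXHAUSTED">
      <logger level="ERROR" message="Redelivery exhausted for message #[attributes]"/>
    </on-error-propagate>
  </error-handler>
</flow>
```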
RETRY_EXHAUSTED, see here
- As per best practice, a handler for this exception should always be implemented
- In Mule 4 it can commonly be thrown by any connector; a minimal handler is sketched below
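A minimal sketch of such a handler (the logger message is only illustrative):

```xml
<error-handler>
  <!-- raised, for example, when a connector runs out of reconnection attempts -->
  <on-error-propagate type="RETRY_EXHAUSTED">
    <logger level="ERROR" message="Retries exhausted: #[error.description]"/>
  </on-error-propagate>
</error-handler>
```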
Transactions (Try Scope), see here
- If a series of steps in a Mule flow must succeed or fail as one unit, a good practice is to use a transaction to demarcate that unit.
- A transaction can start at the source of the flow or can be demarcated via a Try scope.
- If only a single system is involved in the transaction, a Mule local transaction is sufficient to achieve the goal
- If more than one system has to be involved, an XA transaction is recommended: it supports two-phase commit, making sure that either all systems involved commit or all roll back
- The default behaviour in Mule 4 is that, in the happy path, the (XA or local) transaction gets COMMITTED at the end, while in case of errors:
- within an On-Error-Propagate handler the transaction gets ROLLED BACK
- within an On-Error-Continue handler the transaction gets COMMITTED (see the Try scope sketch below)
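A minimal sketch of a local transaction demarcated with a Try scope; the DB configuration and SQL are placeholders:

```xml
<try transactionalAction="ALWAYS_BEGIN" transactionType="LOCAL">
  <db:insert config-ref="Database_Config" transactionalAction="ALWAYS_JOIN">
    <db:sql>INSERT INTO orders (id, status) VALUES (:id, 'NEW')</db:sql>
    <db:input-parameters>#[{ id: payload.orderId }]</db:input-parameters>
  </db:insert>
  <db:update config-ref="Database_Config" transactionalAction="ALWAYS_JOIN">
    <db:sql>UPDATE stock SET qty = qty - 1 WHERE sku = :sku</db:sql>
    <db:input-parameters>#[{ sku: payload.sku }]</db:input-parameters>
  </db:update>
  <error-handler>
    <!-- any error propagates and both statements are rolled back -->
    <on-error-propagate/>
  </error-handler>
</try>
```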
Message persistence might be required when state has to be recovered after application downtime or a crash
- Persistence can be implemented in Mule via VM, JMS, DB, Cache/Object Store, File
- When using JMS/ActiveMQ (besides transaction-based approaches), the manual message ACK approach can be used, as sketched below
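A minimal sketch of the manual ACK approach, assuming a hypothetical JMS configuration, destination, and processing sub-flow:

```xml
<flow name="consume-order-flow">
  <!-- the message stays on the broker until it is explicitly acknowledged -->
  <jms:listener config-ref="JMS_Config" destination="orders" ackMode="MANUAL"/>
  <flow-ref name="process-order"/>
  <!-- acknowledge only after processing has completed successfully -->
  <jms:ack ackId="#[attributes.ackId]"/>
</flow>
```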
Reliability Pattern in async scenarios, see here
- I find it useful in push scenarios triggered by changes to a source system (often used for system synchronisation). Some time ago I would have referred to this as Change Data Capture (CDC), while today these use cases are usually called webhooks.
- The main options for implementing the communication between the reliable acquisition flow and the processing flow are MuleSoft VM or JMS/ActiveMQ; however, some differences have to be noted (a minimal sketch follows at the end of this section):
- Non-Persistent VM: this is the only VM option available in CloudHub 2.0; the queues are in-memory only and therefore cannot really be reliable, since if the Mule server goes down the messages will be lost
- Persistent VM, CloudHub 1.0 only (based on Amazon SQS standard queues, see here and here):
- “at-least-once delivery” is guaranteed; there is a chance the same message could be processed more than once, therefore the flow has to call idempotent operations or use an idempotent filter
- “message ordering” is not supported
- It can be used for this pattern only when the communication is between MuleSoft APIs
- It is recommended to configure a Redelivery policy (see https://docs.mulesoft.com/connectors/vm/vm-reference#listener) and implement the error handler for REDELIVERY_EXHAUSTED so as not to lose track of lost messages
- JMS/ActiveMQ
- “exactly-once delivery” and “message ordering” are supported
- It is recommended to configure a FIFO queue when message ordering is needed, together with a Dead Letter Queue (see here), and to implement some monitoring on the Dead Letter Queue so as not to lose track of lost messages
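To close this section, a minimal sketch of the pattern over a persistent VM queue; all names, paths, and the target sub-flow are placeholders:

```xml
<vm:config name="VM_Config">
  <vm:queues>
    <!-- persistent queue so messages survive an application restart -->
    <vm:queue queueName="changesQueue" queueType="PERSISTENT"/>
  </vm:queues>
</vm:config>

<!-- acquisition flow: accept the webhook call and return quickly -->
<flow name="acquisition-flow">
  <http:listener config-ref="HTTP_Listener_config" path="/webhook"/>
  <vm:publish config-ref="VM_Config" queueName="changesQueue"/>
</flow>

<!-- processing flow: consume from the queue and do the heavy work asynchronously -->
<flow name="processing-flow">
  <vm:listener config-ref="VM_Config" queueName="changesQueue">
    <redelivery-policy maxRedeliveryCount="3"/>
  </vm:listener>
  <flow-ref name="synchronise-target-system"/>
</flow>
```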