Cosmos DB as a reliable cross-region exactly-once message bus?
UPDATE - After I’d talked to people at Microsoft about this, and possibly as a result, they removed the no data loss guarantee from the docs, so back to Service Bus it was :)
I currently have a requirement for a highly available multi-region message bus with at-least-once delivery to Azure Functions.
The usual way of doing this would be to use Service Bus with active replication. In other words, you set up queues/topics in two regions and clients send two copies of each message, one to each region. The receivers then read messages from both regions and keep a log of messages already received to avoid duplicate processing.
This works, but it’s a bit of a pain, particularly when your receiver is scaled out and trying to de-duplicate the messages.
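To make the active replication pattern concrete, here is a minimal in-memory sketch in Python. It does not use the Service Bus SDK: `RegionQueue`, `send_replicated` and `DedupReceiver` are hypothetical names standing in for a queue per region, a sender that writes one copy to each region, and a receiver that keeps a log of message IDs it has already handled.

```python
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str    # shared by both copies so the receiver can spot duplicates
    body: str


class RegionQueue:
    """Hypothetical stand-in for a queue in one region (not a real SDK class)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self._items = deque()

    def send(self, msg):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._items.append(msg)

    def receive(self):
        # Returns None when the region is down or the queue is empty.
        if not self.available or not self._items:
            return None
        return self._items.popleft()


def send_replicated(queues, body):
    """Send one copy of the message to every region that will accept it."""
    msg = Message(id=str(uuid.uuid4()), body=body)
    delivered = 0
    for queue in queues:
        try:
            queue.send(msg)
            delivered += 1
        except ConnectionError:
            pass  # tolerate a region outage as long as one copy lands
    if delivered == 0:
        raise ConnectionError("no region accepted the message")
    return msg


class DedupReceiver:
    """Reads from all regions and skips message IDs it has already seen."""

    def __init__(self, queues):
        self.queues = queues
        self.seen = set()      # the "log of messages already received"
        self.processed = []

    def drain(self):
        for queue in self.queues:
            while (msg := queue.receive()) is not None:
                if msg.id in self.seen:
                    continue   # second copy from the other region
                self.seen.add(msg.id)
                self.processed.append(msg.body)
```

The `seen` set is the awkward part in practice: with a scaled-out receiver it has to live in shared storage that every instance checks and updates consistently, which is the pain point described above.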
On the other hand, Cosmos DB has all this built in. If you’re using strong consistency, you get:
- No data loss guaranteed
- No extra work to do if a region goes out
- Pulling messages via transactional stored procedures means the receivers don’t need to de-duplicate
- As a bonus, exactly-once delivery rather than just at-least-once
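The idea behind pulling messages transactionally can be sketched with a small in-memory model. This is not Cosmos DB code: real Cosmos DB stored procedures are written in JavaScript and run as a transaction within a single logical partition. Here a Python lock stands in for that transactional scope, and `MessageStore`, `claim_next` and `drain_concurrently` are hypothetical names chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MessageStore:
    """In-memory model of a message container (assumption: one partition)."""

    def __init__(self):
        # The lock models the all-or-nothing scope of a stored procedure.
        self._lock = threading.Lock()
        self._docs = {}    # message id -> document
        self._order = []   # arrival order of message ids

    def enqueue(self, msg_id, body):
        """Insert a message; a repeated id is ignored, so sender retries are safe."""
        with self._lock:
            if msg_id in self._docs:
                return False
            self._docs[msg_id] = {
                "id": msg_id, "body": body, "status": "pending", "owner": None,
            }
            self._order.append(msg_id)
            return True

    def claim_next(self, receiver):
        """Find the oldest pending message and mark it claimed in one atomic step."""
        with self._lock:
            for msg_id in self._order:
                doc = self._docs[msg_id]
                if doc["status"] == "pending":
                    doc["status"] = "claimed"
                    doc["owner"] = receiver
                    return dict(doc)  # copy, so callers cannot mutate the store
            return None


def run_receiver(store, name):
    """One receiver instance: keep claiming until nothing is pending."""
    handled = []
    while (doc := store.claim_next(name)) is not None:
        handled.append(doc["id"])
    return handled


def drain_concurrently(store, receiver_names):
    """Run several receivers in parallel and return what each one handled."""
    with ThreadPoolExecutor(max_workers=len(receiver_names)) as pool:
        return list(pool.map(lambda n: run_receiver(store, n), receiver_names))
```

Because the read and the status change happen inside the same atomic step, two receivers can never claim the same message, so no separate de-duplication log is needed on the receiving side. The sketch covers only the claim step; what should happen when a receiver fails after claiming a message is left out.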
So for the cost of a bit of work setting up SPs and Azure Functions (and a slight hike in Azure costs!) we get - in theory - high performance geo-redundant exactly-once messaging with a simple pattern for both senders and receivers.
I’ve set up a proof-of-concept that seems to work - basic code on GitHub - but I’m not sure whether I’m onto a winner here or whether I’m missing something that means either it’ll hit problems I haven’t anticipated, or that there are easier ways to do this with Service Bus.
So I’m asking for feedback. Is this a good idea? Should I take it forward? Or is there a better way? Please let me know what you think - you can get me on Twitter, LinkedIn or good old email.