One true story about a crisis with Aggregate Root Identity and Identity creation strategies we considered.
Within our Bounded Context (in our case wrapped into a single microservice) during the domain modelling we had identified Payment as Aggregate Root. For clarity of the story about our modelling challenges and decisions, I will simplify the domain a bit. So, let’s say a Payment had pretty much standard bank transaction properties and it was supposed to be created by the user through the user interface.
When deciding on Payment Identity our main goal was to try to model it to be meaningful and human-readable. First, we were thinking about the properties user will search by. But, none of the properties seemed like a good candidate because of an obvious lack of uniqueness. Even a combination of some of the properties did not seem to be possible to use to construct a unique and stable Identity.
User provides Identity?
Furthermore, we considered giving the possibility to a user to define the unique Identity of a Payment but it did not seem like a way to go. This approach would require additional information users need to come up with when creating the payment that probably would not have domain importance, additional user involvement to keep track of used Identities to ensure uniqueness, and of course additional validations and mechanisms to be implemented to help user operate with the application. We quickly moved on.
Application or Database generates Identity?
At this point we realized that having a meaningful and human-readable Identity would not be possible, so we had two more options left. Identity generated by Application or Database. We easily decided to go with Application generated Identity (GUID) because of several advantages compared with the Database option.
Database generated Identity would require additional implementation details. Also, we would surely have better performance with Application generating identities not having to wait for data to be stored into the Database to determine Identity. Besides the performance, the moment when the Identity was generated was very important to us. We used event sourcing and we needed Identities set before storing the domain events into the event store.
It is important to say that our event store had optimistic concurrency controlenabled. Saving two domain events with the same Version related to the same Payment was not possible. One of the domain events would be discarded and the user would be informed.
Then new requirements came up. There were several other microservices (different bounded contexts) that recognized a need to create payments through our application. Our application would have to handle and process commands received via service bus and create payments accordingly. During the lifetime of a Payment, our application would have to publish events via service bus so other microservices would be able to subscribe and listen to the events, each time Payment status changes.
In order to meet the requirements, the only domain change we introduced was a new Payment property called Correlation. It will be generated by other microservices and sent to our application when initiating a payment. It was agreed that the payment status change events will contain Correlation property as well, so subscribers can determine payments they are interested in and be able to update the status of a correct payment in their contexts.
And it worked, at least until one morning, when we noticed a bunch of payments with the same Correlation on their way to the bank. Actually, some Payments were completely the same, the same Correlation, Amount, Bank Accounts… The only difference was the Payment Identity that was unique across all payments in our application. That did not seem right. The payments looked like they were doubled or even tripled somehow. Those were around €100,000.00 worth of unwanted payments with went-to-bank status. Soon after, there were a lot of headphone icons next to our names on Slack.
Post mortem investigation discovered a bug in a client application, the one that initiates the payments in our application via service bus. The bug caused sending the same message over and over again. The bug was not on our side, but it does not mean we did everything right. To make things even worse, we realized that we had the service bus configured as at-least-one-delivery which could lead to duplicated messages even without bugs in client applications. As it turned out, we did not handle properly the possibility of receiving the same message more than once and poorly chosen Payment Identity played a major role.
Another Bounded Context assigns Identity?
Upon receipt of a command, the message handler was checking and rejecting the message if a Payment with the same Correlation already exists in our read models. But we did not realize that it would not be enough. Our read models were eventually consistent with the domain events. In case we receive several messages at the same time with the same Correlation, the does-already-exist check might not help, because the read model might not be updated yet.
Then it was clear, Correlation should have been Payment Aggregate Root Identity, the Identity generated by another Bounded Context. That way no special validations are required. Our event store will not allow saving PaymentInitiated domain events with the already existing combination of Aggregate Root Identity and Aggregate Root Version.
We missed the opportunity to rethink and redesign Payment Aggregate Root Identity at the moment when we introduced the new way for payments to be created. Actually, we should try to question, rethink, remodel and adjust the design every time the domain expands. We realized that doing DDD properly requires continuous brainstorming in every phase of the project. We do not need to be afraid of modelling wrong. Only through challenges and experience (ours and others), we can improve our modelling skills. There is no boilerplate solution that fits every domain.
And yes, the bank was contacted and double payments were cancelled. Everything was fine.