Kafka Chronicles: Navigating Common Challenges & Pitfalls in Event Schema Evolution

Pratik Shah

May 25, 2024 | 5 min read

Today we’re exploring Kafka, an open-source distributed event streaming platform used mainly for high-performance data pipelines, streaming analytics, and data integration. At its core, Kafka operates on a publish-subscribe model: producers send messages to Kafka topics, and consumers subscribe to those topics to retrieve and process the messages, or events. A key part of this process is serialization, turning each message into a stream of bytes before it is sent; on consumption, consumers deserialize the bytes back into the desired data type for processing.
In this article, we delve into common challenges and pitfalls in maintaining and updating the messages and events published to Kafka topics, using a practical scenario as an example.
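
To make the publish/serialize/consume cycle concrete, here is a minimal, hedged sketch of the producer side using the Java Kafka client. The order topic and the ORDER_CREATED event come from the scenario below; the broker address and the payload fields are assumptions for illustration.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        // The value serializer turns the event into a stream of bytes before it is sent;
        // consumers configure a matching deserializer to turn the bytes back into an object.
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // For simplicity the payload is a hand-written JSON string here; a real service
            // would typically plug in a JSON or Avro serializer for its event class.
            String orderCreated = "{\"eventType\":\"ORDER_CREATED\",\"orderId\":\"42\"}";
            producer.send(new ProducerRecord<>("order", "42", orderCreated));
        }
    }
}
```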

Scenario

Imagine a scenario where we have three microservices: an Order service, an Analytics service, and an Audit service.

When a user places an order, the Order service publishes an ORDER_CREATED event on the order topic. Both the Analytics and Audit services consume this event and then carry out their further processing.

(i): Published ORDER_CREATED event consumed by the Analytics and Audit service

Now, suppose the Dev team needs to change the data type of the date field in orderDetails so that it captures the time, along with the date, when the order was placed. Here’s the process they’ll follow:

(ii): Change the data type of date to ZonedDateTime
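
In code, the change might look roughly like the sketch below. The class layout and field names are assumed for illustration; the article only shows the order date moving from LocalDate to ZonedDateTime.

```java
import java.time.ZonedDateTime;   // was: java.time.LocalDate

public class OrderEvent {
    private String orderId;
    private OrderDetails orderDetails;

    public static class OrderDetails {
        // V1: private LocalDate date;
        private ZonedDateTime date;   // V2: now also carries the time the order was placed
        // other order details omitted
    }
    // getters/setters omitted
}
```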

What do you think of this scenario and the process for updating the event’s schema? Sounds straightforward, right? But as with any system, there are potential pitfalls and challenges to watch out for.

Common Challenges and Pitfalls

1. Source of Truth 

Firstly, let’s consider how the event is declared. As shown in image (iii), each service maintains its own copy of the Order event. The Order service serializes the Order event, defined by the class com.abc.order.OrderEvent, into a stream of bytes and publishes it to the order topic. The Analytics and Audit services consume this stream of bytes and deserialize it into the classes com.abc.analytics.OrderEvent and com.abc.audit.OrderEvent respectively.

(iii): Each service declaring its own copy of the Order event
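
As a rough sketch of what image (iii) describes (the package names come from the article, the fields are assumed), each service declares its own, structurally identical copy of the event:

```java
package com.abc.order;   // the Analytics and Audit services keep their own copies as
                         // com.abc.analytics.OrderEvent and com.abc.audit.OrderEvent

import java.time.LocalDate;

public class OrderEvent {
    private String orderId;
    private OrderDetails orderDetails;

    public static class OrderDetails {
        private LocalDate date;   // changing this field means changing all three copies in lockstep
    }
    // getters/setters omitted
}
```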

Updates to the event structure or format require changes across all services, leading to duplicated effort and potential inconsistencies, and violating the DRY (Don’t Repeat Yourself) principle. Moreover, the lack of a single source of truth makes it hard to track how the Order event has evolved, further complicating maintenance and updates.

2. Compatibility Concerns 

What happens if the new event schema (let’s call it V2) isn’t compatible with the original schema (V1), and vice versa?

Consider this: updating the data type of the order date from LocalDate (V1) to ZonedDateTime (V2) is neither backward nor forward compatible. The stream of bytes representing a V2 event cannot be deserialized into the model representing the V1 event, and vice versa. Any V1 events still lingering in the Kafka topic, waiting to be consumed, become problematic: the updated Analytics and Audit services, now expecting V2 events, won’t be able to consume them.

(iv): The V1 schema of the event is not compatible with the V2 schema of the event

This mismatch will result in missed events, causing disruptions in data processing. Without proper error handling, these V1 messages essentially become poison pills, potentially terminating consumers prematurely and worsening the situation.
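
To see why the bytes stop being readable, here is a minimal sketch that assumes the events are exchanged as JSON via Jackson (the article does not state the serialization format); class and field names are illustrative.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

import java.time.LocalDate;
import java.time.ZonedDateTime;

public class SchemaCompatibilityDemo {

    // V1 shape: the order date is a date-only value
    static class OrderEventV1 { public String orderId; public LocalDate date; }

    // V2 shape: the order date now carries time and zone
    static class OrderEventV2 { public String orderId; public ZonedDateTime date; }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
                .registerModule(new JavaTimeModule())
                .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);

        // A V1 producer wrote something like {"orderId":"42","date":"2024-05-25"}
        OrderEventV1 v1 = new OrderEventV1();
        v1.orderId = "42";
        v1.date = LocalDate.of(2024, 5, 25);
        byte[] v1Bytes = mapper.writeValueAsBytes(v1);

        // An updated consumer expecting V2 cannot build a ZonedDateTime from a date-only
        // string: Jackson throws at this point, which is exactly how a lingering V1 record
        // turns into a poison pill for a V2 consumer.
        OrderEventV2 broken = mapper.readValue(v1Bytes, OrderEventV2.class);
    }
}
```
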
3. Synchronization of Releases

Another hurdle arises if the Audit team can’t release the change at the same time. Picture this: the Dev and Analytics teams forge ahead with the release, but the Audit team lags behind due to other priorities. In this scenario, the Order service starts publishing V2 events. Since the V2 events are not compatible with the V1 model, the Audit service, still expecting V1 events, won’t be able to deserialize them.

(v): Audit service not updated to consume the V2 events

As a result, the Audit service misses out on these events, leading to data inconsistencies. This delay highlights the tight coupling between the change in event schema and consumer services, emphasizing the need for synchronized releases to prevent disruptions and maintain system integrity.
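
The sketch below shows roughly how the not-yet-updated Audit service experiences this, again assuming JSON payloads and the plain Java consumer API (neither of which the article specifies): every V2 record either throws during mapping or has to be skipped, so the event is lost to auditing either way.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.time.LocalDate;
import java.util.List;
import java.util.Properties;

public class AuditConsumerSketch {

    // The Audit service's (still V1) view of the event
    static class OrderEventV1 { public String orderId; public LocalDate date; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "audit-service");             // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        ObjectMapper mapper = new ObjectMapper().registerModule(new JavaTimeModule());

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("order"));            // topic name from the article
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : records) {
                    try {
                        // Still mapping onto the V1 model; a V2 payload fails here,
                        // and the event is effectively lost to the Audit service.
                        OrderEventV1 event = mapper.readValue(record.value(), OrderEventV1.class);
                        // ... audit processing ...
                    } catch (Exception e) {
                        // Without handling like this, the V2 record becomes a poison pill
                        // and can keep crashing the consumer on every retry.
                    }
                }
            }
        }
    }
}
```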

Conclusion

As we delved into Kafka’s role in helping microservices communicate, we encountered unexpected hurdles. What seemed like a simple task of updating an event schema unveiled hidden complexities: compatibility issues with schema evolution, challenges in synchronizing releases, and the need for a single source of truth when declaring event models. To tackle these common challenges and steer clear of the pitfalls, we need to find answers to a few questions: Where should the event schema live so there is a single source of truth? How can the schema evolve without breaking existing consumers? And how can producer and consumer releases be decoupled so they don’t have to ship in lockstep?

In the next part of this blog, we will explore approaches to answer these questions and overcome these obstacles.