
I have a Kafka Streams application running as a StatefulSet in a Kubernetes cluster. Each instance runs 10 stream threads. Because the data load is volatile, I use a scaling mechanism that dynamically increases or decreases the number of replicas.
The application aggregates data in a state store (currently in-memory only, without changelog topics).
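For concreteness, the store is declared roughly like this (topic, store, and serde names here are placeholders, not my actual topology):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class AggregationTopology {

    // Aggregation backed by an in-memory store with no changelog topic.
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
               .reduce(Long::sum,
                       Materialized.<String, Long>as(Stores.inMemoryKeyValueStore("agg-store"))
                                   .withKeySerde(Serdes.String())
                                   .withValueSerde(Serdes.Long())
                                   .withLoggingDisabled());

        return builder.build();
    }
}
```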

Problem

When I scale the instances down, for example from 15 to 10, the in-memory state held by the 5 removed instances is lost.

Approach 1

To avoid losing the state, I enabled logging for the state stores. However, this led to a significant increase in Kafka broker CPU usage. Additionally, it took too long to restore the state from the changelog topics.
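Enabling logging amounts to replacing withLoggingDisabled() with withLoggingEnabled(...) on the same Materialized, optionally with changelog topic overrides (the overrides shown are illustrative):

```java
import java.util.Map;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class LoggedStore {

    // Same in-memory store as before, but every update is also written to a
    // changelog topic that Kafka Streams replays when state has to be restored.
    static Materialized<String, Long, KeyValueStore<Bytes, byte[]>> materialized() {
        return Materialized.<String, Long>as(Stores.inMemoryKeyValueStore("agg-store"))
                           .withKeySerde(Serdes.String())
                           .withValueSerde(Serdes.Long())
                           .withLoggingEnabled(Map.of(
                                   "min.insync.replicas", "1",   // illustrative topic overrides
                                   "segment.bytes", String.valueOf(64 * 1024 * 1024)));
    }
}
```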

Approach 2

I switched to persistent state stores (still without logging), which use RocksDB under the hood; restoring from local disk should be faster than replaying changelog topics. To keep the data of removed instances available, I created a single volume and mounted it into all instances so they share the state. This was necessary because with one volume per instance, the data from the removed instances would be lost as well.
However, this resulted in the following error:

Unable to initialize state, this can happen if multiple instances of Kafka Streams are running in the same state directory.

After further research, it appears that multiple Kafka Streams instances cannot share a single state.dir.
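For reference, the configuration I tried in approach 2 looks roughly like this (application id, bootstrap servers, and the mount path are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class PersistentStoreConfig {

    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "aggregation-app");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");     // placeholder
        // Points at the shared volume; this is the directory that triggers the
        // error above when multiple instances use it at the same time.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/mnt/streams-state");
        return props;
    }

    // RocksDB-backed store, again without a changelog topic.
    static Materialized<String, Long, KeyValueStore<Bytes, byte[]>> materialized() {
        return Materialized.<String, Long>as(Stores.persistentKeyValueStore("agg-store"))
                           .withLoggingDisabled();
    }
}
```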

Is there any other way to restore state across instances without using changelog topics?

  • Yes, there are other ways, but it requires writing a custom StateStoreSupplier.
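A minimal sketch of that idea, assuming a recent Kafka Streams version where the key-value supplier interface is KeyValueBytesStoreSupplier (the class name and the delegation to the built-in in-memory store are placeholders; the real work is a KeyValueStore implementation backed by storage that outlives the pods):

```java
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

// Hypothetical supplier whose get() would return a KeyValueStore backed by
// storage that survives scale-down (an external database, object storage, etc.).
// Here it simply delegates to the built-in in-memory store as a placeholder.
public class ExternalBackedStoreSupplier implements KeyValueBytesStoreSupplier {

    private final String name;

    public ExternalBackedStoreSupplier(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public KeyValueStore<Bytes, byte[]> get() {
        // Replace with a custom KeyValueStore implementation that reads and
        // writes the external storage instead of local memory or disk.
        return Stores.inMemoryKeyValueStore(name).get();
    }

    @Override
    public String metricsScope() {
        return "external-backed";
    }
}
```

Such a supplier would then be plugged into the topology via Materialized.as(new ExternalBackedStoreSupplier("agg-store")).withLoggingDisabled().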

