We have a Spring Boot REST API deployed in a Kubernetes cluster that integrates with MongoDB to fetch data.
MongoDB is fed with data by a real-time Spark & NiFi job.
Our clients complained that the requests they sent did not get a response within 90 seconds. Think of it like an OMS (Order Management System).
On further analysis, we found that the Spark & NiFi processing completes within 10 seconds of consuming the response data from Kafka. So our initial thought was that the delay came from the upstream system taking too long to produce data into Kafka.
Thankfully, our data carried timestamps for when the request was created, when the response was received, and when the response was inserted into MongoDB.
Subtracting the request timestamp from the response insert timestamp gave values well within 90 seconds. Yet clients were still timing out after 90 seconds without seeing a response, which left us confused.
Then we realized it was due to the Read Preference. We had set it to secondaryPreferred because we wanted read traffic balanced across our MongoDB cluster, and that is what led to the behaviour described above.
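For reference, this is roughly how such a preference gets wired into a Spring Boot application (a minimal sketch using the MongoDB Java driver; the hosts, replica set name, and bean are placeholders, not our actual configuration):

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoConfig {

    @Bean
    public MongoClient mongoClient() {
        // Placeholder hosts and replica set name; substitute your own cluster details.
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(
                        "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"))
                // Route reads to secondaries when one is available, to spread read traffic.
                .readPreference(ReadPreference.secondaryPreferred())
                .build();
        return MongoClients.create(settings);
    }
}
```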
We realized that MongoDB takes some time internally to actually replicate data to all nodes in the cluster. When the application is configured to read from secondary (replica) nodes, we get eventual consistency by default. This means a secondary node might not yet have the latest write from the primary because of replication lag.
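To make the failure mode concrete, here is a small sketch (the database, collection, and field names are made up for illustration): the write lands on the primary, but an immediate read routed to a lagging secondary can still miss it.

```java
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class StaleReadDemo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(
                "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0")) {
            MongoCollection<Document> orders =
                    client.getDatabase("oms").getCollection("orders");

            // The write always goes to the primary.
            orders.insertOne(new Document("orderId", "42").append("status", "FILLED"));

            // An immediate read with secondaryPreferred may hit a secondary that
            // has not replicated the write yet, so it can come back empty.
            Document maybeStale = orders
                    .withReadPreference(ReadPreference.secondaryPreferred())
                    .find(Filters.eq("orderId", "42"))
                    .first();

            System.out.println(maybeStale); // may print null under replication lag
        }
    }
}
```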
One way to solve this is to define a Read Concern and Write Concern that enforce higher levels of consistency, for example by using majority for both (see the sketch below). That comes with a performance cost.
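As a sketch of that approach (slotting into the same placeholder bean as above): writing with a majority write concern and reading with a majority read concern means a read only returns data acknowledged by a majority of nodes, at the cost of higher write latency.

```java
import com.mongodb.ReadConcern;
import com.mongodb.WriteConcern;

// Inside the mongoClient() bean from the earlier sketch:
MongoClientSettings settings = MongoClientSettings.builder()
        .applyConnectionString(new ConnectionString(
                "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"))
        // Writes are acknowledged only once a majority of nodes have them.
        .writeConcern(WriteConcern.MAJORITY)
        // Reads only return data that a majority of nodes have acknowledged.
        .readConcern(ReadConcern.MAJORITY)
        .build();
```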
The other way to solve it, which worked for us, was to use primaryPreferred. By default, reads and writes against the primary node of a replica set are strongly consistent, and updates to a single document are always atomic.
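The fix itself is then a one-line change to the read preference in the same placeholder bean: reads go to the primary and only fall back to a secondary if the primary is unavailable.

```java
import com.mongodb.ReadPreference;

// Inside the mongoClient() bean: prefer the primary for reads, falling back
// to a secondary only when no primary is reachable.
MongoClientSettings settings = MongoClientSettings.builder()
        .applyConnectionString(new ConnectionString(
                "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"))
        .readPreference(ReadPreference.primaryPreferred())
        .build();
```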