KAFKA-5310: reset ControllerContext during resignation
Sub-task of KAFKA-5027 (Kafka Controller Redesign)


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0.0
    • Component/s: None
    • Labels: None

    Description

      This ticket is about ControllerContext initialization and teardown. The key points are:
      1. We should tear down the ControllerContext during resignation instead of waiting for the next election to fix it up. A heap dump shows that the former controller keeps nearly all of its ControllerContext state lying around.
      2. We don't properly tear down/reset ControllerContext.partitionsBeingReassigned. This caused problems for us in a production cluster at LinkedIn, as shown in the scenario below:

      > rm -rf /tmp/zookeeper/ /tmp/kafka-logs* logs*
      > ./gradlew clean jar
      > ./bin/zookeeper-server-start.sh config/zookeeper.properties
      > export LOG_DIR=logs0 && ./bin/kafka-server-start.sh config/server0.properties
      > export LOG_DIR=logs1 && ./bin/kafka-server-start.sh config/server1.properties
      > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t --replica-assignment 1
      > ./bin/zookeeper-shell.sh localhost:2181
      
      get /brokers/topics/t
      {"version":1,"partitions":{"0":[1]}}
      
      create /admin/reassign_partitions {"partitions":[{"topic":"t","partition":0,"replicas":[1,2]}],"version":1}
      Created /admin/reassign_partitions
      
      get /brokers/topics/t
      {"version":1,"partitions":{"0":[1,2]}}
      
      get /admin/reassign_partitions
      {"version":1,"partitions":[{"topic":"t","partition":0,"replicas":[1,2]}]}
      
      delete /admin/reassign_partitions
      delete /controller
      
      get /brokers/topics/t
      {"version":1,"partitions":{"0":[1,2]}}
      
      get /admin/reassign_partitions
      Node does not exist: /admin/reassign_partitions
      
      > echo '{"partitions":[{"topic":"t","partition":0,"replicas":[1]}],"version":1}' > reassignment.txt
      > ./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassignment.txt --execute
      
      get /brokers/topics/t
      {"version":1,"partitions":{"0":[1]}}
      
      get /admin/reassign_partitions
      Node does not exist: /admin/reassign_partitions
      
      delete /controller
      
      get /brokers/topics/t
      {"version":1,"partitions":{"0":[1,2]}}
      
      get /admin/reassign_partitions
      Node does not exist: /admin/reassign_partitions
      

      Notice that the replica set goes from [1] to [1,2] (as expected, given the explicit /admin/reassign_partitions znode creation while the initial controller was active), back to [1] (as expected, given the partition reassignment issued under the second controller), and then back to [1,2] after the original controller gets re-elected.

      That last transition from [1] to [1,2] is unexpected. It's due to the original controller not resetting its ControllerContext.partitionsBeingReassigned correctly: initializePartitionReassignment simply adds to what's already in ControllerContext.partitionsBeingReassigned, so the stale reassignment left over from the first controller session gets re-applied on re-election.
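      Below is a minimal Scala sketch of the suspected mechanism. It is simplified and not the actual KafkaController code; the class and method shapes here (Controller, onResignation, the fromZk parameter) are illustrative assumptions. It shows how a map that is only ever added to, and never cleared on resignation, lets a stale reassignment survive a controller move.

      import scala.collection.mutable

      // Hypothetical, simplified stand-ins for the real controller types.
      case class TopicAndPartition(topic: String, partition: Int)
      case class ReassignedPartitionsContext(newReplicas: Seq[Int])

      class ControllerContext {
        // Lives for the lifetime of the broker process; survives losing the
        // controller role unless it is explicitly reset.
        val partitionsBeingReassigned =
          mutable.Map.empty[TopicAndPartition, ReassignedPartitionsContext]
      }

      class Controller(ctx: ControllerContext) {
        // On election, fold whatever /admin/reassign_partitions contains into the
        // map. Nothing clears the map first, so an entry left over from a previous
        // controller session is kept and later re-applied even though the znode no
        // longer mentions it.
        def initializePartitionReassignment(fromZk: Map[TopicAndPartition, Seq[Int]]): Unit =
          fromZk.foreach { case (tp, replicas) =>
            ctx.partitionsBeingReassigned.put(tp, ReassignedPartitionsContext(replicas))
          }

        // The change this ticket asks for: reset controller-local state on
        // resignation instead of relying on the next election to repair it.
        def onResignation(): Unit =
          ctx.partitionsBeingReassigned.clear()
      }

      Under this sketch, the map on the original controller still holds t-0 -> [1,2] from its first session; when that broker is re-elected without having cleared the state, the old reassignment is acted on again, which matches the final [1] to [1,2] flip in the transcript above.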

      The explicit /admin/reassign_partitions znode creation is done to work around KAFKA-5161 (95b48b157aca44beec4335e62a59f37097fe7499). Doing so is valid since:
      1. Our code in production doesn't have that change.
      2. KAFKA-5161 doesn't address the underlying race condition between a broker failure and the ReassignPartitionsCommand tool creating the znode.


      People

        Assignee: Onur Karaman
        Reporter: Onur Karaman
        Votes: 0
        Watchers: 3
