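When reposting, please credit the original source: http://www.cnblogs.com/dongxiao-yang/p/6238029.html
I recently needed to look closely at how partitions are assigned during a Kafka rebalance. The explanations I found online were obscure and hard to follow, so reading the source code myself turned out to be the simpler route.
The part of the Kafka code that computes the rebalance assignment is shown below: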
class RangeAssignor() extends PartitionAssignor with Logging {

  def assign(ctx: AssignmentContext) = {
    val valueFactory = (topic: String) => new mutable.HashMap[TopicAndPartition, ConsumerThreadId]
    val partitionAssignment =
      new Pool[String, mutable.Map[TopicAndPartition, ConsumerThreadId]](Some(valueFactory))

    for (topic <- ctx.myTopicThreadIds.keySet) {
      val curConsumers = ctx.consumersForTopic(topic)
      val curPartitions: Seq[Int] = ctx.partitionsForTopic(topic)

      val nPartsPerConsumer = curPartitions.size / curConsumers.size
      val nConsumersWithExtraPart = curPartitions.size % curConsumers.size

      info("Consumer " + ctx.consumerId + " rebalancing the following partitions: " + curPartitions +
        " for topic " + topic + " with consumers: " + curConsumers)

      for (consumerThreadId <- curConsumers) {
        val myConsumerPosition = curConsumers.indexOf(consumerThreadId)
        assert(myConsumerPosition >= 0)
        val startPart = nPartsPerConsumer * myConsumerPosition + myConsumerPosition.min(nConsumersWithExtraPart)
        val nParts = nPartsPerConsumer + (if (myConsumerPosition + 1 > nConsumersWithExtraPart) 0 else 1)

        /**
         * Range-partition the sorted partitions to consumers for better locality.
         * The first few consumers pick up an extra partition, if any.
         */
        if (nParts <= 0)
          warn("No broker partitions consumed by consumer thread " + consumerThreadId + " for topic " + topic)
        else {
          for (i <- startPart until startPart + nParts) {
            val partition = curPartitions(i)
            info(consumerThreadId + " attempting to claim partition " + partition)
            // record the partition ownership decision
            val assignmentForConsumer = partitionAssignment.getAndMaybePut(consumerThreadId.consumer)
            assignmentForConsumer += (TopicAndPartition(topic, partition) -> consumerThreadId)
          }
        }
      }
    }
def getPartitionsForTopics(topics: Seq[String]): mutable.Map[String, Seq[Int]] = {
  getPartitionAssignmentForTopics(topics).map { topicAndPartitionMap =>
    val topic = topicAndPartitionMap._1
    val partitionMap = topicAndPartitionMap._2
    debug("partition assignment of /brokers/topics/%s is %s".format(topic, partitionMap))
    // partition ids are returned in ascending order
    (topic -> partitionMap.keys.toSeq.sortWith((s,t) => s < t))
  }
}
def getConsumersPerTopic(group: String, excludeInternalTopics: Boolean) : mutable.Map[String, List[ConsumerThreadId]] = {
  val dirs = new ZKGroupDirs(group)
  val consumers = getChildrenParentMayNotExist(dirs.consumerRegistryDir)
  val consumersPerTopicMap = new mutable.HashMap[String, List[ConsumerThreadId]]
  for (consumer <- consumers) {
    val topicCount = TopicCount.constructTopicCount(group, consumer, this, excludeInternalTopics)
    for ((topic, consumerThreadIdSet) <- topicCount.getConsumerThreadIdsPerTopic) {
      for (consumerThreadId <- consumerThreadIdSet)
        consumersPerTopicMap.get(topic) match {
          case Some(curConsumers) => consumersPerTopicMap.put(topic, consumerThreadId :: curConsumers)
          case _ => consumersPerTopicMap.put(topic, List(consumerThreadId))
        }
    }
  }
  // consumer thread ids are returned in ascending order
  for ( (topic, consumerList) <- consumersPerTopicMap )
    consumersPerTopicMap.put(topic, consumerList.sortWith((s,t) => s < t))
  consumersPerTopicMap
}
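The calculation itself happens in the assign method above. As an example, take a topic with ten partitions and a consumer group containing three consumers whose consumer ids are aaa, ccc, and bbb.
1 From the last two code snippets we can see that both the consumer id list and the partition list are already sorted when they are retrieved, so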
curConsumers=(aaa,bbb,ccc)
curPartitions=(0,1,2,3,4,5,6,7,8,9)
2 Compute the base number of partitions per consumer and the remainder
nPartsPerConsumer=10/3 =3
nConsumersWithExtraPart=10%3 =1
3 Suppose the current consumer id is aaa
myConsumerPosition = curConsumers.indexOf(aaa) = 0
4 Compute the partition range
startPart= 3*0+0.min(1) = 0
nParts = 3+(if (0 + 1 > 1) 0 else 1)=3+1=4
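So aaa is assigned the partition range [0,4), that is, the first four partitions 0, 1, 2, 3.
In the same way, bbb has myConsumerPosition=1, so startPart = 3*1+1.min(1) = 4 and nParts = 3+0 = 3, which gives the middle three partitions 4, 5, 6.
ccc has myConsumerPosition=2, so startPart = 3*2+2.min(1) = 7 and nParts = 3+0 = 3, which gives the last three partitions 7, 8, 9.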
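
The worked example above can be reproduced with a small standalone sketch. It is not the Kafka code itself: the object name RangeAssignSketch and the helpers rangeFor and assign are hypothetical, consumer ids are modelled as plain strings rather than ConsumerThreadId, and only a single topic is handled. The arithmetic for startPart and nParts is copied from the assign method quoted earlier.

```scala
object RangeAssignSketch {

  /** Returns (startPart, nParts) for the consumer at `position` in the sorted consumer list.
    * Assumes nConsumers > 0. Hypothetical helper, not part of Kafka. */
  def rangeFor(position: Int, nPartitions: Int, nConsumers: Int): (Int, Int) = {
    val nPartsPerConsumer = nPartitions / nConsumers
    val nConsumersWithExtraPart = nPartitions % nConsumers
    // Same formulas as in RangeAssignor.assign
    val startPart = nPartsPerConsumer * position + position.min(nConsumersWithExtraPart)
    val nParts = nPartsPerConsumer + (if (position + 1 > nConsumersWithExtraPart) 0 else 1)
    (startPart, nParts)
  }

  /** Maps every consumer id to the partitions it would own for one topic.
    * Both inputs are sorted first, mirroring the sorted lists the real code receives. */
  def assign(consumerIds: Seq[String], partitions: Seq[Int]): Map[String, Seq[Int]] = {
    val curConsumers = consumerIds.sorted
    val curPartitions = partitions.sorted
    curConsumers.zipWithIndex.map { case (id, pos) =>
      val (startPart, nParts) = rangeFor(pos, curPartitions.size, curConsumers.size)
      id -> curPartitions.slice(startPart, startPart + nParts)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Ten partitions, three consumers registered in the order aaa, ccc, bbb
    val result = assign(Seq("aaa", "ccc", "bbb"), 0 until 10)
    result.toSeq.sortBy(_._1).foreach { case (id, parts) =>
      println(s"$id -> ${parts.mkString(",")}")
    }
    // prints:
    // aaa -> 0,1,2,3
    // bbb -> 4,5,6
    // ccc -> 7,8,9
  }
}
```

Because the consumers are sorted before positions are taken, the order in which they registered (aaa, ccc, bbb) does not affect the result: the extra partition always goes to the consumers that come first in sorted order.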