Quoted from:https://scikit-learn.org/stable/modules/clustering.html#affinity-propagation
AffinityPropagation
creates clusters by sending messages between pairs of samples until convergence. A dataset is then described using a small number of exemplars, which are identified as those most representative of other samples. The messages sent between pairs represent the suitability for one sample to be the exemplar of the other, which is updated in response to the values from other pairs. This updating happens iteratively until convergence, at which point the final exemplars are chosen, and hence the final clustering is given.
Affinity Propagation can be interesting as it chooses the number of clusters based on the data provided. For this purpose, the two important parameters are the preference, which controls how many exemplars are used, and the damping factor which damps the responsibility and availability messages to avoid numerical oscillations when updating these messages.
The main drawback of Affinity Propagation is its complexity. The algorithm has a time complexity of the order O(N**2T) O(N2T), where N N is the number of samples and TT is the number of iterations until convergence. Further, the memory complexity is of the order O(N**2)O(N2) if a dense similarity matrix is used, but reducible if a sparse similarity matrix is used. This makes Affinity Propagation most appropriate for small to medium sized datasets.
Examples:
Algorithm description: The messages sent between points belong to one of two categories. The first is the responsibility r(i,k)r(i,k), which is the accumulated evidence that sample k should be the exemplar for sample i should be the exemplar for sample i. The second is the availability a(i,k) which is the accumulated evidence that sample i should choose sample k to be its exemplar, and considers the values for all other samples that k should be an exemplar. In this way, exemplars are chosen by samples if they are (1) similar enough to many samples and (2) chosen by many samples to be representative of themselves.
More formally, the responsibility of a sample k to be the exemplar of sample i is given by:
Where s(i,k) is the similarity between samples i and k. The availability of sample k to be the exemplar of sample i is given by:
To begin with, all values for rr and a are set to zero, and the calculation of each iterates until convergence. As discussed above, in order to avoid numerical oscillations when updating the messages, the damping factor λ
is introduced to iteration process:
where t indicates the iteration times.