Quoted from: https://scikit-learn.org/stable/modules/clustering.html#birch
The Birch algorithm builds a tree called the Clustering Feature Tree (CFT) for the given data. The data is essentially lossily compressed into a set of Clustering Feature nodes (CF Nodes). Each CF Node holds a number of subclusters called Clustering Feature subclusters (CF Subclusters), and the CF Subclusters located in non-terminal CF Nodes can have CF Nodes as children.
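A minimal sketch of this nesting, with illustrative types only (these are not scikit-learn's internal classes):

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np

@dataclass
class CFSubcluster:
    n_samples: int                    # number of samples absorbed so far
    linear_sum: np.ndarray            # element-wise sum of those samples
    squared_sum: float                # sum of their squared L2 norms
    child: Optional["CFNode"] = None  # set only in non-terminal nodes

@dataclass
class CFNode:
    is_leaf: bool
    subclusters: List[CFSubcluster] = field(default_factory=list)
```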
The CF Subclusters hold the necessary information for clustering, which removes the need to hold the entire input data in memory. This information includes:
- Number of samples in a subcluster.
- Linear Sum - an n-dimensional vector holding the sum of all samples.
- Squared Sum - the sum of the squared L2 norms of all samples.
- Centroids - linear sum / n_samples, stored to avoid recalculation.
- Squared norm of the centroids.
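All of these are either stored directly or derived cheaply from the (number of samples, linear sum, squared sum) triple, and the triple is additive, which is what allows single-pass updates. A hypothetical sketch of that bookkeeping:

```python
import numpy as np

# A CF triple: (n_samples, linear_sum, squared_sum). Merging is additive,
# so absorbing a sample only touches the three statistics.
def absorb(cf, x):
    n, ls, ss = cf
    return n + 1, ls + x, ss + float(x @ x)

def centroid(cf):
    n, ls, _ = cf
    return ls / n                   # linear sum / n_samples

def sq_norm(cf):
    c = centroid(cf)
    return float(c @ c)             # cached in practice, not recomputed

cf = (0, np.zeros(2), 0.0)
for x in np.array([[0.0, 1.0], [2.0, 1.0]]):
    cf = absorb(cf, x)
print(centroid(cf), sq_norm(cf))    # [1. 1.] 2.0
```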
The Birch algorithm has two parameters: the threshold and the branching factor. The branching factor limits the number of subclusters in a node, and the threshold limits the distance between the entering sample and the existing subclusters.
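For example, with scikit-learn's Birch on synthetic data, a smaller threshold yields more (and tighter) leaf subclusters:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1_000, centers=5, random_state=0)

for threshold in (0.3, 1.0, 3.0):
    brc = Birch(threshold=threshold, branching_factor=50, n_clusters=None).fit(X)
    # Each row of subcluster_centers_ is the centroid of one leaf subcluster.
    print(threshold, brc.subcluster_centers_.shape[0])
```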
This algorithm can be viewed as an instance or data reduction method, since it reduces the input data to a set of subclusters obtained directly from the leaves of the CFT. This reduced data can be further processed by feeding it into a global clusterer, which is set via n_clusters. If n_clusters is set to None, the subclusters from the leaves are read off directly; otherwise, a global clustering step labels these subclusters into global clusters (labels), and each sample is mapped to the global label of its nearest subcluster.
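The three accepted forms of n_clusters, shown on the same synthetic data:

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1_000, centers=5, random_state=0)

# None: no global step; labels_ indexes the leaf subclusters themselves.
brc = Birch(n_clusters=None).fit(X)
print(len(set(brc.labels_)))    # as many labels as leaf subclusters

# int: AgglomerativeClustering with that many clusters is fit on the
# subcluster centroids.
brc = Birch(n_clusters=5).fit(X)
print(len(set(brc.labels_)))    # 5

# estimator: any clusterer can serve as the global step.
brc = Birch(n_clusters=AgglomerativeClustering(n_clusters=5)).fit(X)
print(len(set(brc.labels_)))    # 5
```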
Algorithm description:
- A new sample is inserted into the root of the CF Tree, which is a CF Node. It is then merged with the subcluster of the root that has the smallest radius after merging, subject to the threshold and branching factor conditions. If that subcluster has a child node, this step is repeated until a leaf is reached. After finding the nearest subcluster in the leaf, the properties of this subcluster and of its parent subclusters are recursively updated.
- If the radius of the subcluster obtained by merging the new sample and the nearest subcluster is greater than the threshold (the implementation compares the squared radius against the squared threshold), then space is temporarily allocated to this new sample as its own subcluster. If the node then holds more subclusters than the branching factor allows, it is split: the two farthest subclusters are taken as seeds, and the remaining subclusters are divided into two groups on the basis of their distance to these seeds.
- If this split node has a parent subcluster and there is room in the parent for one more subcluster, the two halves are inserted there. If there is no room, the parent is itself split into two, and the process continues recursively until it reaches the root.
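To make these steps concrete, here is a deliberately simplified, leaf-only sketch of the insert-then-split logic (hypothetical code; scikit-learn's real CF Tree also maintains non-terminal nodes and propagates splits toward the root as described above):

```python
import numpy as np

THRESHOLD = 0.5
BRANCHING_FACTOR = 50

def radius_after_merge(n, ls, ss, x):
    # Radius of the CF triple (n, ls, ss) after absorbing sample x:
    # radius^2 = ss/n - ||centroid||^2, derived from the triple alone.
    n2, ls2, ss2 = n + 1, ls + x, ss + float(x @ x)
    c = ls2 / n2
    return float(np.sqrt(max(ss2 / n2 - c @ c, 0.0)))

def insert(leaf, x):
    # leaf: list of [n, ls, ss] triples. Returns one or two leaves.
    if leaf:
        # Step 1: the nearest subcluster is the one with the smallest
        # radius after merging the new sample into it.
        i = min(range(len(leaf)), key=lambda j: radius_after_merge(*leaf[j], x))
        if radius_after_merge(*leaf[i], x) <= THRESHOLD:
            n, ls, ss = leaf[i]
            leaf[i] = [n + 1, ls + x, ss + float(x @ x)]
            return [leaf]
    # Step 2: threshold would be violated, so start a new subcluster.
    leaf.append([1, x.copy(), float(x @ x)])
    if len(leaf) <= BRANCHING_FACTOR:
        return [leaf]
    # Step 3: the node now exceeds the branching factor; split it around
    # the two farthest subcluster centroids.
    cents = np.array([ls / n for n, ls, _ in leaf])
    dists = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
    a, b = np.unravel_index(np.argmax(dists), dists.shape)
    halves = ([], [])
    for sub, c in zip(leaf, cents):
        nearer = int(np.linalg.norm(c - cents[b]) < np.linalg.norm(c - cents[a]))
        halves[nearer].append(sub)
    return list(halves)
```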
Birch or MiniBatchKMeans?
- Birch does not scale very well to high-dimensional data. As a rule of thumb, if n_features is greater than twenty, it is generally better to use MiniBatchKMeans.
- If the number of instances of data needs to be reduced, or if one wants a large number of subclusters, either as a preprocessing step or otherwise, Birch is more useful than MiniBatchKMeans, as sketched below.
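A sketch of that reduction usage on synthetic data (the threshold value here is arbitrary):

```python
from sklearn.cluster import Birch, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=10, random_state=0)

# Birch as a reduction step: compress X to a small set of subcluster
# centroids, which any downstream (even expensive) clusterer can consume.
brc = Birch(threshold=1.0, n_clusters=None).fit(X)
print(brc.subcluster_centers_.shape)   # far fewer rows than X

# For wide data (n_features > ~20) the rule of thumb favours MiniBatchKMeans.
mbk = MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0).fit(X)
```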
How to use partial_fit?
To avoid recomputing the global clustering on every call of partial_fit, the user is advised:
- To set n_clusters=None initially.
- To train on all the data through multiple calls to partial_fit.
- To set n_clusters to the required value using brc.set_params(n_clusters=n_clusters).
- To call partial_fit finally with no arguments, i.e. brc.partial_fit(), which performs the global clustering.
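Putting those four steps together (the ten-way chunking is only a stand-in for a real out-of-core batch source):

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)

brc = Birch(n_clusters=None)             # 1. no global clustering yet

for chunk in np.array_split(X, 10):      # 2. train on all data in batches
    brc.partial_fit(chunk)

brc.set_params(n_clusters=5)             # 3. choose the final cluster count
brc.partial_fit()                        # 4. no arguments: global step only

labels = brc.predict(X)
```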
References:
- Tian Zhang, Raghu Ramakrishnan, Miron Livny. BIRCH: An efficient data clustering method for very large databases. https://www.cs.sfu.ca/CourseCentral/459/han/papers/zhang96.pdf
- Roberto Perdisci. JBirch - Java implementation of BIRCH clustering algorithm. https://code.google.com/archive/p/jbirch