AdaBoost

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier.

Machine learning; Ensemble learning; Boosting

Contributor(s)

Initial contribute: 2020-12-16

Classification(s)

Method-focused categories

Data-perspective

Intelligent computation analysis

Detailed Description

Quoted from: http://image.diku.dk/imagecanon/material/cortes_vapnik95.pdf

Overview

Problems in machine learning often suffer from the curse of dimensionality: each sample may consist of a huge number of potential features (for instance, a 24×24 pixel image window has 162,336 Haar features, as used by the Viola–Jones object detection framework), and evaluating every feature can reduce not only the speed of classifier training and execution but also predictive power. Unlike neural networks and SVMs, the AdaBoost training process selects only those features known to improve the predictive power of the model, which reduces dimensionality and can improve execution time because irrelevant features need not be computed.

Training

AdaBoost refers to a particular method of training a boosted classifier. A boosted classifier is a classifier of the form

F_T(x) = \sum_{t=1}^T f_t(x)

where each $f_{t}$ is a weak learner that takes an object $x$ as input and returns a value indicating the class of the object. For example, in the two-class problem, the sign of the weak learner output identifies the predicted object class and the absolute value gives the confidence in that classification. Similarly, the output of the $T$-stage boosted classifier $F_T$ is positive if the sample is predicted to be in the positive class and negative otherwise.
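The sign-and-magnitude reading of the output can be illustrated with a minimal Python sketch. The three weak learners below are hypothetical functions on a scalar input, chosen only for illustration; they are not taken from the text.

```python
def boosted_output(weak_learners, x):
    """Return F_T(x), the sum of the weak learner outputs f_t(x)."""
    return sum(f(x) for f in weak_learners)


def predict(weak_learners, x):
    """Return (class label, confidence) from the sign and magnitude of F_T(x)."""
    score = boosted_output(weak_learners, x)
    label = 1 if score >= 0 else -1
    return label, abs(score)


# Hypothetical weak learners: the sign is the vote, the magnitude the confidence.
weak_learners = [
    lambda x: 0.8 if x > 2.0 else -0.8,
    lambda x: 0.5 if x > 4.0 else -0.5,
    lambda x: 0.3 if x < 6.0 else -0.3,
]

label, confidence = predict(weak_learners, 3.0)   # votes: +0.8, -0.5, +0.3
label_low, confidence_low = predict(weak_learners, 1.0)   # votes: -0.8, -0.5, +0.3
```

For the input 3.0 the votes sum to a positive score, so the sample is assigned to the positive class; for 1.0 the sum is negative and larger in magnitude, so the negative prediction carries more confidence.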

Each weak learner produces an output hypothesis $h(x_i)$ for each sample in the training set. At each iteration $t$, a weak learner is selected and assigned a coefficient $\alpha_t$ such that the total training error $E_{t}$ of the resulting $t$-stage boosted classifier is minimized.

E_t = \sum_i E[F_{t-1}(x_i) + \alpha_t h(x_i)]

Here $F_{t-1}(x)$ is the boosted classifier that has been built up to the previous stage of training, $E(F)$ is some error function and $f_t(x) = \alpha_t h(x)$ is the weak learner that is being considered for addition to the final classifier.

Weighting

At each iteration of the training process, a weight $w_{i,t}$ is assigned to each sample in the training set equal to the current error $E(F_{t-1}(x_i))$ on that sample. These weights can be used to inform the training of the weak learner. For instance, decision trees can be grown that favor splitting sets of samples with high weights.
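As a small illustration, the sketch below computes such weights under the assumption that the error function is the exponential loss $e^{-y_i F(x_i)}$, which is the choice made in the derivation further down. The scores and labels are made-up values, not data from the text.

```python
import math


def sample_weights(scores, labels):
    """w_i = exp(-y_i * F(x_i)): exponential loss of the current classifier per sample."""
    return [math.exp(-y * s) for s, y in zip(scores, labels)]


# Hypothetical outputs F_{t-1}(x_i) of the current boosted classifier and true labels.
scores = [1.2, -0.4, 0.3, -2.0]
labels = [1, 1, -1, -1]

weights = sample_weights(scores, labels)
```

Samples whose score has the wrong sign (the second and third here) receive weights above 1, while confidently correct samples receive weights close to 0, so the next weak learner is pushed toward the samples the current classifier gets wrong.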

Derivation

This derivation follows Rojas (2009):

Suppose we have a data set $\{(x_{1},y_{1}),\ldots ,(x_{N},y_{N})\}$ where each item $x_{i}$ has an associated class $y_{i}\in \{-1,1\}$, and a set of weak classifiers $\{k_{1},\ldots ,k_{L}\}$, each of which outputs a classification $k_{j}(x_{i})\in \{-1,1\}$ for each item. After the $(m-1)$-th iteration our boosted classifier is a linear combination of the weak classifiers of the form:

C_{{(m-1)}}(x_{i})=\alpha _{1}k_{1}(x_{i})+\cdots +\alpha _{{m-1}}k_{{m-1}}(x_{i})

where the class is the sign of $C_{(m-1)}(x_{i})$. At the $m$-th iteration we want to extend this to a better boosted classifier by adding another weak classifier $k_{m}$, with another weight $\alpha _{m}$:

C_{{m}}(x_{i})=C_{{(m-1)}}(x_{i})+\alpha _{m}k_{m}(x_{i})

So it remains to determine which weak classifier is the best choice for $k_{m}$, and what its weight $\alpha _{m}$ should be. We define the total error $E$ of $C_{m}$ as the sum of its exponential loss on each data point, given as follows:

E=\sum _{i=1}^{N}e^{-y_{i}C_{m}(x_{i})}=\sum _{i=1}^{N}e^{-y_{i}C_{(m-1)}(x_{i})}e^{-y_{i}\alpha _{m}k_{m}(x_{i})}

Letting $w_{i}^{(1)}=1$ and $w_{i}^{(m)}=e^{-y_{i}C_{m-1}(x_{i})}$ for $m>1$, we have:

E=\sum _{{i=1}}^{N}w_{i}^{{(m)}}e^{{-y_{i}\alpha _{m}k_{m}(x_{i})}}

We can split this summation between those data points that are correctly classified by $k_{m}$ (so $y_{i}k_{m}(x_{i})=1$) and those that are misclassified (so $y_{i}k_{m}(x_{i})=-1$):

E=\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}e^{-\alpha _{m}}+\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}e^{\alpha _{m}}

=\sum _{i=1}^{N}w_{i}^{(m)}e^{-\alpha _{m}}+\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}(e^{\alpha _{m}}-e^{-\alpha _{m}})

Since the only part of the right-hand side of this equation that depends on $k_{m}$ is $\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}$, we see that the $k_{m}$ that minimizes $E$ is the one that minimizes $\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}$ (assuming that $\alpha _{m}>0$), i.e. the weak classifier with the lowest weighted error (with weights $w_{i}^{(m)}=e^{-y_{i}C_{m-1}(x_{i})}$).
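This selection step can be sketched in a few lines of Python. The sketch assumes scalar inputs and two hypothetical threshold classifiers, `k_a` and `k_b`, introduced here only for illustration.

```python
def weighted_error(k, xs, ys, weights):
    """Sum of the weights w_i over the samples that k misclassifies."""
    return sum(w for x, y, w in zip(xs, ys, weights) if k(x) != y)


def best_weak_classifier(candidates, xs, ys, weights):
    """Return the candidate with the lowest total weighted error."""
    return min(candidates, key=lambda k: weighted_error(k, xs, ys, weights))


# Hypothetical threshold classifiers on scalar inputs.
def k_a(x):
    return 1 if x < 1.5 else -1


def k_b(x):
    return 1 if x < 3.5 else -1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1, -1, 1, -1]

# k_a misclassifies only x = 3.0 and k_b misclassifies only x = 2.0,
# so the sample weights alone decide which classifier is selected.
chosen_1 = best_weak_classifier([k_a, k_b], xs, ys, [1.0, 3.0, 1.0, 1.0])
chosen_2 = best_weak_classifier([k_a, k_b], xs, ys, [1.0, 1.0, 3.0, 1.0])
```

Both classifiers make exactly one mistake, so with uniform weights they would tie. Raising the weight of x = 2.0 selects `k_a`, and raising the weight of x = 3.0 selects `k_b`.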

To determine the desired weight $\alpha _{m}$ that minimizes $E$ with the $k_{m}$ that we just determined, we differentiate:

{\frac {dE}{d\alpha _{m}}}={\frac {d(\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}e^{-\alpha _{m}}+\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}e^{\alpha _{m}})}{d\alpha _{m}}}

Setting this to zero and solving for $\alpha _{m}$ yields:

\alpha_m = \frac{1}{2}\ln\left(\frac{\sum_{y_i = k_m(x_i)} w_i^{(m)}}{\sum_{y_i \neq k_m(x_i)} w_i^{(m)}}\right)

Proof:

{\frac {dE}{d\alpha _{m}}}=-\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}e^{-\alpha _{m}}+\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}e^{\alpha _{m}}=0

because $e^{-\alpha _{m}}$ does not depend on $i$

e^{-\alpha _{m}}\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}=e^{\alpha _{m}}\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}

-\alpha _{m}+\log \left(\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}\right)=\alpha _{m}+\log \left(\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}\right)

-2\alpha _{m}=\log \left({\dfrac {\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}}{\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}}}\right)

\alpha _{m}=-{\dfrac {1}{2}}\log \left({\dfrac {\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}}{\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}}}\right)

\alpha _{m}={\dfrac {1}{2}}\log \left({\dfrac {\sum _{y_{i}=k_{m}(x_{i})}w_{i}^{(m)}}{\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}}}\right)

We calculate the weighted error rate of the weak classifier to be $\epsilon _{m}=\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}/\sum _{i=1}^{N}w_{i}^{(m)}$, so it follows that:

\alpha_m = \frac{1}{2}\ln\left( \frac{1 - \epsilon_m}{\epsilon_m}\right)

which is the negative logit function multiplied by 0.5.
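A short numerical check of this formula, written as a Python sketch. The function names and the example error rate of 0.25 are illustrative assumptions, not part of the derivation.

```python
import math


def classifier_weight(epsilon):
    """alpha_m = 0.5 * ln((1 - epsilon_m) / epsilon_m) for 0 < epsilon_m < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - epsilon) / epsilon)


def total_error(alpha, correct_weight, wrong_weight):
    """E as a function of alpha: W_correct * e^(-alpha) + W_wrong * e^(alpha)."""
    return correct_weight * math.exp(-alpha) + wrong_weight * math.exp(alpha)


alpha = classifier_weight(0.25)      # equals 0.5 * ln(3)
coin_flip = classifier_weight(0.5)   # a classifier no better than chance gets weight 0

# With normalized weights, W_wrong = 0.25 and W_correct = 0.75.
# The closed-form alpha should give a lower E than nearby values.
e_at_alpha = total_error(alpha, 0.75, 0.25)
e_below = total_error(alpha - 0.01, 0.75, 0.25)
e_above = total_error(alpha + 0.01, 0.75, 0.25)
```

An error rate below 0.5 yields a positive weight, an error rate of exactly 0.5 yields zero, and an error rate above 0.5 yields a negative weight, which amounts to inverting that classifier's vote.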

Thus we have derived the AdaBoost algorithm: at each iteration, choose the classifier $k_{m}$ that minimizes the total weighted error $\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}$, use this to calculate the error rate $\epsilon _{m}=\sum _{y_{i}\neq k_{m}(x_{i})}w_{i}^{(m)}/\sum _{i=1}^{N}w_{i}^{(m)}$, use this to calculate the weight $\alpha_m = \frac{1}{2}\ln\left( \frac{1 - \epsilon_m}{\epsilon_m}\right)$, and finally use this to improve the boosted classifier $C_{m-1}$ to $C_{m}=C_{(m-1)}+\alpha _{m}k_{m}$.
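The loop summarized above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the weak classifiers are hypothetical threshold "stumps" on a scalar input, the toy data set is made up, and the early-exit rules for $\epsilon_m = 0$ and $\epsilon_m \geq 0.5$ are choices of this sketch. The weight update uses $w_i^{(m+1)} = w_i^{(m)} e^{-y_i \alpha_m k_m(x_i)}$, which follows from the definition $w_i^{(m)} = e^{-y_i C_{m-1}(x_i)}$.

```python
import math


def make_stump(threshold, polarity):
    """Hypothetical weak classifier: polarity if x < threshold, else -polarity."""
    def stump(x):
        return polarity if x < threshold else -polarity
    return stump


def weighted_error(k, xs, ys, weights):
    """Sum of the weights of the samples that k misclassifies."""
    return sum(w for x, y, w in zip(xs, ys, weights) if k(x) != y)


def adaboost(xs, ys, candidates, rounds):
    """Return a list of (alpha_m, k_m) pairs, one per boosting round."""
    weights = [1.0] * len(xs)                      # w_i^(1) = 1
    ensemble = []
    for _ in range(rounds):
        # Choose k_m with the lowest total weighted error.
        k_m = min(candidates, key=lambda k: weighted_error(k, xs, ys, weights))
        eps = weighted_error(k_m, xs, ys, weights) / sum(weights)
        if eps == 0.0:
            # Sketch assumption: a perfect classifier gets an arbitrary positive weight.
            ensemble.append((1.0, k_m))
            break
        if eps >= 0.5:
            # Sketch assumption: stop when no candidate beats chance.
            break
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        ensemble.append((alpha, k_m))
        # w_i^(m+1) = w_i^(m) * exp(-y_i * alpha_m * k_m(x_i))
        weights = [w * math.exp(-y * alpha * k_m(x))
                   for x, y, w in zip(xs, ys, weights)]
    return ensemble


def classify(ensemble, x):
    """Sign of C_m(x) = sum of alpha_j * k_j(x)."""
    score = sum(alpha * k(x) for alpha, k in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single stump separates the classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
candidates = [make_stump(t + 0.5, p) for t in range(9) for p in (1, -1)]

ensemble = adaboost(xs, ys, candidates, rounds=3)
predictions = [classify(ensemble, x) for x in xs]
single = adaboost(xs, ys, candidates, rounds=1)
single_predictions = [classify(single, x) for x in xs]
```

On this toy data the best single stump still misclassifies three samples, while the weighted vote of three stumps reproduces every label. Each round reweights the samples so that the next stump concentrates on the points the current combination gets wrong.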

Zhen Qian (2020). AdaBoost, Model Item, OpenGMS, https://geomodeling.njnu.edu.cn/modelItem/2d89a1e6-7b17-4795-99e4-ac07167b42ee

