Explain how Gradient Boosting works. What are some of the pros and cons?
☞ Gradient Boosting is an ensemble machine learning technique used for both regression and classification problems. It builds the model in a stage-wise fashion, like other boosting methods, but it generalizes them by allowing optimization of an arbitrary differentiable loss function.
- Core idea: each new tree is fit to the residuals of the current ensemble (more generally, to the pseudo-residuals, i.e. the negative gradient of the loss); see the sketches below
- Learning rate: scales each new tree's contribution (the values in its leaves) before it is added to the ensemble; shrinking it slows learning and helps prevent overfitting
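A quick sketch of the stage-wise behaviour and the effect of the learning rate, assuming scikit-learn is available; the toy dataset and hyperparameter values are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# learning_rate shrinks each new tree's contribution; smaller values usually
# need more estimators but tend to generalize better.
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2)
model.fit(X, y)

# staged_predict yields the ensemble's predictions after each boosting stage,
# so we can watch the training error fall as trees are added.
for i, y_pred in enumerate(model.staged_predict(X), start=1):
    if i % 20 == 0:
        print(f"after {i:3d} trees: train MSE = {mean_squared_error(y, y_pred):.4f}")
```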
Ensemble
- Combines the outputs of multiple models into a single prediction (by averaging, weighting, or majority vote)
Regression/Classification
- Regression: predicting a continuous value (mean)
- Classification: predicting a discrete category (mode)
Boosting
- A strategy where each new model (in an ensemble) tries to improve on the errors of the previous ones
Arbitrary differentiable loss function
- The loss function can have a flexible definition as long as it is differentiable
- Gradient Boosting can therefore be customized to optimize (minimize) whatever type of error matters for the specific problem you’re trying to solve (see the sketch after this list).
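To make the "arbitrary differentiable loss" point concrete, here is a small sketch of how the pseudo-residuals (the negative gradient of the loss with respect to the current prediction) change with the loss; the function name `negative_gradient`, the 0.9 quantile, and the example values are my own choices for illustration:

```python
import numpy as np

def negative_gradient(y, pred, loss="squared_error"):
    """Pseudo-residuals that the next tree will be trained to predict."""
    if loss == "squared_error":   # L = 0.5*(y - pred)^2  ->  -dL/dpred = y - pred
        return y - pred
    if loss == "absolute_error":  # L = |y - pred|        ->  -dL/dpred = sign(y - pred)
        return np.sign(y - pred)
    if loss == "quantile":        # pinball loss at quantile alpha = 0.9 (example value)
        alpha = 0.9
        return np.where(y > pred, alpha, alpha - 1.0)
    raise ValueError(f"unknown loss: {loss}")

y = np.array([3.0, -1.0, 2.0])
pred = np.array([2.5, 0.0, 2.0])
for loss in ("squared_error", "absolute_error", "quantile"):
    print(loss, negative_gradient(y, pred, loss))
```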
Walk through the algorithm step by step.
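A from-scratch walkthrough of the regression case with squared-error loss (so the pseudo-residuals reduce to plain residuals), assuming scikit-learn's DecisionTreeRegressor as the base learner; all data and hyperparameters are illustrative, not a production implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=300)

n_trees, learning_rate, max_depth = 50, 0.1, 2

# Step 1: start from a constant prediction (the mean minimizes squared error).
F = np.full_like(y, y.mean())
trees = []

for _ in range(n_trees):
    # Step 2: compute pseudo-residuals (negative gradient; here simply y - F).
    residuals = y - F
    # Step 3: fit a small tree to the residuals.
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, residuals)
    # Step 4: add the tree's learning-rate-scaled prediction to the running model.
    F += learning_rate * tree.predict(X)
    trees.append(tree)

# Prediction on new data repeats the same sum of scaled trees.
def predict(X_new, base=y.mean()):
    out = np.full(len(X_new), base)
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out

print("train MSE:", np.mean((y - F) ** 2))
print("predict([[0.0], [2.0]]) ->", predict(np.array([[0.0], [2.0]])))
```

In practice you would reach for a library implementation (scikit-learn, XGBoost, LightGBM), which layers subsampling, regularization, and early stopping on top of this basic loop.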

(QX) How is AdaBoost different from Gradient Boosting?
https://youtu.be/3CC4N4z3GJc?si=5yIur9zmlB74uFPe