Schedule of CPD: Actuarial Data Scientist Program: Fourth edition - Module 2 (16 CPD)

Day 1, Thursday, 6 February
16:00 - 18:00 Decision Trees in classification and regression (Part I)
Day 2, Thursday, 13 February
16:00 - 18:00 Programming : Basics of regression and classification trees in Python
Day 3, Thursday, 20 February
16:00 - 18:00 Decision Trees in classification and regression (Part II)
Day 4, Tuesday, 11 March
16:00 - 18:00 Programming : From simple regression and classification trees to ensembles of trees (bagging and random forests)
Day 5, Monday, 24 March
16:00 - 18:00 Theory Boosted and Bagged ensembles
Day 6, Thursday, 27 March
16:00 - 18:00 Programming: Stochastic gradient boosting machines and XGBoost
Day 7, Tuesday, 1 April
16:00 - 18:00 Neural Networks and deep learning overview
Day 8, Thursday, 10 April
16:00 - 18:00 Programming : Deep Learning
Day 9, Wednesday, 30 April
18:00 - 23:59 Assignment after Module 2
  1. From 16:00 to 18:00

    Decision Trees in classification and regression (Part I)

    By TRUFIN Julien

    This module builds up from single classification and regression trees to competition-winning ensemble methods for regression problems.

    Participants will explore how ensembles of decision trees achieve superior performance and learn how to calibrate them in practice.

    Part 1 : Classification and regression trees :

    • Binary regression trees
    • Right sized trees
    • Measure of performance
    • Relative importance of features
    • Interactions
    • Limitations of trees
  2. From 16:00 to 18:00

    Programming : Basics of regression and classification trees in Python

    By VAN DAM Daniel, VAN ES Raymond
    • Fit a first example of a regression tree using Python’s [scikit-learn: sklearn.tree] on a simulated (toy) data set for regression. The goal here is to gain understanding of the control parameters, the loss function, and the output. We’ll work from a stump to a very deep tree.
    • Tuning of the parameters: k-fold cross validation, minimal cv error and one standard error rule in Python’s [scikit-learn: sklearn.model_selection].
    • Acquire an (empirical) understanding of the (high) variability of a constructed tree by repeating the above steps on alternative simulated input data.
    • Repeat the above steps on a classification problem with a simulated (toy) data set. Discuss loss functions and appropriate performance measures.
    • Build a regression tree for the MTPL claim frequency and severity data: focus on loss functions available in Python’s [scikit-learn: e.g. Poisson available, gamma not], discuss limitations and sketch possible alternatives.
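The steps in this session can be sketched roughly as follows. This is a minimal illustration, not the course material: the simulated data, the grid of tree sizes, the exposure variable and all variable names are assumptions made for the example. It fits a stump and a fully grown tree, selects a tree size by 5-fold cross validation with the one standard error rule, and fits a frequency tree with the Poisson criterion that scikit-learn provides.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Simulated (toy) regression data; the sine signal is an arbitrary choice.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# From a stump (one split) to a very deep, unpruned tree.
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
deep = DecisionTreeRegressor(random_state=0).fit(X, y)

# k-fold cross validation over a small, hypothetical grid of tree sizes.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
sizes = [2, 4, 8, 16, 32, 64, 128]
mean_err, se_err = [], []
for m in sizes:
    tree = DecisionTreeRegressor(max_leaf_nodes=m, random_state=0)
    mse = -cross_val_score(tree, X, y, cv=cv, scoring="neg_mean_squared_error")
    mean_err.append(mse.mean())
    se_err.append(mse.std(ddof=1) / np.sqrt(len(mse)))
mean_err, se_err = np.array(mean_err), np.array(se_err)

# Minimal CV error, and the one standard error rule: the smallest tree
# whose CV error is within one standard error of the minimum.
best = int(np.argmin(mean_err))
one_se = int(np.argmax(mean_err <= mean_err[best] + se_err[best]))

# Claim-frequency style tree with the Poisson criterion: the target is
# claims per unit of exposure, weighted by exposure (simulated here).
exposure = rng.uniform(0.2, 1.0, size=500)
claims = rng.poisson(exposure * 0.1 * np.exp(0.1 * X[:, 0]))
freq_tree = DecisionTreeRegressor(
    criterion="poisson", max_leaf_nodes=4, min_samples_leaf=50, random_state=0
)
freq_tree.fit(X, claims / exposure, sample_weight=exposure)
freq_pred = freq_tree.predict(X)
```

Here `sizes[best]` is the size with minimal CV error and `sizes[one_se]` is the more parsimonious choice under the one standard error rule. Rerunning the block with a different seed gives a feel for how much the fitted tree varies.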
  3. From 16:00 to 18:00

    Decision Trees in classification and regression (Part II)

    By TRUFIN Julien

    Part 2 : Bagging trees and random forests :

    • Bagging trees : bias, variance and expected generalization error
    • Random forests
    • Interpretability : relative importances and partial dependence plots
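A standard way to see why bagging reduces variance, given here as an illustration and not as the course's own derivation: assume the B tree predictions at a point x are identically distributed with variance \sigma^2 and pairwise correlation \rho (the symbols B, \sigma^2 and \rho are introduced only for this sketch). The variance of their average is then

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
```

Increasing B shrinks only the second term. Random forests additionally aim to lower \rho by randomizing the candidate features at each split.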
  4. From 16:00 to 18:00

    Programming : From simple regression and classification trees to ensembles of trees (bagging and random forests)

    By VAN DAM Daniel, VAN ES Raymond
    • First discussion of some basic interpretation tools: variable importance plot and partial dependence plot using Python’s [scikit-learn: sklearn.inspection].
    • Demonstrate bagging on bootstrapped toy data sets (for regression and classification): fit deep trees using Python’s [scikit-learn: sklearn.tree] and average predictions. Then continue with bagging done properly using Python’s [scikit-learn: sklearn.ensemble Bagging] (on the toy data sets introduced in the previous session). Tuning of parameters, construction of predictions.
    • Same for random forests using Python’s [scikit-learn: sklearn.ensemble RandomForest]. Tuning of parameters, construction of predictions.
    • Build a random forest on the Ames Housing Data set and extract first insights.
    • Discuss the loss functions available for bagging + random forest on claim frequency and claim severity; comparison of tools [e.g. H2O].
  5. From 16:00 to 18:00

    Theory Boosted and Bagged ensembles

    By TRUFIN Julien

    Part 3 : Boosting trees :

    • Boosting trees
    • Gradient boosting trees
    • Regularization and randomness
    • Interpretability : relative importances, partial dependence plots and Friedman's H-statistics
  6. From 16:00 to 18:00

    Programming: Stochastic gradient boosting machines and XGBoost

    By VAN DAM Daniel, VAN ES Raymond
    • Basics of fitting (stochastic) gradient boosting machines with Python’s [scikit-learn: sklearn.ensemble GradientBoosting] and XGBoost: discussion of control parameters, outline of tuning process.
    • To illustrate first principles we work again on the toy data sets for regression and classification introduced in previous sessions.
    • Claim frequency and severity modelling with GBMs in Python’s [scikit-learn: sklearn.ensemble GradientBoosting]: tuning, variable importance, PDPs, predictions and construction of a technical tariff.
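A minimal sketch of a stochastic gradient boosting fit with scikit-learn is given below. The toy data, the control parameter values and the validation split are assumptions for illustration. XGBoost is left out so that the example depends only on scikit-learn. Setting `subsample` below 1 is what makes the boosting stochastic, and `staged_predict` is used to pick the number of trees on a held-out set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy regression data (assumed): quadratic effect, linear effect, noise.
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(600, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=600)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Main control parameters: number of trees, shrinkage (learning rate),
# tree depth, and the fraction of rows sampled for each tree.
gbm = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=2,
    subsample=0.75,
    random_state=0,
)
gbm.fit(X_train, y_train)

# Validation error after each boosting iteration; the minimizer is a
# simple choice for the number of trees.
val_mse = np.array(
    [mean_squared_error(y_val, p) for p in gbm.staged_predict(X_val)]
)
best_n_trees = int(np.argmin(val_mse)) + 1

# Variable importance of the fitted ensemble.
importances = gbm.feature_importances_
```

A lower learning rate usually calls for more trees, so these two parameters are best tuned together.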
  7. From 16:00 to 18:00

    Neural Networks and deep learning overview

    By ANTONIO Katrien
    • de-mystify neural networks in light of increasing literature on the use of neural nets in actuarial science
    • develop foundations of working with (different types of) neural networks
    • focus on the use of neural networks for the analysis of claim frequency + severity data, also in combination with GLMs or tree-based ML models.

  8. From 18:00 to 23:59

    Assignment after Module 2

    By VAN DAM Daniel, VAN ES Raymond

    Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.

    Deadline for handing in assignment 2 : 29 April 2023.

Prices

Ticket type          Price
Members IA (France)  € 960.00
Members IA|BE        € 960.00
Members ILAC         € 960.00
Others               € 1,200.00