
Schedule of Actuarial Data Scientist Program : Second edition - Module 1

Schedule of Module 1: Foundations of Machine Learning in Actuarial Sciences

Day 1, Thursday, 3 February
16:00 - 18:00 Linear Models and conditional expectation By ANTONIO Katrien
Day 2, Thursday, 10 February
16:00 - 18:00 Programming : Foundations of actuarial learning and the organization of the training
Day 3, Thursday, 17 February
16:00 - 18:00 Generalized Linear Models: regression and classification By ANTONIO Katrien
Day 4, Thursday, 24 February
16:00 - 18:00 Programming : LMs and GLMs
Day 5, Thursday, 10 March
16:00 - 18:00 Regularisation and links with support vector machines By ANTONIO Katrien
Day 6, Thursday, 17 March
16:00 - 18:00 Programming : Regularization
Day 7, Thursday, 24 March
16:00 - 18:00 Neural Networks By ANTONIO Katrien
Day 8, Thursday, 31 March
16:00 - 18:00 Programming : Deep Learning
18:00 - 23:59 Assignment after Module 1
  1. From 16:00 to 18:00

    Linear Models and conditional expectation

    By ANTONIO Katrien
    • Conditional mean estimation E[Y|X] and the iris problem.
    • Introduction to classification problems in machine learning: Linear Discriminant Analysis.
    • Introduction to regression problems: linear models and the OLS estimator (with mixed data types, e.g. a mix of continuous and discrete covariates).
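    The OLS estimator for the linear model can be sketched in a few lines of NumPy. The data below is simulated and all names and values are illustrative, not taken from the course material:

```python
import numpy as np

# Simulated data for a linear model y = X @ beta + noise.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS estimator: beta_hat = (X'X)^{-1} X'y, solved without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0]
```

    Solving the normal equations with `np.linalg.solve` is numerically preferable to computing the matrix inverse explicitly.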
  2. From 16:00 to 18:00

    Programming : Foundations of actuarial learning and the organization of the training

    • Introducing the trainers and the training environment, including a first introduction to git, the GitHub repos dedicated to the training, the notebooks, the data sets available, and the ways to execute the Python code.
    • Introducing the data sets that will be analyzed in the course: MTPL claim frequency and severity data, Ames Housing data set, caravan insurance data set (for a classification problem with class imbalance), data set with characteristics of vehicles (for clustering).
    • Getting to know these data sets: basic data explorations, some plotting, data manipulation and calculating summary statistics [Numpy, pandas and Matplotlib].
    • Target and feature engineering steps, including (among others) centering, scaling, dealing with NAs, handling class imbalance, and filtering out near-zero-variance features [scikit-learn: sklearn.preprocessing].
    • Data splitting and resampling methods: training vs validation vs test, k-fold cross validation [scikit-learn: sklearn.model_selection].
    • Introduction to parameter tuning, with a simple example using e.g. k-nearest neighbours [scikit-learn: sklearn.model_selection].
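    The splitting, resampling and tuning steps above can be sketched with scikit-learn. The data is simulated and the grid of neighbour counts is an illustrative assumption, not the course's actual setup:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simulated regression data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

# Training vs test split; the test set stays untouched until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling inside a pipeline, so the scaler is refit on each CV training fold.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsRegressor())])

# Tune the number of neighbours with 5-fold cross-validation.
grid = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 10, 20]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.score(X_test, y_test))  # R^2 on the held-out test set
```

    Wrapping the scaler in the pipeline avoids data leakage: each cross-validation fold is scaled using statistics from its own training part only.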
  3. From 16:00 to 18:00

    Generalized Linear Models: regression and classification

    By ANTONIO Katrien
    • GLMs for regression and classification (logistic, Poisson and gamma regression)
  4. From 16:00 to 18:00

    Programming : LMs and GLMs

    • Introducing the Python package statsmodels via [statsmodels.api] and the support for formulas via [statsmodels.formula.api].
    • Linear regression: specifying, fitting and summarizing linear regression models on the Ames Housing data: model fit and inspection, prediction, variable and model selection tools [statsmodels: statsmodels.regression.linear_model].
    • Generalized linear regression models: fitting Poisson and gamma regression models on the MTPL data set: inspecting model fit, building predictions, evaluating model fit [statsmodels: statsmodels.genmod.generalized_linear_model].
    • We gradually build up the Poisson GLM: introducing exposure (offset), how to handle multiple types of variables (numeric, categorical).
    • Combine frequency and severity GLMs into a technical tariff. Construct technical prices for selected risk profiles.
  5. From 16:00 to 18:00

    Regularisation and links with support vector machines

    By ANTONIO Katrien

    Introduction to:

    • the LASSO
    • Ridge regression
    • the Elastic Net
    • the relation with support vector machines
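    In a common parameterisation (as used, e.g., by glmnet-style software), the three penalties are special cases of a single objective, where ℓ(β) denotes the model log-likelihood:

```latex
\hat{\beta} = \arg\min_{\beta}\; -\ell(\beta)
  + \lambda \left( \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right)
```

    Setting α = 1 recovers the LASSO, α = 0 ridge regression, and 0 < α < 1 the elastic net; λ controls the overall penalty strength.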
  6. From 16:00 to 18:00

    Programming : Regularization

    • Fitting regularized (G)LMs: basic setup with [statsmodels or pyglmnet], handling different types of covariates in the regularized fit, automatic feature selection.
    • Working on a classification problem with class imbalance: the caravan insurance data set. Going from regression to classification problems: model formulation, model fit (with and without regularization), model evaluation tools (e.g. AUC).
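    A regularized classifier on imbalanced data, evaluated with AUC, can be sketched as follows. The data below is a simulated stand-in for the caravan data set, and all parameter choices (penalty strength, class weighting) are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulated imbalanced binary classification data (rare positive class).
rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=(n, 5))
logit = -3.0 + 1.5 * X[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# L1-penalized (lasso-style) logistic regression; C is the inverse penalty
# strength, and class_weight="balanced" reweights the rare positive class.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5,
                         class_weight="balanced")
clf.fit(X_tr, y_tr)

# Evaluate with AUC rather than accuracy, which is misleading under imbalance.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

    With only a few percent of positives, a model predicting "no claim" everywhere would already score high accuracy, which is why a ranking metric such as AUC is used instead.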
  7. From 16:00 to 18:00

    Neural Networks

    By ANTONIO Katrien
    • Introduction to Neural Networks
    • Deep Learning overview
  8. From 16:00 to 18:00

    Programming : Deep Learning

    • Toolbox in Python: [Keras API as an interface to TensorFlow], a brief introduction to the concepts of tensors and working with tensors.
    • The neural network architecture in Keras: a first example of an artificial neural network for a regression problem with the Ames Housing Data: setting up the layered architecture, specifying the activation functions, number of neurons, input and output layers.
    • Network compilation and fitting: (continue example with Ames Housing Data) choice of a loss function and forward pass, optimizers and metrics, tune model parameters (with epochs and batches).
    • ANNs for claim frequencies and severities: Poisson GLM with only an intercept ‘replicated’ with an ANN, incorporating exposure in a Poisson ANN, adding input features and their preprocessing, …
    • As additional material: a notebook introducing CANNs (combined actuarial neural networks, see e.g. Schelldorfer & Wüthrich, 2019): combining GLMs and neural networks via the use of a skip connection. The ANN becomes a correction on top of a base prediction delivered by a GLM. Implementation in Python.
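    The intercept-only Poisson benchmark mentioned above has a closed form that the ANN is meant to reproduce: with a log link and exposure offset, the MLE of the claim frequency is total claims divided by total exposure. A NumPy check on invented data (rates and sizes are illustrative):

```python
import numpy as np

# Simulated portfolio: exposures and Poisson claim counts at a flat true rate.
rng = np.random.default_rng(4)
exposure = rng.uniform(0.2, 1.0, size=10_000)
true_rate = 0.12
claims = rng.poisson(true_rate * exposure)

# Intercept-only Poisson MLE with offset log(exposure):
# exp(intercept) = total claims / total exposure.
rate_hat = claims.sum() / exposure.sum()
intercept = np.log(rate_hat)  # the single weight the Poisson ANN should learn
print(round(rate_hat, 4))
```

    A Poisson ANN with the same offset and no input features should converge to this portfolio-level rate, which makes it a useful sanity check before adding covariates.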
  9. From 18:00 to 23:59

    Assignment after Module 1

    Important comment: the assignment is not a formal examination. Its purpose is to apply the concepts learned during the previous sessions.

    Participants will work on either a regression or a classification problem, using data sets not previously covered in the sessions (though with a similar structure, e.g. claim frequency and severity data).

    Participants will deliver a notebook / report with the following items:

    • An exploratory data analysis.
    • The construction of both a claim frequency and a claim severity model, using two predictive modeling techniques covered so far (e.g. a GLM, a regularized GLM, a neural net or a combined actuarial neural net).
    • A discussion of insights obtained from these models (e.g. important features).
    • A comparison of the performance of the constructed models, based on their own set of criteria (inspired by the tools covered so far in the program).