Schedule of Module 1: Foundations of Machine Learning in Actuarial Sciences
Day 1, Thursday, 3 February
16:00 - 18:00 | Linear Models and conditional expectation | ANTONIO Katrien
Day 2, Thursday, 10 February
16:00 - 18:00 | Programming: Foundations of actuarial learning and the organization of the training
Day 3, Thursday, 17 February
16:00 - 18:00 | Generalized Linear Models: regression and classification | ANTONIO Katrien
Day 4, Thursday, 24 February
16:00 - 18:00 | Programming: LMs and GLMs
Day 5, Thursday, 10 March
16:00 - 18:00 | Regularization and links with support vector machines | ANTONIO Katrien
Day 6, Thursday, 17 March
16:00 - 18:00 | Programming: Regularization
Day 7, Thursday, 24 March
16:00 - 18:00 | Neural Networks | ANTONIO Katrien
Day 8, Thursday, 31 March
16:00 - 18:00 | Programming: Deep Learning
18:00 - 23:59 | Assignment after Module 1
-
From 16:00 to 18:00
Linear Models and conditional expectation
- Conditional mean estimation E[Y|X] and the iris classification problem.
- Introduction to classification problems in machine learning: Linear Discriminant Analysis.
- Introduction to regression problems: linear models and the OLS estimator, with mixed data types (e.g. a mix of continuous and discrete features); a short illustrative sketch follows this list.
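The sketch below is not part of the official course material; it simply illustrates the two estimators named above in Python: Linear Discriminant Analysis on the iris data and an OLS fit on a small simulated data set with one continuous and one categorical feature. All data and variable names are illustrative assumptions.

```python
# Illustrative only: LDA for classification on iris, and OLS on a simulated
# regression problem with mixed feature types (continuous + categorical).
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

# Classification: Linear Discriminant Analysis on the iris data.
X_iris, y_iris = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X_iris, y_iris)
print("LDA training accuracy:", lda.score(X_iris, y_iris))

# Regression: OLS with a continuous and a categorical feature (simulated data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.uniform(18, 80, 500),                # continuous covariate
    "fuel": rng.choice(["petrol", "diesel"], 500),  # categorical covariate
})
df["claim_cost"] = 100 + 2 * df["age"] + 50 * (df["fuel"] == "diesel") + rng.normal(0, 10, 500)

# One-hot encode the categorical covariate, then apply the least-squares estimator.
X_reg = pd.get_dummies(df[["age", "fuel"]], drop_first=True)
ols = LinearRegression().fit(X_reg, df["claim_cost"])
print("OLS coefficients:", dict(zip(X_reg.columns, ols.coef_)))
```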
-
From 16:00 to 18:00
Programming: Foundations of actuarial learning and the organization of the training
- Introducing the trainers and the training environment, including a first introduction to git and the GitHub repositories dedicated to the training, the notebooks, the available data sets and the ways to execute the Python code.
- Introducing the data sets that will be analyzed in the course: MTPL claim frequency and severity data, the Ames Housing data set, the caravan insurance data set (for a classification problem with class imbalance) and a data set with characteristics of vehicles (for clustering).
- Getting to know these data sets: basic data exploration, some plotting, data manipulation and calculating summary statistics [NumPy, pandas and Matplotlib].
- Target and feature engineering steps, including (among others) centering, scaling, dealing with NAs and class imbalance, and filtering out near-zero-variance features [scikit-learn: sklearn.preprocessing].
- Data splitting and resampling methods: training vs validation vs test, k-fold cross-validation [scikit-learn: sklearn.model_selection].
- Introduction to parameter tuning, with a simple example such as k-nearest neighbours [scikit-learn: sklearn.model_selection]; a workflow sketch follows this list.
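To give a flavour of this workflow, here is a compact, hedged sketch combining the preprocessing, splitting and tuning steps listed above. The file name, column names and tuning grid are placeholders rather than the actual course settings.

```python
# Illustrative scikit-learn workflow: preprocessing, data splitting and
# hyperparameter tuning with k-fold cross-validation.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.read_csv("ames_housing.csv")          # placeholder file name
y = df["SalePrice"]
X = df.drop(columns="SalePrice")

numeric_cols = X.select_dtypes("number").columns
categorical_cols = X.select_dtypes(exclude="number").columns

preprocess = ColumnTransformer([
    # impute missing values, then centre and scale numeric features
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # one-hot encode categorical features
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([("prep", preprocess), ("knn", KNeighborsRegressor())])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tune the number of neighbours with 5-fold cross-validation on the training set only.
search = GridSearchCV(model, {"knn__n_neighbors": [3, 5, 10, 25]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```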
-
From 16:00 to 18:00
Generalized Linear Models: regression and classification
- Introduction to the GLMs: logistic, Poisson and gamma regression.
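As a pointer to the common structure behind these three models (standard textbook notation, not taken from the course slides): a GLM relates the conditional mean of the response to a linear predictor through a link function.

```latex
% Generic GLM structure with the three examples listed above.
\[
  g\bigl(\mathbb{E}[Y \mid \mathbf{x}]\bigr) = \mathbf{x}^\top \boldsymbol{\beta},
\]
\[
\begin{aligned}
  &\text{logistic regression:} && Y \mid \mathbf{x} \sim \mathrm{Bernoulli}(\mu), && g(\mu) = \log\frac{\mu}{1-\mu},\\
  &\text{Poisson regression:}  && Y \mid \mathbf{x} \sim \mathrm{Poisson}(\mu),   && g(\mu) = \log\mu,\\
  &\text{gamma regression:}    && Y \mid \mathbf{x} \sim \mathrm{Gamma}(\mu,\phi), && g(\mu) = \log\mu \ \text{(a common choice).}
\end{aligned}
\]
```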
-
From 16:00 to 18:00
Programming: LMs and GLMs
- Introducing the Python package statsmodels via [statsmodels.api] and the support for formulas via [statsmodels.formula.api].
- Linear regression: describing, fitting and summarizing linear regression models on the Ames Housing data: model fit and model inspection, prediction, variable and model selection tools [statsmodels: statsmodels.regression.linear_model].
- Generalized linear regression models: fitting Poisson and gamma regression models on the MTPL data set: inspecting model fit, building predictions, evaluating model fit [statsmodels: statsmodels.genmod.generalized_linear_model].
- We gradually build up the Poisson GLM: introducing exposure (as an offset) and handling multiple types of variables (numeric, categorical).
- Combining the frequency and severity GLMs into a technical tariff and constructing technical prices for selected risk profiles; a sketch of this workflow follows this list.
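A hedged sketch of such a statsmodels workflow: a Poisson frequency GLM with log(exposure) as an offset, a gamma severity GLM with a log link, and their combination into a technical price for one risk profile. The file name and the column names (expo, nclaims, avg_claim, ageph, fuel) are assumptions about the data layout, not the variable names used in the course notebooks.

```python
# Illustrative frequency-severity tariff with statsmodels; column names are placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

mtpl = pd.read_csv("mtpl.csv")  # placeholder file name

# Claim frequency: Poisson GLM with log(exposure) as offset, so the model
# describes the expected number of claims per unit of exposure.
freq = smf.glm(
    "nclaims ~ ageph + C(fuel)",
    data=mtpl,
    family=sm.families.Poisson(),
    offset=np.log(mtpl["expo"]),
).fit()
print(freq.summary())

# Claim severity: gamma GLM with log link, fitted on policies with at least one claim.
sev = smf.glm(
    "avg_claim ~ ageph + C(fuel)",
    data=mtpl[mtpl["nclaims"] > 0],
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

# Technical price for one selected risk profile: expected frequency x expected severity.
profile = pd.DataFrame({"ageph": [30], "fuel": ["diesel"], "expo": [1.0]})
expected_freq = freq.predict(profile, offset=np.log(profile["expo"]))
expected_sev = sev.predict(profile)
print("technical premium:", (expected_freq * expected_sev).iloc[0])
```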
-
From 16:00 to 18:00
Regularization and links with support vector machines
Introduction to:
- the LASSO
- ridge regression
- the elastic net (one common form of the penalized objective is written out below)
- the relation with support vector machines
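For reference, one common parameterization of these penalized estimators (the convention used, for instance, by scikit-learn and glmnet; not necessarily the notation of the course slides):

```latex
% Elastic net objective; the LASSO and ridge regression are the special cases alpha = 1 and alpha = 0.
\[
  \hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}}
    \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^\top\boldsymbol{\beta}\bigr)^2
    + \lambda\Bigl(\alpha\,\lVert\boldsymbol{\beta}\rVert_1
    + \frac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^2\Bigr),
\]
where \(\alpha = 1\) gives the LASSO, \(\alpha = 0\) gives ridge regression, and \(0 < \alpha < 1\) the elastic net;
for GLMs the squared-error term is replaced by the corresponding negative log-likelihood (deviance).
```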
-
From 16:00 to 18:00
Programming: Regularization
- Fitting regularized (G)LMs: basic set-up with [statsmodels or pyglmnet], handling different types of covariates in the regularized fit, automatic feature selection.
- Working on a classification problem with class imbalance using the caravan insurance data set: going from regression to classification problems, model formulation, model fit (with and without regularization) and model evaluation tools (e.g. the AUC); see the sketch after this list.
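A minimal sketch of that classification workflow, assuming a placeholder file caravan.csv with a binary target column purchase; the penalty strength, split and column names are illustrative choices, not the course settings.

```python
# Illustrative regularized classification workflow for an imbalanced data set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("caravan.csv")                  # placeholder file name
X = pd.get_dummies(df.drop(columns="purchase"))  # placeholder target column
y = df["purchase"]

# A stratified split keeps the (imbalanced) class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# L1-penalized (LASSO-type) logistic regression: the penalty performs automatic
# feature selection; class_weight="balanced" counteracts the class imbalance.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1, class_weight="balanced"),
)
clf.fit(X_train, y_train)

# Evaluate with the AUC, which does not depend on a single decision threshold.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print("test AUC:", round(auc, 3))
```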
-
From 16:00 to 18:00
Neural Networks
- Introduction to Neural Networks
- Deep Learning overview
-
From 16:00 to 18:00
Programming: Deep Learning
- Toolbox in Python [the Keras API as an interface to TensorFlow]: a brief introduction to the concept of tensors and working with tensors.
- The neural network architecture in Keras: a first example of an artificial neural network for a regression problem with the Ames Housing data: setting up the layered architecture, specifying the activation functions, the number of neurons, and the input and output layers.
- Network compilation and fitting (continuing the Ames Housing example): choice of a loss function and the forward pass, optimizers and metrics, tuning the model parameters (with epochs and batches).
- ANNs for claim frequencies and severities: a Poisson GLM with only an intercept ‘replicated’ with an ANN, incorporating exposure in a Poisson ANN (see the sketch after this list), adding input features and their preprocessing, …
- As additional material: a notebook introducing CANNs (combined actuarial neural networks, see e.g. Schelldorfer & Wüthrich, 2019), which combine GLMs and neural networks via a skip connection, so that the ANN becomes a correction on top of a base prediction delivered by a GLM. Implementation in Python.
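A sketch of one way to incorporate exposure in a Poisson feed-forward network with Keras: log(exposure) enters as an extra input that is added to the network output on the log scale before exponentiating. Layer sizes, variable names and the synthetic data are assumptions for illustration only.

```python
# Illustrative Keras model for claim frequencies: Poisson loss with log(exposure)
# entering as an additive offset on the log scale. All sizes/names are placeholders.
import numpy as np
from tensorflow import keras

n_features = 10                                   # placeholder feature count

features = keras.Input(shape=(n_features,), name="features")
log_expo = keras.Input(shape=(1,), name="log_exposure")

# Hidden layers model the log of the expected claim frequency per unit of exposure.
hidden = keras.layers.Dense(20, activation="relu")(features)
hidden = keras.layers.Dense(10, activation="relu")(hidden)
log_rate = keras.layers.Dense(1, activation="linear")(hidden)

# Add the offset on the log scale, then exponentiate to obtain the expected count.
expected_count = keras.layers.Activation("exponential")(
    keras.layers.Add()([log_rate, log_expo])
)

model = keras.Model(inputs=[features, log_expo], outputs=expected_count)
model.compile(optimizer="adam", loss="poisson")

# Quick check on synthetic data; replace with the preprocessed MTPL arrays.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, n_features)).astype("float32")
expo = rng.uniform(0.1, 1.0, size=(1000, 1)).astype("float32")
nclaims = rng.poisson(0.1 * expo).astype("float32")
model.fit([X, np.log(expo)], nclaims, epochs=2, batch_size=256, verbose=0)
```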
-
From 18:00 to 23:59
Assignment after Module 1
Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.
Participants will work on either a regression or a classification problem, using data sets not previously covered in the sessions (though with a similar structure, e.g. claim frequency and severity data).
Participants will deliver a notebook / report with the following items:
- An exploratory data analysis.
- The construction of both a claim frequency and a claim severity model, using two predictive modeling techniques covered so far (e.g. a GLM, a regularized GLM, a neural net or a combined actuarial neural net).
- A discussion of insights obtained from these models (e.g. important features).
- A comparison of the performance of the constructed models, based on a self-defined set of criteria (inspired by the tools covered so far in the program).