Schedule of Module 1: Foundations of Machine Learning in Actuarial Sciences
Day 1, Thursday, 7 November | |
16:00 - 18:00 | Linear Models and conditional estimation By ANTONIO Katrien |
Day 2, Thursday, 14 November | |
16:00 - 18:00 | Programming : Foundations of actuarial learning and the organization of the training By VAN DAM Daniel, VAN ES Raymond |
Day 3, Thursday, 21 November | |
16:00 - 18:00 | Generalized Linear Models By ANTONIO Katrien |
Day 4, Thursday, 28 November | |
16:00 - 18:00 | Programming : LMs and GLMs |
Day 5, Thursday, 5 December | |
16:00 - 18:00 | Regularisation and links with support vector machines By ANTONIO Katrien |
Day 6, Thursday, 12 December | |
16:00 - 18:00 | Programming : Regularisation |
Day 7, Monday, 16 December | |
16:00 - 18:00 | Clustering methods By HAINAUT Donatien |
Day 8, Thursday, 19 December | |
16:00 - 18:00 | Programming : Clustering |
Day 9, Friday, 31 January | |
18:00 - 23:59 | Assignment after Module 1 |
-
- Conditional mean estimation E[Y|X] and the iris problem.
- Introduction to Classification problems in machine learning: Linear Discriminant Analysis.
- Introduction to regression problems: linear models and the OLS estimator (with mixed data types, e.g. a mix of continuous and discrete data).
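As an illustration of the OLS estimator with mixed data types, here is a minimal sketch using NumPy only. The toy data, the variable names (`age`, `fuel`) and the coefficients are hypothetical and chosen so that the fit can be checked by hand; they are not course material.

```python
import numpy as np

# Hypothetical toy data: one continuous and one categorical feature.
age = np.array([20.0, 30.0, 40.0, 50.0, 25.0, 35.0, 45.0, 55.0])
fuel = np.array(["petrol", "diesel", "petrol", "diesel",
                 "diesel", "petrol", "diesel", "petrol"])

# Dummy-code the categorical feature, with "petrol" as the reference level.
is_diesel = (fuel == "diesel").astype(float)

# Noise-free response built from known coefficients (assumed for the demo),
# so the estimate can be compared with the truth.
true_beta = np.array([100.0, 2.0, 15.0])  # intercept, age, diesel
X = np.column_stack([np.ones_like(age), age, is_diesel])
y = X @ true_beta

# OLS estimate: the least-squares solution of X beta = y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted conditional mean E[Y | X = x] for a 33-year-old diesel driver.
x_new = np.array([1.0, 33.0, 1.0])
prediction = float(x_new @ beta_hat)
print(np.round(beta_hat, 6), round(prediction, 6))
```

Because the response is noise-free, the estimate recovers the assumed coefficients up to floating-point error; with real data the same call returns the least-squares fit.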
-
From 16:00 to 18:00
Programming : Foundations of actuarial learning and the organization of the training
By VAN DAM Daniel, VAN ES Raymond
- Introducing the trainers and the training environment, including a first introduction to git and the GitHub repos dedicated to the training, the notebooks, the available data sets, and the ways to execute the Python code.
- Introducing the data sets that will be analyzed in the course: MTPL claim frequency and severity data, Ames Housing data set, caravan insurance data set (for a classification problem with class imbalance), data set with characteristics of vehicles (for clustering).
- Getting to know these data sets: basic data explorations, some plotting, data manipulation and calculating summary statistics [Numpy, pandas and Matplotlib].
- Target and feature engineering steps, including (among others) centering, scaling, dealing with NAs, handling class imbalance, and filtering out near-zero-variance features [scikit-learn: sklearn.preprocessing].
- Data splitting and resampling methods: training vs validation vs test, k-fold cross validation [scikit-learn: sklearn.model_selection].
- Introduction to parameter tuning, with a simple example using e.g. K-nearest neighbours [scikit-learn: sklearn.model_selection].
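The preprocessing, splitting and tuning steps listed above could be combined roughly as follows. This is a sketch under assumptions: the data are synthetic (generated with `make_classification`), and the grid of neighbourhood sizes is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a course data set (an assumption, not course data).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=0)

# Hold out a test set; stratify so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Centering and scaling live inside the pipeline, so each CV fold is
# scaled with statistics from its own training part only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# 5-fold cross-validation over a small, arbitrary grid of K values.
param_grid = {"knn__n_neighbors": [1, 3, 5, 11, 21]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)  # refitted best model on test set
print(best_k, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline is a design choice that avoids leaking validation-fold information into the preprocessing step.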
-
- the GLMs (Logistic, Poisson, Gamma)
-
From 16:00 to 18:00
Programming : LMs and GLMs
- Introducing the Python package statsmodels via [statsmodels.api] and the support for formulas via [statsmodels.formula.api].
- Linear regression: describe the models, fit and summarize linear regression models on the Ames Housing data: model fit and model inspection, prediction, variable and model selection tools [statsmodels: statsmodels.regression.linear_model]
- Generalized linear regression models: fitting Poisson and gamma regression models on the MTPL data set: inspecting model fit, building predictions, evaluating model fit [statsmodels: statsmodels.genmod.generalized_linear_model].
- We gradually build up the Poisson GLM: introducing exposure (offset), how to handle multiple types of variables (numeric, categorical).
- Combine frequency and severity GLMs into a technical tariff. Construct technical prices for selected risk profiles.
-
Introduction of
- the LASSO
- Ridge
- ElasticNet
- Relation with Support Vector Machine
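For orientation, the penalised objectives can be written side by side. The notation and scaling below follow one common convention and are assumptions, not taken from the course material; for GLMs the squared-error term is replaced by the negative log-likelihood.

```latex
% Assumed notation: y_i response, x_i feature vector, \beta coefficients,
% \lambda \ge 0 tuning parameter, \alpha \in [0, 1] mixing parameter.
\begin{align*}
\text{Ridge:} \quad
  & \min_{\beta_0, \beta} \; \frac{1}{2n} \sum_{i=1}^{n}
    \bigl(y_i - \beta_0 - x_i^\top \beta\bigr)^2
    + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{LASSO:} \quad
  & \min_{\beta_0, \beta} \; \frac{1}{2n} \sum_{i=1}^{n}
    \bigl(y_i - \beta_0 - x_i^\top \beta\bigr)^2
    + \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{Elastic net:} \quad
  & \min_{\beta_0, \beta} \; \frac{1}{2n} \sum_{i=1}^{n}
    \bigl(y_i - \beta_0 - x_i^\top \beta\bigr)^2
    + \lambda \Bigl( \alpha \sum_{j=1}^{p} |\beta_j|
    + \frac{1 - \alpha}{2} \sum_{j=1}^{p} \beta_j^2 \Bigr) \\
\text{Linear SVM:} \quad
  & \min_{\beta_0, \beta} \; \sum_{i=1}^{n}
    \max\bigl(0,\, 1 - y_i (\beta_0 + x_i^\top \beta)\bigr)
    + \frac{\lambda}{2} \sum_{j=1}^{p} \beta_j^2,
    \qquad y_i \in \{-1, +1\}
\end{align*}
```

Written this way, the linear support vector machine has the same "loss plus penalty" form as ridge regression, with the hinge loss in place of the squared error.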
-
From 16:00 to 18:00
Programming : Regularisation
- Fitting regularized (G)LMs: basic setup with [statsmodels or pyglmnet], handling different types of covariates in the regularized fit, automatic feature selection.
- Working on a classification problem with class imbalance: the caravan insurance data set. Going from regression to classification problems: model formulation, model fit (with and without regularization), model evaluation tools (e.g. AUC).
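A classification fit with and without meaningful regularisation, evaluated by AUC, could be sketched as below. Note the assumptions: scikit-learn's `LogisticRegression` is swapped in here instead of statsmodels or pyglmnet, the data are synthetic with roughly 6% positives rather than the caravan insurance data, and a very large `C` is used to approximate an unregularised fit.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (an assumption): 5 informative, 25 noise features.
X, y = make_classification(
    n_samples=4000, n_features=30, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, weights=[0.94, 0.06], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

def fit_and_score(C):
    """Fit an L1-penalised logistic regression; return (test AUC, #nonzero coefs)."""
    model = make_pipeline(
        StandardScaler(),
        # class_weight="balanced" reweights observations to offset the imbalance.
        LogisticRegression(penalty="l1", C=C, solver="liblinear",
                           class_weight="balanced"),
    )
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    coefs = model.named_steps["logisticregression"].coef_
    return roc_auc_score(y_test, proba), int((coefs != 0).sum())

# Smaller C means a stronger penalty in scikit-learn's parametrisation.
auc_weak, k_weak = fit_and_score(C=1e4)     # nearly unregularised
auc_lasso, k_lasso = fit_and_score(C=0.01)  # strong L1 penalty
print(round(auc_weak, 3), k_weak, round(auc_lasso, 3), k_lasso)
```

The count of nonzero coefficients shows the automatic feature selection of the L1 penalty, while AUC is used because plain accuracy is uninformative when one class dominates.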
-
From 18:00 to 23:59
Assignment after Module 1
Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.
Deadline for handing in the assignment: 20 January 2023.