Schedule of Actuarial Data Scientist Program : Module 1

Schedule of Module 1: Foundations of Machine Learning in Actuarial Sciences

Day 1, Tuesday, 5 October
16:00 - 18:00 Linear Models and conditional expectation By ANTONIO Katrien
Day 2, Tuesday, 12 October
16:00 - 18:00 Programming : Foundations of actuarial learning and the organization of the training
Day 3, Tuesday, 19 October
16:00 - 18:00 Generalized Linear Models: regression and classification By ANTONIO Katrien
Day 4, Tuesday, 26 October
16:00 - 18:00 Programming : LMs and GLMs
Day 5, Tuesday, 9 November
16:00 - 18:00 Regularisation and links with support vector machines By ANTONIO Katrien
Day 6, Tuesday, 16 November
16:00 - 18:00 Programming : Regularization
Day 7, Tuesday, 23 November
16:00 - 18:00 Neural Networks By ANTONIO Katrien
Day 8, Tuesday, 30 November
16:00 - 18:00 Programming : Deep Learning
18:00 - 23:59 Assignment after Module 1
  1. From 16:00 to 18:00

    Linear Models and conditional expectation

    By ANTONIO Katrien
    • Conditional mean estimation E[Y|X] and the iris problem.
    • Introduction to Classification problems in machine learning: Linear Discriminant Analysis.
    • Introduction to regression problems: linear models and the OLS estimator (with mixed data types, e.g. a mix of continuous and discrete data).
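As a rough illustration of the OLS estimator with mixed data types, the sketch below fits a linear model on one continuous and one discrete feature using plain NumPy. The data are simulated and all variable names (`x_cont`, `x_cat`, `level_effect`) are made up for this example; they do not come from the course material.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulated features (assumption: purely illustrative data).
x_cont = rng.normal(size=n)            # continuous feature
x_cat = rng.integers(0, 3, size=n)     # discrete feature with levels 0, 1, 2

# True model: intercept 2.0, slope 0.8, level effects relative to level 0.
level_effect = np.array([0.0, 1.5, -0.5])
y = 2.0 + 0.8 * x_cont + level_effect[x_cat] + rng.normal(scale=0.3, size=n)

# Design matrix: intercept, continuous feature, dummies for levels 1 and 2.
X = np.column_stack([np.ones(n), x_cont, x_cat == 1, x_cat == 2]).astype(float)

# OLS estimate via least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# Because the model contains an intercept and level dummies, the average
# fitted value within each level equals the empirical conditional mean of y.
group_means_y = np.array([y[x_cat == k].mean() for k in range(3)])
group_means_fit = np.array([fitted[x_cat == k].mean() for k in range(3)])

print(np.round(beta_hat, 2))
```

The last two arrays coincide, which ties the OLS fit back to the estimation of the conditional mean E[Y|X] for the discrete feature.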
  2. From 16:00 to 18:00

    Programming : Foundations of actuarial learning and the organization of the training

    • Introducing the trainers and the training environment, including a first introduction to git and the GitHub repos dedicated to the training, the notebooks, the available data sets and the ways to execute the Python code.
    • Introducing the data sets that will be analyzed in the course: MTPL claim frequency and severity data, Ames Housing data set, caravan insurance data set (for a classification problem with class imbalance), data set with characteristics of vehicles (for clustering).
    • Getting to know these data sets: basic data explorations, some plotting, data manipulation and calculating summary statistics [Numpy, pandas and Matplotlib].
    • Target and feature engineering steps, including (among others) centering, scaling, dealing with NAs, handling class imbalance and filtering out near-zero-variance features [scikit-learn: sklearn.preprocessing].
    • Data splitting and resampling methods: training vs validation vs test, k-fold cross validation [scikit-learn: sklearn.model_selection].
    • Introduction to parameter tuning, with a simple example using e.g. K-nearest neighbours [scikit-learn: sklearn.model_selection].
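The preprocessing, data splitting and tuning steps listed above could be combined along the following lines. This is a minimal sketch on synthetic data generated with `make_classification`, not on the course data sets; the grid of neighbour counts is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a course data set (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)

# Hold out a test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline, so it is re-fitted on each training fold.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Tune the number of neighbours with 5-fold cross-validation.
param_grid = {"knn__n_neighbors": [1, 3, 5, 11, 21]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(best_k, round(test_accuracy, 3))
```

Keeping the scaler inside the `Pipeline` avoids leaking information from the validation folds into the preprocessing step.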
  3. From 16:00 to 18:00

    Generalized Linear Models: regression and classification

    By ANTONIO Katrien
    • GLMs: logistic, Poisson and gamma regression.
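For reference, the three GLMs named above share the structure of a link function applied to the conditional mean. The notation below ($g$ for the link, $\mu$ for the mean, $\beta$ for the coefficients) is a common textbook convention and is assumed here, not taken from the lecture slides.

```latex
\begin{align*}
  g\bigl(\mathbb{E}[Y \mid x]\bigr) &= x^\top \beta
    && \text{(general GLM with link } g\text{)} \\
  \log\frac{\mu}{1-\mu} &= x^\top \beta
    && \text{(logistic regression, } \mu = \Pr(Y = 1 \mid x)\text{)} \\
  \log \mu &= x^\top \beta
    && \text{(Poisson regression, canonical log link)} \\
  \frac{1}{\mu} &= x^\top \beta
    && \text{(gamma regression, canonical inverse link)}
\end{align*}
```

In pricing applications the gamma model is often fitted with a log link instead of the canonical inverse link, so that frequency and severity effects combine multiplicatively.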
  4. From 16:00 to 18:00

    Programming : LMs and GLMs

    • Introducing the Python package statsmodels via [statsmodels.api] and the support for formulas via [statsmodels.formula.api].
    • Linear regression on the Ames Housing data: describing, fitting and summarizing linear regression models, covering model fit and inspection, prediction, and variable and model selection tools [statsmodels: statsmodels.regression.linear_model].
    • Generalized linear regression models: fitting Poisson and gamma regression models on the MTPL data set: inspecting model fit, building predictions, evaluating model fit [statsmodels: statsmodels.genmod.generalized_linear_model].
    • We gradually build up the Poisson GLM: introducing exposure (offset), how to handle multiple types of variables (numeric, categorical).
    • Combine frequency and severity GLMs into a technical tariff. Construct technical prices for selected risk profiles.
  5. From 16:00 to 18:00

    Regularisation and links with support vector machines

    By ANTONIO Katrien

    Introduction to

    • the LASSO
    • Ridge
    • ElasticNet
    • Relation to support vector machines
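The penalties behind these methods can be written compactly as below. The notation ($\lambda$ for the penalty strength, $\alpha$ for the mixing weight, $f(x) = \beta_0 + x^\top\beta$ for the linear score) follows a common textbook convention and is an assumption of this sketch.

```latex
\begin{align*}
  \text{LASSO:} \quad
    & \min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
      + \lambda \lVert \beta \rVert_1 \\
  \text{Ridge:} \quad
    & \min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
      + \frac{\lambda}{2} \lVert \beta \rVert_2^2 \\
  \text{ElasticNet:} \quad
    & \min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
      + \lambda \Bigl( \alpha \lVert \beta \rVert_1
      + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Bigr) \\
  \text{Linear SVM:} \quad
    & \min_{\beta_0,\,\beta}\; \frac{1}{n} \sum_{i=1}^{n}
      \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
      + \lambda \lVert \beta \rVert_2^2,
      \qquad y_i \in \{-1, +1\}
\end{align*}
```

Seen this way, the soft-margin linear SVM is a ridge-penalized model in which the squared-error loss is replaced by the hinge loss.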
  6. From 16:00 to 18:00

    Programming : Regularization

    • Fitting regularized (G)LMs: basic set up with [statsmodels or pyglmnet], handling different types of covariates in the regularized fit, automatic feature selection.
    • Working on a classification problem with class imbalance: the caravan insurance data set. Going from regression to classification problems: model formulation, model fit (with and without regularization), model evaluation tools (e.g. AUC).
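The sketch below shows an L1-regularized logistic regression on an imbalanced binary target, evaluated with AUC. It uses scikit-learn rather than the statsmodels or pyglmnet tooling named above, and synthetic data rather than the caravan insurance data set; the penalty strength `C=0.01` is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data: roughly 6% positives, 4 informative features
# out of 20 (assumption: stand-in for a real insurance data set).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, weights=[0.94], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize so the penalty treats all coefficients on the same scale.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# L1 penalty for automatic feature selection; class weights offset the imbalance.
lasso_logit = LogisticRegression(
    penalty="l1", solver="liblinear", C=0.01, class_weight="balanced"
)
lasso_logit.fit(X_train_s, y_train)

n_zero = int(np.sum(lasso_logit.coef_ == 0))
auc = roc_auc_score(y_test, lasso_logit.predict_proba(X_test_s)[:, 1])
print(n_zero, round(auc, 3))
```

AUC is used instead of accuracy because, with so few positives, a model that always predicts the majority class would already score a high accuracy.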
  7. From 16:00 to 18:00

    Neural Networks

    By ANTONIO Katrien
    • Introduction to Neural Networks
    • Deep Learning overview
  8. From 16:00 to 18:00

    Programming : Deep Learning

    • Toolbox in Python: [Keras API as an interface to TensorFlow], a brief introduction to the concepts of tensors and working with tensors.
    • The neural network architecture in Keras: a first example of an artificial neural network for a regression problem with the Ames Housing Data: setting up the layered architecture, specifying the activation functions, number of neurons, input and output layers.
    • Network compilation and fitting (continuing the example with the Ames Housing data): choice of a loss function and forward pass, optimizers and metrics, tuning the fitting parameters (epochs and batches).
    • ANNs for claim frequencies and severities: Poisson GLM with only an intercept ‘replicated’ with an ANN, incorporating exposure in a Poisson ANN, adding input features and their preprocessing, …
    • As additional material: a notebook introducing CANNs (combined actuarial neural networks, see e.g. Schelldorfer & Wüthrich, 2019): combining GLMs and neural networks via the use of a skip connection. The ANN becomes a correction on top of a base prediction delivered by a GLM. Implementation in Python.
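The idea of replicating an intercept-only Poisson GLM with a network can be illustrated without any deep learning library. The sketch below uses plain NumPy instead of Keras: the "network" is a single bias parameter `b`, exposure enters as a fixed multiplier (equivalent to an offset of log exposure), and gradient descent on the Poisson loss is compared with the closed-form GLM estimate. The data, learning rate and iteration count are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated exposures and claim counts (assumption: illustrative data only).
expo = rng.uniform(0.1, 1.0, n)
claims = rng.poisson(0.1 * expo)

# Closed-form MLE of the intercept-only Poisson GLM with log link and
# offset log(expo): total claims divided by total exposure, on the log scale.
b_glm = np.log(claims.sum() / expo.sum())

# One-parameter "network": mu_i = expo_i * exp(b).
# Loss: mean Poisson negative log-likelihood (up to a constant),
#   L(b) = mean(mu_i - claims_i * log(mu_i)),  so  dL/db = mean(mu_i - claims_i).
b = 0.0
learning_rate = 2.0
for _ in range(1000):
    mu = expo * np.exp(b)
    grad = np.mean(mu - claims)
    b -= learning_rate * grad

print(round(b_glm, 4), round(b, 4))
```

Both estimates agree, which is the sanity check one would expect before adding input features, hidden layers or a GLM-based skip connection.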
  9. From 18:00 to 23:59

    Assignment after Module 1

    Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.

    Participants will work on either a regression or a classification problem, using data sets not previously covered in the sessions (though with a similar structure, e.g. claim frequency and severity data).

    Participants will deliver a notebook / report with the following items:

    • An exploratory data analysis.
    • The construction of both a claim frequency and a claim severity model, using two predictive modeling techniques covered so far (e.g. a GLM, a regularized GLM, a neural net or a combined actuarial neural net).
    • A discussion of insights obtained from these models (e.g. important features).
    • A comparison of the performance of the constructed models, based on a self-defined set of criteria (inspired by the tools covered so far in the program).

Prices

Ticket type | Price
Members (per module) | € 1,000.00
Members (price per module when taking the whole program of 3 modules) | € 800.00
Non-members (per module) | € 1,250.00
Non-members (price per module when taking the whole program of 3 modules) | € 1,000.00