Schedule of Module 3 : Advanced topics in Machine Learning
Day 1, Wednesday, 19 April | |
16:00 - 18:00 | Global & Local interpretation of Machine Learning models. By HAINAUT Donatien |
Day 2, Thursday, 27 April | |
16:00 - 18:00 | Programming : Interpretability tools |
Day 3, Thursday, 25 May | |
16:00 - 18:00 | Variable selection and model-agnostic methods. By VAN OIRBEEK Robin
Day 4, Thursday, 1 June | |
16:00 - 18:00 | Programming : Putting models in production (Part I) |
Day 5, Thursday, 8 June | |
16:00 - 18:00 | Causality and Ethics in Machine Learning |
Day 6, Thursday, 15 June | |
16:00 - 18:00 | Programming: Putting models in production (Part II) |
Day 7, Thursday, 22 June | |
16:00 - 18:00 | New advances in Machine Learning |
Day 8, Thursday, 29 June | |
16:00 - 18:00 | Wrap-up & Case study on ethics and fairness |
18:00 - 23:59 | Third assignment |
-
The aim of this session is to cover local and global approaches to interpreting the output of an ML algorithm. We will focus on the following (a short code sketch follows the list):
- Partial dependence plots
- Permutation feature importance
- Friedman’s interactions
- Global surrogate models
- Local Interpretable Model-Agnostic explanations (LIME)
- Shapley values (SHAP)
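To give a first flavour of these techniques, the snippet below is a minimal, purely illustrative sketch (not course material) of permutation feature importance and a partial dependence plot with scikit-learn; the synthetic data and feature roles are assumptions made for the example.

```python
# Minimal illustrative sketch: global interpretation of a fitted GBM with
# scikit-learn. Data and feature roles are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance, PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # e.g. age, power, density (illustrative)
y = np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1]) + rng.normal(scale=0.1, size=1000)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Permutation feature importance: drop in score when one feature is shuffled.
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean)

# Partial dependence: average prediction as a function of one feature
# (plotting requires matplotlib).
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
```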
-
From 16:00 to 18:00
Programming : Interpretability tools
- We work with a tuned machine learning method (e.g., a GBM) on the MTPL data set for claim frequency.
- We cover global model-agnostic methods, such as variable importance, partial dependence, individual conditional expectation, accumulated local effects and interaction strengths (the precise set will be discussed with the teacher of Module 3, depending on what he covers in the theoretical session) [scikit-learn, ALEPython, Alibi, Lucid, InterpretML, etc., depending on tool selection].
- We cover global surrogate models, e.g., fitting a global surrogate tree.
- We cover local model-agnostic methods, e.g., LIME with Python’s [lime] and SHAP with Python’s [shap]; a short sketch follows below.
- Enrich traditional actuarial pricing models (e.g., GLMs) with insights obtained by using these tools to peek under the hood of a machine learning method (e.g., a GBM).
- Optional additional topic: introduction to the Boruta variable selection method in Python (NB: one extra hour would be required to introduce this specific topic).
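As an illustration of the local methods mentioned above, here is a minimal sketch of LIME and SHAP explanations for a GBM-style model; the synthetic data, feature names and model are assumptions made for the example, not the course notebook.

```python
# Minimal illustrative sketch: local explanations with SHAP and LIME for a
# GBM fitted on synthetic tabular data (feature names are placeholders).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from lime.lime_tabular import LimeTabularExplainer
import shap

rng = np.random.default_rng(1)
feature_names = ["driver_age", "vehicle_power", "density"]   # illustrative only
X = rng.normal(size=(500, 3))
y = np.exp(0.4 * X[:, 0] - 0.2 * X[:, 2]) + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# SHAP: additive attributions per feature for individual predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values[0])

# LIME: fit a local surrogate model around one observation.
lime_explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")
exp = lime_explainer.explain_instance(X[0], model.predict, num_features=3)
print(exp.as_list())
```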
-
- On the surface, variable selection seems like a very straightforward and easy-to-understand concept; however, there is (much) more to it than meets the eye. That is why we will start this session by clearly defining what variable selection is all about, while facing the inherent ambiguity of the concept head-on!
- Next, the usual suspects will be covered: forward/backward/stepwise selection and penalized regression methods (mainly the LASSO; a brief sketch follows below), but also how variable selection is, in practice, hardwired into widely used ML models such as (ensemble) tree models and deep learning.
- The final part of the session will cover some more involved methods, such as Boruta and the genetic algorithm. For the latter, a convenient adaptation of the well-known concordance probability (specifically tailored to the needs of insurance data) will be presented, with a special focus on its use in variable selection.
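As a small illustration of penalized-regression-based selection, the sketch below uses a cross-validated LASSO in scikit-learn on synthetic data with only two informative features; it is illustrative only and not part of the session material.

```python
# Minimal illustrative sketch: variable selection via L1-penalised regression
# (LASSO) with a cross-validated penalty.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=400)  # only 2 informative features

X_std = StandardScaler().fit_transform(X)            # standardise before penalising
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)

selected = np.flatnonzero(lasso.coef_ != 0)          # features retained by the LASSO
print("selected feature indices:", selected)
```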
-
From 16:00 to 18:00
Programming : Putting models in production (Part I)
Participants will be reminded that a data science pipeline is not just about modelling but contains several key steps:
- Problem definition
- Data Management
- Modelling
- Deployment
- Monitoring
as well as more transversal elements such as communication and governance.
The focus of these sessions will therefore be on the phase after modelling: deployment.
The precise content of these two sessions dedicated to putting models in production will be further adjusted depending on the topics covered or introduced during the last theoretical sessions. We plan to start with the following topics:
- Dashboarding in Python: quick introduction to Dash
- From your own Python notebook to an API: first ideas and basics.
- Building an API for an MTPL claim frequency model using Flask (a minimal sketch follows below).
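As a preview, here is a minimal sketch of such an API with Flask; the pickled model file, the payload schema and the feature layout are hypothetical assumptions made for illustration.

```python
# Minimal illustrative sketch: serving a fitted claim-frequency model as a
# JSON API with Flask. `model.pkl` is a hypothetical artefact produced
# earlier in the modelling phase.
import pickle
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

with open("model.pkl", "rb") as f:          # placeholder path, not prescribed by the course
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                       # assumed schema: {"features": [[age, power, density]]}
    X = np.asarray(payload["features"], dtype=float)
    freq = model.predict(X)                            # expected claim frequency per policy
    return jsonify({"frequency": freq.tolist()})

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```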
-
From 16:00 to 18:00
Causality and Ethics in Machine Learning
"Correlation does not imply causality" --> what can we do about it. Lindholm 2020 on non discriminatory pricing is a good application of Ethics dimension.
By Prof. Dr. Vincent Ginis, Vrije Universiteit Brussel & Harvard University
-
From 16:00 to 18:00
Programming: Putting models in production (Part II)
The field of MLOps: streamlining the process of developing and deploying ML models. We could introduce some frameworks for production, such as the following (a short orchestration sketch follows the list):
- Version control with Git and GitHub: issue management, timeline, to-dos, …
- Docker
- Jenkins
- Apache Airflow
And also present the advantages of end-to-end platforms (e.g., Dataiku) when building a robust data science pipeline.
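As a small preview of what orchestration with Apache Airflow could look like, the sketch below defines a DAG with two placeholder tasks; the DAG name, schedule and task functions are hypothetical and only meant to illustrate the idea.

```python
# Minimal illustrative sketch: an Apache Airflow DAG for a monthly
# retrain-and-score cycle. Task bodies are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_model():
    ...  # refit the claim-frequency model on the latest data

def score_portfolio():
    ...  # score the in-force portfolio with the refreshed model

with DAG(
    dag_id="mtpl_frequency_pipeline",      # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    retrain = PythonOperator(task_id="retrain_model", python_callable=retrain_model)
    score = PythonOperator(task_id="score_portfolio", python_callable=score_portfolio)
    retrain >> score   # score only after retraining succeeds
```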
-
From 16:00 to 18:00
New advances in Machine Learning
- Reinforcement Learning by Prof. Dr. Kevin Mets, UAntwerpen
- Causal Machine Learning by Prof. Dr. ir. Wouter Verbeke, KU Leuven
-
From 16:00 to 18:00
Wrap-up & Case study on ethics and fairness
Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.
- Introduce the Module 3 assignment, inspired by the Lindholm et al. (2020) paper on Discrimination-free insurance pricing.
The examples in the paper cover both GLMs and neural nets and are thus a good recap of what we learned in previous sessions.
- Construct best-estimate, unawareness and discrimination-free prices, and as such demonstrate the (proxy) effect of sensitive or protected variables (e.g., gender); see the sketch after this list.
- Discuss explorative tools to check fairness and bias: this will once again be discussed with the teacher of Module 3 to ensure consistency.
- Wrap up the case as the final Module 3 assignment.
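For reference, the sketch below illustrates (under simplifying assumptions) how the discrimination-free price of Lindholm et al. (2020) averages the best-estimate price over the marginal distribution of the protected attribute; the fitted models `model_xd` (on covariates X and protected attribute D) and `model_x` (on X only) are hypothetical placeholders.

```python
# Minimal illustrative sketch of the three prices discussed in the case study.
import numpy as np

def discrimination_free_price(model_xd, X, d_values, d_probs):
    """Average the best-estimate price over the *marginal* distribution of D,
    breaking the proxy link between X and the protected attribute."""
    price = np.zeros(len(X))
    for d, p in zip(d_values, d_probs):
        X_d = np.column_stack([X, np.full(len(X), d)])   # append D = d to every row
        price += p * model_xd.predict(X_d)               # mu(x, d) * P(D = d)
    return price

# best-estimate price:   mu(x, d)      = model_xd.predict([x, d])
# unawareness price:     E[Y | X = x]  = model_x.predict([x])
# discrimination-free:   sum_d mu(x, d) * P(D = d), as computed above
```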
-
From 18:00 to 23:59
Third assignment
Deadline for handing in assignment 3: 31 August 2023.