Schedule of Module 3 : Advanced topics in Machine Learning
Day 1, Wednesday, 19 April | |
16:00 - 18:00 | Global & Local interpretation of Machine Learning models. By HAINAUT Donatien |
Day 2, Thursday, 27 April | |
16:00 - 18:00 | Programming : Interpretability tools |
Day 3, Thursday, 25 May | |
16:00 - 18:00 | Variable selection and model-agnostic methods. By VAN OIRBEEK Robin
Day 4, Thursday, 1 June | |
16:00 - 18:00 | Programming : Putting models in production (Part I) |
Day 5, Thursday, 8 June | |
16:00 - 18:00 | Causality and Ethics in Machine Learning |
Day 6, Thursday, 15 June | |
16:00 - 18:00 | Programming: Putting models in production (Part II) |
Day 7, Thursday, 22 June | |
16:00 - 18:00 | New advances in Machine Learning |
Day 8, Thursday, 29 June | |
16:00 - 18:00 | Wrap-up & Case study on ethics and fairness |
18:00 - 23:59 | Third assignment |
-
The aim of this session is to cover local and global approaches to interpreting the output of an ML algorithm. We will focus on the following (a short code sketch follows the list):
- Partial dependence plots
- Permutation feature importance
- Friedman’s interactions
- Global surrogate models
- Local Interpretable Model-Agnostic explanations (LIME)
- Shapley values (SHAP)
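To give a first flavour of these techniques, the snippet below is a minimal, purely illustrative sketch (not course material) of permutation feature importance and a partial dependence plot with scikit-learn; the synthetic data and feature roles are assumptions made for the example.

```python
# Minimal illustrative sketch: global interpretation of a fitted GBM with
# scikit-learn. Data and feature roles are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance, PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # e.g. age, power, density (illustrative)
y = np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1]) + rng.normal(scale=0.1, size=1000)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Permutation feature importance: drop in score when one feature is shuffled.
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean)

# Partial dependence: average prediction as a function of one feature
# (plotting requires matplotlib).
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
```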
-
From 16:00 to 18:00
Programming : Interpretability tools
- We work with a tuned machine learning method (e.g., a GBM) on the MTPL data set for claim frequency.
- We cover global model-agnostic methods, such as variable importance, partial dependence, individual conditional expectation, accumulated local effects and interaction strengths (the precise set will be discussed with the teacher of Module 3, depending on what he covers in the theoretical session) [scikit-learn, ALEPython, Alibi, Lucid, InterpretML, etc., depending on tool selection].
- We cover global surrogate models, e.g., fitting a global surrogate tree.
- We cover local model-agnostic methods, e.g., LIME with Python’s [lime] and SHAP with Python’s [shap]; a short sketch follows below.
- Enrich traditional actuarial pricing models (e.g., GLMs) with insights obtained by using these tools to peek under the hood of a machine learning method (e.g., a GBM).
- Optional additional topic: introduction to the Boruta variable selection method in Python (NB: one extra hour would be required to introduce this specific topic).
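As an illustration of the local methods mentioned above, here is a minimal sketch of LIME and SHAP explanations for a GBM-style model; the synthetic data, feature names and model are assumptions made for the example, not the course notebook.

```python
# Minimal illustrative sketch: local explanations with SHAP and LIME for a
# GBM fitted on synthetic tabular data (feature names are placeholders).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from lime.lime_tabular import LimeTabularExplainer
import shap

rng = np.random.default_rng(1)
feature_names = ["driver_age", "vehicle_power", "density"]   # illustrative only
X = rng.normal(size=(500, 3))
y = np.exp(0.4 * X[:, 0] - 0.2 * X[:, 2]) + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# SHAP: additive attributions per feature for individual predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values[0])

# LIME: fit a local surrogate model around one observation.
lime_explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")
exp = lime_explainer.explain_instance(X[0], model.predict, num_features=3)
print(exp.as_list())
```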
-
- On the surface, variable selection seems like a very straightforward and easy-to-understand concept; however, there is (much) more to it than meets the eye. That is why we will start this session by clearly defining what variable selection is all about, while facing the inherent ambiguity of the concept head-on!
- Next, the usual suspects will be covered: forward/backward/stepwise selection and penalized regression methods (mainly the LASSO; a brief sketch follows below), but also how variable selection is, in practice, hardwired into widely used ML models such as (ensemble) tree models and deep learning.
- The final part of the session will cover some more involved methods, such as Boruta and the genetic algorithm. For the latter, a convenient adaptation of the well-known concordance probability (specifically tailored to the needs of insurance data) will be presented, with a special focus on its use in variable selection.
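As a small illustration of penalized-regression-based selection, the sketch below uses a cross-validated LASSO in scikit-learn on synthetic data with only two informative features; it is illustrative only and not part of the session material.

```python
# Minimal illustrative sketch: variable selection via L1-penalised regression
# (LASSO) with a cross-validated penalty.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=400)  # only 2 informative features

X_std = StandardScaler().fit_transform(X)            # standardise before penalising
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)

selected = np.flatnonzero(lasso.coef_ != 0)          # features retained by the LASSO
print("selected feature indices:", selected)
```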
-
From 16:00 to 18:00
Programming : Putting models in production (Part I)
Participants will be reminded that a data science pipeline is not just about modelling but contains several key steps:
- Problem definition
- Data Management
- Modelling
- Deployment
- Monitoring
as well as more transversal elements such as communication and governance.
The focus of these sessions will therefore be on the phase after modelling: deployment.
The precise content of these two sessions dedicated to putting models in production will be further adjusted depending on the topics covered or introduced during the last theoretical sessions. We plan to start with the following topics:
- Dashboarding in Python: quick introduction to Dash
- From your own Python notebook to an API: first ideas and basics.
- Building an API for an MTPL claim frequency model using Flask (a minimal sketch follows below).
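As a preview, here is a minimal sketch of such an API with Flask; the pickled model file, the payload schema and the feature layout are hypothetical assumptions made for illustration.

```python
# Minimal illustrative sketch: serving a fitted claim-frequency model as a
# JSON API with Flask. `model.pkl` is a hypothetical artefact produced
# earlier in the modelling phase.
import pickle
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

with open("model.pkl", "rb") as f:          # placeholder path, not prescribed by the course
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                       # assumed schema: {"features": [[age, power, density]]}
    X = np.asarray(payload["features"], dtype=float)
    freq = model.predict(X)                            # expected claim frequency per policy
    return jsonify({"frequency": freq.tolist()})

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```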
-
From 16:00 to 18:00
Causality and Ethics in Machine Learning
"Correlation does not imply causality" --> what can we do about it. Lindholm 2020 on non discriminatory pricing is a good application of Ethics dimension.
By Prof. Dr. Vincent Ginis, Vrije Universiteit Brussel & Harvard University
-
From 16:00 to 18:00
Programming: Putting models in production (Part II)
The field of MLOps: streamlining the process of developing and deploying ML models. We could introduce some frameworks for production, such as the following (a short orchestration sketch follows the list):
- Version control with Git and GitHub: issue management, timeline, to-dos, …
- Docker
- Jenkins
- Apache Airflow
And also present the advantages of end-to-end platforms (e.g., Dataiku) when building a robust data science pipeline.
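As a small preview of what orchestration with Apache Airflow could look like, the sketch below defines a DAG with two placeholder tasks; the DAG name, schedule and task functions are hypothetical and only meant to illustrate the idea.

```python
# Minimal illustrative sketch: an Apache Airflow DAG for a monthly
# retrain-and-score cycle. Task bodies are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_model():
    ...  # refit the claim-frequency model on the latest data

def score_portfolio():
    ...  # score the in-force portfolio with the refreshed model

with DAG(
    dag_id="mtpl_frequency_pipeline",      # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    retrain = PythonOperator(task_id="retrain_model", python_callable=retrain_model)
    score = PythonOperator(task_id="score_portfolio", python_callable=score_portfolio)
    retrain >> score   # score only after retraining succeeds
```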
-
From 16:00 to 18:00
New advances in Machine Learning
- Reinforcement Learning by Prof. Dr. Kevin Mets, UAntwerpen
- Causal Machine Learning by Prof. Dr. ir. Wouter Verbeke, KU Leuven
-
From 16:00 to 18:00
Wrap-up & Case study on ethics and fairness
Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.
- Introduce the Module 3 assignment, inspired by the Lindholm et al. (2020) paper on Discrimination-free insurance pricing.
The examples in the paper cover both GLMs and neural nets and are thus a good recap of what we learned in previous sessions.
- Construct best-estimate, unawareness and discrimination-free prices, and as such demonstrate the (proxy) effect of sensitive or protected variables (e.g., gender); see the sketch after this list.
- Discuss explorative tools to check fairness and bias: this will once again be discussed with the teacher of Module 3 to ensure consistency.
- Wrap up the case as the final Module 3 assignment.
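For reference, the sketch below illustrates (under simplifying assumptions) how the discrimination-free price of Lindholm et al. (2020) averages the best-estimate price over the marginal distribution of the protected attribute; the fitted models `model_xd` (on covariates X and protected attribute D) and `model_x` (on X only) are hypothetical placeholders.

```python
# Minimal illustrative sketch of the three prices discussed in the case study.
import numpy as np

def discrimination_free_price(model_xd, X, d_values, d_probs):
    """Average the best-estimate price over the *marginal* distribution of D,
    breaking the proxy link between X and the protected attribute."""
    price = np.zeros(len(X))
    for d, p in zip(d_values, d_probs):
        X_d = np.column_stack([X, np.full(len(X), d)])   # append D = d to every row
        price += p * model_xd.predict(X_d)               # mu(x, d) * P(D = d)
    return price

# best-estimate price:   mu(x, d)      = model_xd.predict([x, d])
# unawareness price:     E[Y | X = x]  = model_x.predict([x])
# discrimination-free:   sum_d mu(x, d) * P(D = d), as computed above
```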
-
From 18:00 to 23:59
Third assignment
Deadline for handing in assignment 3: 31 August 2023.