Schedule of Actuarial Data Scientist Program : First edition - Module 3

Schedule of Module 3 : Advanced topics in Machine Learning

Day 1, Tuesday, 19 April
16:00 - 18:00 Global & Local interpretation of Machine Learning models. By HAINAUT Donatien
Day 2, Tuesday, 26 April
16:00 - 18:00 Programming : Interpretability tools
Day 3, Tuesday, 3 May
16:00 - 18:00 Variable selection and model agnostic methods. By VAN OIRBEEK Robin
Day 4, Tuesday, 10 May
16:00 - 18:00 Programming : Putting models in production (Part I)
Day 5, Tuesday, 17 May
16:00 - 18:00 Causality and Ethics in Machine Learning
Day 6, Tuesday, 24 May
16:00 - 18:00 Programming: Putting models in production (Part II)
Day 7, Tuesday, 31 May
16:00 - 18:00 New advances in Machine Learning
Day 8, Tuesday, 7 June
16:00 - 18:00 Wrap-up & Case study on ethics and fairness
  1. From 16:00 to 18:00

    Global & Local interpretation of Machine Learning models.

    By HAINAUT Donatien

    The aim of this session is to cover the local and global approaches to interpreting the output of an ML algorithm. We will focus on:

    • Partial dependence plots
    • Permutation feature importance
    • Friedman’s H-statistic for interactions
    • Global surrogate models
    • Local Interpretable Model-Agnostic Explanations (LIME)
    • Shapley values (SHAP)
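
    To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny model by enumerating all feature coalitions. The model `toy_model`, its coefficients and the baseline point are hypothetical and chosen only for illustration. Features outside a coalition are set to a baseline value, which is a simplification; the SHAP library typically averages over a background data set instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline point.

    Assumption: a feature that is "absent" from a coalition takes its
    baseline value. This is one simple convention among several.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy frequency model: additive effects plus one interaction.
def toy_model(z):
    age, power, urban = z
    return 0.05 + 0.01 * age + 0.02 * power + 0.03 * age * urban


x = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

    The contributions add up to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split equally between the two features involved. Exact enumeration is only feasible for a handful of features, since the number of coalitions grows exponentially.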
  2. From 16:00 to 18:00

    Programming : Interpretability tools

    • We work with a tuned machine learning method (e.g., GBM) on the MTPL data set for claim frequency.
    • We cover global model-agnostic methods such as variable importance, partial dependence, individual conditional expectation, accumulated local effects and interaction strengths. The precise set will be agreed with the teacher of Module 3, depending on what he covers in the theoretical session. [scikit-learn, ALEPython, Alibi, Lucid, InterpretML, etc., depending on tool selection]
    • We cover global surrogate models, e.g., fitting a global surrogate tree.
    • We cover local model-agnostic methods, e.g., LIME with Python’s [lime] and SHAP with Python’s [shap].
    • We enrich traditional actuarial models (e.g., GLMs) with insights obtained by using these tools to peek under the hood of a machine learning method (e.g., GBM).
    • Optional additional topic (about 1h extra would be required): introduction to the Boruta variable selection method in Python.
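
    As a rough illustration of two of the global tools listed above, the sketch below computes individual conditional expectation (ICE) curves and a partial dependence curve by hand. The function `toy_frequency_model`, its coefficients and the three-policy portfolio are hypothetical stand-ins for a tuned model and the MTPL data; in the session a library such as scikit-learn would presumably be used instead.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions when `feature` is forced to each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy, so the portfolio is not mutated
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """Partial dependence is the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


# Hypothetical stand-in for a tuned claim frequency model (e.g. a GBM).
def toy_frequency_model(row):
    return 0.05 + 0.001 * (80 - row["driver_age"]) + 0.02 * row["urban"]


portfolio = [
    {"driver_age": 20, "urban": 1},
    {"driver_age": 40, "urban": 0},
    {"driver_age": 60, "urban": 1},
]
age_grid = [20, 40, 60]

ice = ice_curves(toy_frequency_model, portfolio, "driver_age", age_grid)
pd_curve = partial_dependence(toy_frequency_model, portfolio, "driver_age", age_grid)
```

    Because the functions only call `predict`, they are model agnostic: any fitted model wrapped in a function of one row could be passed in.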
  3. From 16:00 to 18:00

    Variable selection and model agnostic methods

    By VAN OIRBEEK Robin
    • On the surface, variable selection seems a straightforward and easy-to-understand concept. However, there is (much) more to it than meets the eye. That’s why we will start this session by clearly defining what variable selection is all about, while facing the inherent ambiguity of the concept head-on!
    • Next, the usual suspects will be covered: forward/backward/stepwise selection and penalized regression methods (mainly LASSO), but also how variable selection is, in practice, hardwired into widely used ML models such as (ensemble) tree models and deep learning.
    • The final part of the session will be about some more involved methods like Boruta and the Genetic Algorithm. For the latter method, a convenient adaptation of the well-known concordance probability (specifically tailored to the needs of insurance data) will be presented, with a special focus on its use in variable selection.
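
    For readers unfamiliar with the concordance probability, the sketch below computes its standard form for a binary outcome (claim versus no claim): the share of (claim, no-claim) pairs in which the policy with a claim received the higher score. This is not the insurance-specific adaptation mentioned above, which is not described here. The data and the convention of counting ties as one half are assumptions made for illustration.

```python
def concordance_probability(y, scores):
    """Share of (event, non-event) pairs ranked correctly by the scores.

    Assumption: tied scores count as half a concordant pair.
    """
    concordant = 0.0
    pairs = 0
    for y_i, s_i in zip(y, scores):
        if y_i != 1:
            continue
        for y_j, s_j in zip(y, scores):
            if y_j != 0:
                continue
            pairs += 1
            if s_i > s_j:
                concordant += 1.0
            elif s_i == s_j:
                concordant += 0.5
    return concordant / pairs


# Hypothetical data: 1 = policy with a claim, 0 = policy without a claim.
claims = [1, 0, 1, 0, 0]
scores_model_a = [0.9, 0.2, 0.6, 0.7, 0.1]   # informative scores
scores_model_b = [0.5, 0.5, 0.5, 0.5, 0.5]   # constant scores, no information

c_a = concordance_probability(claims, scores_model_a)
c_b = concordance_probability(claims, scores_model_b)
```

    In a variable selection loop, a criterion of this kind could serve as the fitness of a candidate variable subset: the subset whose fitted model ranks the policies best is preferred.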
  4. From 16:00 to 18:00

    Programming : Putting models in production (Part I)

    Participants will be reminded that a data science pipeline is not just about modelling but contains several key steps:

    • Problem definition
    • Data Management
    • Modelling
    • Deployment
    • Monitoring

    There are also more transversal elements, such as communication and governance.

    The focus of these sessions will therefore be on the phase after modelling: deployment.

    The precise content of these two sessions dedicated to “models in production” will be further adjusted depending on the topics covered or introduced during the last theoretical sessions. We plan to start with the following topics:

    • Dashboarding in Python: quick introduction to Dash
    • From your own Python notebook to an API: first ideas and basics.
    • Building an API for an MTPL claim frequency model using Flask.
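
    A minimal sketch of what such an API could look like is given below. The endpoint name `/predict`, the feature names and the coefficients are hypothetical; a real service would load a fitted and serialised model rather than hard-coded numbers, and would need proper input validation.

```python
import math

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical coefficients of a log-link claim frequency model.
# In practice these would come from a fitted, serialised model object.
COEFFICIENTS = {"intercept": -2.0, "driver_age": -0.01, "vehicle_power": 0.05}
REQUIRED_FIELDS = ("driver_age", "vehicle_power")


def predict_frequency(features):
    """Expected claim frequency: exp of the linear predictor."""
    linear = COEFFICIENTS["intercept"]
    for name in REQUIRED_FIELDS:
        linear += COEFFICIENTS[name] * features[name]
    return math.exp(linear)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    return jsonify({"claim_frequency": predict_frequency(payload)})


# To try it locally, start Flask's development server, for example with
# app.run(port=5000), and POST a JSON body such as
# {"driver_age": 40, "vehicle_power": 6} to /predict.
```

    Keeping the prediction logic in a plain function (`predict_frequency`) separate from the route makes it easy to move code from a notebook into the API and to test it without a running server.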

     

  5. From 16:00 to 18:00

    Causality and Ethics in Machine Learning

    “Correlation does not imply causation”: what can we do about it? Lindholm et al. (2020) on non-discriminatory pricing is a good application of the ethics dimension.

    By Prof. Vincent Ginis

    Vrije Universiteit Brussel
    Harvard University

     

  6. From 16:00 to 18:00

    Programming: Putting models in production (Part II)

    The field of MLOps: streamlining the process of developing and deploying ML models. We could introduce some tools and frameworks used in production, such as:

    • Version control with git and GitHub: issue management, timeline, to-dos, etc.
    • Docker
    • Jenkins
    • Apache Airflow

    We could also present the advantages of an end-to-end platform (e.g., Dataiku) when building a robust data science pipeline.

  7. From 16:00 to 18:00

    New advances in Machine Learning

    Planning of the session:

    16h00 - 16h30: Reinforcement learning by Kevin Mets (UAntwerpen)

    Description

    Reinforcement learning is concerned with learning what to do: how to map situations to actions. It is one of the three main machine learning paradigms, next to supervised and unsupervised learning. Learning occurs through interaction with an environment, instead of relying on labelled input/output pairs as in supervised learning.

    In this session we will focus on:

    • Positioning reinforcement learning with respect to supervised (i.e., regression and classification) and unsupervised learning (e.g., clustering).
    • The basic elements (e.g., agent, environment, policies, value functions) and algorithms (e.g., Q-learning) of reinforcement learning.
    • A brief introduction to deep reinforcement learning.
    • An example application of reinforcement learning.
    • Practical recommendations on how to learn more and start applying reinforcement learning.
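
    The basic elements named above can be seen together in a small tabular Q-learning sketch. The environment (a five-state chain with a reward at the right end) and all hyperparameters are hypothetical choices made for illustration, not material from the session.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right


def step(state, action):
    """Environment: deterministic move, reward 1 on reaching the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # action-value table Q[s][a]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: explore sometimes, otherwise act greedily
            # (ties between equal values are broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update towards reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
greedy_policy = [max(range(len(ACTIONS)), key=lambda a: q[s][a])
                 for s in range(N_STATES - 1)]
```

    The agent never sees labelled examples of the "right" action; it only observes rewards from the environment, and the greedy policy read off the learned table ends up moving right in every non-terminal state.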

    16h30 - 17h00: Causal machine learning by Wouter Verbeke (KULeuven)

    Description

    In this presentation, we aim to provide a clear understanding of:

    • what causal machine learning is
    • how it differs from traditional machine learning
    • what the key challenges and current solutions are
    • the use and advantages of causal machine learning, illustrated by discussing a practical case.
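
    To illustrate why a causal view can differ from a purely predictive one, the sketch below compares a naive difference in outcome rates with an estimate adjusted for a confounder by stratification. The scenario (a retention offer given mostly to high-risk customers) and all counts are hypothetical, and stratification is only one simple adjustment technique, not necessarily the method covered in the presentation.

```python
# (segment, treated, customers, renewals): hypothetical counts for illustration.
# High-risk customers receive the offer more often and renew less often,
# so the segment confounds the relation between offer and renewal.
DATA = [
    ("low_risk", 1, 20, 18),
    ("low_risk", 0, 80, 64),
    ("high_risk", 1, 80, 32),
    ("high_risk", 0, 20, 6),
]


def renewal_rate(rows):
    return sum(r[3] for r in rows) / sum(r[2] for r in rows)


def naive_effect(data):
    """Difference in renewal rates, ignoring the segment."""
    treated = [r for r in data if r[1] == 1]
    control = [r for r in data if r[1] == 0]
    return renewal_rate(treated) - renewal_rate(control)


def adjusted_effect(data):
    """Within-segment differences, weighted by the size of each segment."""
    total = sum(r[2] for r in data)
    effect = 0.0
    for segment in sorted({r[0] for r in data}):
        rows = [r for r in data if r[0] == segment]
        weight = sum(r[2] for r in rows) / total
        effect += weight * naive_effect(rows)
    return effect


naive = naive_effect(DATA)
adjusted = adjusted_effect(DATA)
```

    With these counts the naive comparison suggests that the offer reduces renewals, while within each segment the offer increases them. The adjustment is only valid under the assumption that the segment is the only confounder.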

    17h00 - 17h10: Break

    17h10 - 18h00: “Let’s ‘Investigate’! A reality walk through the actuarial universe: belief functions, expert judgments, computer science, model theory, … within a convergence to ‘unification’ through Category Theory” by Pierre Ars and Jan Vanderspiegel

    Description

    • « An actuary is a business professional who deals with the measurement and management of risk and uncertainty. » (Wikipedia definition)
    • The current AI and data science revolution has significant impacts on the actuarial job: it increases the share of ‘uncertainty’, adds a lot of ‘noise’, and calls for appropriate scientific tools that extend the classical « probabilistic mapping ».
    • More importantly, the subject of « model theory » needs to be tackled in a way that encompasses all the other components of the actuarial universe: expert judgments, computer science, ...
    • « Category theory » seems to be the ideal tool for this « unification ».
  8. From 16:00 to 18:00

    Wrap-up & Case study on ethics and fairness

    Important comment: The assignment is not a strict examination. Its purpose is to apply the concepts learned during the previous sessions.

    • Introduce the Module 3 assignment, inspired by the Lindholm et al. (2020) paper on Discrimination-free insurance pricing.

    The examples in the paper cover both GLMs and neural nets and are thus a good recap of what we learned in previous sessions.

    • Construct best-estimate, unawareness and discrimination-free prices, and thereby demonstrate the (proxy) effect of sensitive or protected variables (e.g., gender).
    • Discuss explorative tools to check fairness and bias: this will once again be discussed with the teacher of Module 3 to ensure consistency.
    • Wrap up the case as a final Module 3 assignment.
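
    The three price types can be sketched on a toy discrete example. Writing mu(x, d) for the best-estimate price given non-protected features x and protected feature d, the unawareness price averages mu(x, d) with weights P(D=d | X=x), whereas the discrimination-free price uses weights that do not depend on x; the marginal P(D=d) is used here as one natural choice. All numbers below are hypothetical, and the sketch is not the assignment itself.

```python
# Hypothetical best-estimate prices mu(x, d) and portfolio weights P(X=x, D=d).
best_estimate = {
    ("A", "F"): 100.0, ("A", "M"): 200.0,
    ("B", "F"): 300.0, ("B", "M"): 400.0,
}
joint_prob = {
    ("A", "F"): 0.4, ("A", "M"): 0.1,
    ("B", "F"): 0.1, ("B", "M"): 0.4,
}


def marginal(joint, axis):
    """Marginal distribution of X (axis 0) or D (axis 1)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def unawareness_price(x):
    """E[Y | X=x]: averages mu(x, d) with weights P(D=d | X=x)."""
    p_x = marginal(joint_prob, 0)[x]
    return sum(best_estimate[(xx, d)] * p / p_x
               for (xx, d), p in joint_prob.items() if xx == x)


def discrimination_free_price(x):
    """Averages mu(x, d) with weights P(D=d) that do not depend on x."""
    p_d = marginal(joint_prob, 1)
    return sum(best_estimate[(x, d)] * p for d, p in p_d.items())


prices = {x: (unawareness_price(x), discrimination_free_price(x))
          for x in ("A", "B")}
```

    In this toy portfolio, feature value A is mostly associated with group F, which has the lower best-estimate prices. The unawareness price for A is therefore pulled below the discrimination-free price, which is the proxy effect that the assignment sets out to demonstrate.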

Prices

Ticket type                                                   Price
Members (per module)                                          € 1,000.00
Members (per module, when enrolling in all 3 modules)         € 800.00
Non-members (per module)                                      € 1,250.00
Non-members (per module, when enrolling in all 3 modules)     € 1,000.00