RemitRix and The Actuary Magazine Webinar
In recent years, Machine Learning (ML) based algorithms have become popular in almost every industry, providing more accurate solutions to various problems, be it customer churn prediction, user segmentation or risk assessment. Whilst there are several reasons for this recent embrace of ML, the main reason must be that it simply works! The combination of the company’s own data and profound domain knowledge, with great computational power and years of research, enables ML algorithms to give a tailor-made solution to almost any business question presented to the machine.
The insurance industry holds massive amounts of data which are not fully utilised to assess actuarial risks. Accurately estimating the probability of each risk factor occurring, in each sub-population or region, enables the insurance company to better estimate the risk it is exposed to, and to better allocate its funds and capital.
Which ML model is best used for prediction of mortality?
There’s no definitive answer as it depends on the data you have available. Generally, for a regression problem (and for classification problems too), if you don’t have hundreds or thousands of explanatory variables, I’d start with an ensemble method. Random forest and GBM are very popular so it’s easier to start with them because you’ll find a lot of implementations, documentation, and papers that will help you to get started. RemitRix can help you with this.
I know that traditional statistical models like the Cox model can be used to predict days until the next occurrence of an event, say a failure, i.e. survival analysis. My question is: how can regression versions of machine learning models like GBM, neural networks etc. be used to predict days until the occurrence of an event?
For GBM it’s just as you would do using a Cox model – the GBM actually learns the function F(x) which best explains the target. It is true you don’t get an explicit closed form of this F, but most implementations of GBM let you give a trained model an observation and get the desired prediction. As for neural networks, I’ve never used them for survival analysis.
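To make the idea of a learned F(x) concrete, here is a minimal sketch of gradient boosting in plain Python. It is an illustration under stated assumptions, not the implementation used in the webinar: it uses squared-error loss, one covariate, single-split trees, and made-up durations that are all fully observed (real survival data with censoring would need a survival-specific loss). The function names and the data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, targets):
    """Fit a one-split tree: pick the threshold on x that minimises squared error.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # dropping the max keeps both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build F(x) as a constant plus a sum of small trees fitted to residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def predict(model, x):
    """Evaluate the learned F at a new observation x."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )

# Made-up example: age at entry vs. days until the event (no censoring).
ages = [30, 40, 50, 60, 70, 80]
days_to_event = [900, 850, 700, 500, 300, 150]
model = fit_gbm(ages, days_to_event)
print(predict(model, 45))  # prediction for an observation not in the training data
```

There is no closed-form F here either: the model is just the stored list of splits, and prediction means walking through them for the new observation.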
Do you see machine learning getting more widespread in other actuarial areas such as pensions?
I hope so. I think that there are a lot of actuarial areas where it could be beneficial. And I think that since both ML and actuarial science are heavily based on statistics and probability, actuaries can definitely understand the algorithms and how they can be used for a wide range of actuarial purposes – indeed RemitRix sees a very broad opportunity for its use throughout the profession.
How communicable are these models to the different stakeholders?
This is a very general question. There are so many ML models, and some of them are very complex and are probably harder to explain in one line to stakeholders, but if we take decision trees, for example, I think they are very intuitive so most people can easily grasp the idea. The idea behind ensemble models is also pretty straightforward (especially if you are not getting into details about the implementation itself or the loss function etc.) so I don’t expect many problems there either. Deep learning models are much more complex so they may be more of a challenge, but on the other hand, those are so heavily used at Facebook, Google, Amazon etc. that I think a lot of people have heard of them or experienced first-hand what they can do, and therefore trust them (“If it works for Google and Amazon, it can work for me”).
In order for an actuary to work with ML (as a tool), what kind of studies or professional training are needed?
As a machine learning researcher I think what you usually need is an understanding of code writing (R or Python are a good start), some understanding of computational complexity (because you need your algorithm to converge in reasonable time) and of course knowledge of statistics, probability, optimization and some mathematics which (I hope) actuaries already have. However, I would suggest companies like RemitRix can help by providing the expertise in ML, so actuaries don’t need to acquire it themselves and can focus on their day job.
Since you knew the design of the training data in advance is this not a classic example of data-snooping?
No, because I didn’t feed it to the model in any way. In addition, we are expanding the testing to include more generally simulated data from independent sources and using real data, as well.
How much technical knowledge is required on the customer's side to enter into the technical discussions/negotiations?
I don’t think a lot of technical knowledge is needed at all. I think you need the domain knowledge for the insurance world and your company – an understanding of how things are done today, where you would most want to test machine learning, what kind of data you have available (or can make available easily), what results you achieve today… At RemitRix we’re in discussion with insurers all the time so we make the whole engagement process as simple for you as we can.
What are the pros & cons of machine learning approach vs Cox Proportional Hazards regression & Kaplan-Meier models etc?
The main advantages of GBM over Cox, in my opinion, are:
1. That it doesn’t assume any specific relationship between the covariates and the target
2. Depending on the loss function you choose to use, it can also avoid the proportional hazard assumption.
Since we know that assumptions are sometimes violated, the fewer assumptions, the better. In addition, if you suspect interactions between covariates you don’t have to explicitly insert them into the model; it can pick them up on its own, if they indeed exist.
As for the KM models – if you want anything but one curve for the entire population, you have to identify for yourself which groups should be estimated separately, meaning you have to perform long and detailed research, testing hypotheses about different groups. This is a lot of trial and error, and as a human, you can only do so much of it before you run out of time or patience. GBM does all that exploration for you, finds the different groups and then estimates the curve of each subgroup just like KM would. At RemitRix we’re doing this all the time so we can help you to fast track your preferred approach.
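For readers less familiar with KM, the manual workflow described above can be sketched as follows. This is a minimal plain-Python Kaplan-Meier estimator applied separately to groups that the analyst has chosen by hand; the grouping variable, the records and the function name are made up for illustration.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with at least one event.

    durations: time until the event or until censoring.
    observed:  True if the event was seen, False if the record was censored.
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if events[t]:
            # Product-limit step: multiply by the share of those at risk who survive t.
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Made-up records: (group chosen by the analyst, duration in years, event observed?)
records = [
    ("smoker", 1, True), ("smoker", 2, True), ("smoker", 3, False), ("smoker", 4, True),
    ("non-smoker", 2, False), ("non-smoker", 5, True), ("non-smoker", 6, False),
    ("non-smoker", 7, True),
]

# One curve per hand-picked subgroup: this grouping step is the part done by trial and error.
curves = {}
for group in sorted({g for g, _, _ in records}):
    rows = [(d, seen) for g, d, seen in records if g == group]
    curves[group] = kaplan_meier([d for d, _ in rows], [seen for _, seen in rows])

for group, curve in curves.items():
    print(group, curve)
```

The estimator itself is simple; the effort lies in deciding which groups deserve their own curve, which is the search a tree-based method automates.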
Do you see any promising applications of ML outside the insurance industry (e.g. pension, investment)?
Yes. I think that it’s not a coincidence that so many industries and companies started using machine learning in recent years. Those algorithms are powerful, flexible and they bring results. I don’t see any reason the insurance industry will be any different. I think it may be a bit of a struggle to get the first few models or applications in the door, but once companies start using them and see an increase in accuracy, they will want to expand the use. At RemitRix our roadmap looks at lots of areas where ML can help and we’re exploring these with insurers all the time.
Can we avoid machine learning in actuarial practice?
I think you can, but I don’t think you should. If it can perform better than what is currently used – why avoid it? If we still don’t know if it is better – why not check it out? If it performs exactly the same, but does a bigger part of the process automatically, hence saving actuary’s time – that’s also great. Even if it’s just one more tool to help you, for example by pointing you to the interesting covariates in the data – that’s also quite valuable. ML really is about improving knowledge…so why wouldn’t you want to use it?
What programming language would you recommend for actuaries wishing to move into machine learning?
I would start with either R or Python. They are both commonly used among machine learners (R is a bit more common among statisticians whereas Python is more common among people with a background in computer science). Both of them are open source so you have a lot of independent packages you can explore, and you can even add your own. However, we would advocate using experts like RemitRix with deep ML expertise as this allows actuaries to benefit from ML without the need for spending lots of time learning.
Could you use machine learning to predict a company's pay gap?
If you have data that explains pay gap, and if you have historical data of pay gaps, and if you have a reason to believe that the historical data is representative of the future – then yes, definitely something that can be tested. At RemitRix we are always looking for practical applications of Machine Learning.
What is the minimum viable dataset size?
Quite a few participants have asked questions regarding the required size of the data set, and there is no single simple answer. It depends on the complexity of the data, the number of covariates, the types of covariates and how small the subgroups you’re trying to find are. With data and machine learning, it’s usually said that the more the better, but obviously, if you have simple data, with only 3 possible explanatory variables which are all binary, you’re not going to need the same amount of data as if your data has 150 covariates, some categorical with 10-20 categories.
If I have to give a number, I’d say 100K is a lower bound, but again, if the data is very simple or the structure of it is a straightforward linear relationship between the target and two covariates – you can do with much less.
What are the differences between GLM and GBM models in insurance pricing? Can a GBM model produce coefficients?
GBM is based on decision trees and does not require you to define the relationship between the covariates and the target beforehand. GLMs, while not assuming linearity, do require you to specify the link function which defines the relationship between covariates and target. GBM does not produce coefficients, but you can get predictions for new observations, just as you would in a GLM, and you can get a measure of the importance of each variable, which is (to some degree) a substitute for the significance of a coefficient.
Where do PCA and SVD fit into machine learning? I hear it is used for recommendations with Amazon.
Those two are unsupervised algorithms used mainly for dimensionality reduction.
How can an independent reviewer challenge/validate ML models and results?
Whenever you test models you have to make sure you do it with data that was not used in the training process so there’s no overfitting of the training set. If you have a dataset with labels you can easily evaluate using measures such as accuracy, precision, and recall. At RemitRix we offer clients the opportunity to test data alongside their existing processes to compare results – and to demonstrate the value ML offers in a practical way.
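As a minimal sketch of such a check, assuming a binary label (1 = event, 0 = no event): the reviewer holds back part of the data before training, then compares predictions with the true labels on that held-out part only. The function names, the split fraction and the labels below are made up for illustration.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and set aside a fraction that training never sees."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training rows, test rows)

def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted events, share correct
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of true events, share found
    }

# Made-up held-out labels and the model's predictions for the same records.
held_out_truth = [1, 0, 1, 1, 0, 0, 1, 0]
held_out_predictions = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(held_out_truth, held_out_predictions)
print(metrics)

train_rows, test_rows = holdout_split(range(8))
```

In this toy case there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all three measures come out at 0.75.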
What is the dimension of the input covariates?
In our simulation, we had 10 covariates, 3 of which were continuous, 4 binary and the rest categorical. When we transformed the data using one-hot encoding (OHE) we ended up with ~190 covariates.
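To show why one-hot encoding inflates the covariate count, here is a minimal plain-Python sketch. The column names and category levels are made up and much smaller than in the simulation, so the sketch does not reproduce the ~190 figure; it only shows the mechanism, where each categorical covariate is replaced by one 0/1 column per level.

```python
def one_hot_encode(rows, categorical_columns):
    """Replace each categorical column with one 0/1 indicator column per observed level."""
    levels = {
        column: sorted({row[column] for row in rows}) for column in categorical_columns
    }
    encoded = []
    for row in rows:
        # Continuous and binary covariates are passed through unchanged.
        out = {key: value for key, value in row.items() if key not in categorical_columns}
        for column in categorical_columns:
            for level in levels[column]:
                out[f"{column}={level}"] = 1 if row[column] == level else 0
        encoded.append(out)
    return encoded

# Made-up records: 1 continuous, 1 binary and 2 categorical covariates.
rows = [
    {"age": 34.5, "smoker": 0, "region": "north", "occupation": "clerk"},
    {"age": 51.0, "smoker": 1, "region": "south", "occupation": "driver"},
    {"age": 47.2, "smoker": 0, "region": "east", "occupation": "nurse"},
    {"age": 29.8, "smoker": 1, "region": "north", "occupation": "teacher"},
]
encoded = one_hot_encode(rows, categorical_columns=["region", "occupation"])
print(len(rows[0]), "covariates before,", len(encoded[0]), "after")
```

Here 4 covariates become 9 (2 unchanged, plus 3 region levels and 4 occupation levels); categorical covariates with many levels grow the total much faster.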
Is it possible to replace actuaries with ML algorithms?
I don’t think it can replace them completely, but it can definitely improve the work of the actuary, reduce the time spent on some of the work and improve the results. We see ML working as a tool to complement the actuarial profession, not a replacement for it.
How long does it take for a qualified actuary to learn ML and start using it in day-to-day work? (In life insurance)
They would need to learn R or Python, which takes about 2 weeks, then get experience arranging the data for that environment and get insight into a few basic algorithms. So I’d estimate 6 months? That should yield some basic level ML work. For professional-grade work and complex problems, obviously the sky is the limit, but that’s where RemitRix can help as we can deliver results to a significantly detailed level now, not in 6 months to a basic level.
How much computing power does GBM modeling usually require?
It depends on the number of iterations, the size of the data and which implementation you are using. I did all of the analysis you saw today on my laptop, which is relatively strong, but not a crazy big Amazon machine.
How can we guard against machine learning producing socially unacceptable outcomes? E.g. we are not allowed to discriminate on the basis of gender.
Just like we need to guard when we use other methods. The most obvious thing is not to feed covariates like race into the modeling process. Also, you need to make sure you’re not using covariates that are highly correlated with unacceptable covariates. But that’s exactly what you’d have to do if you used a Cox model or linear regression. And in case a human is doing the job manually, we are even more at risk, in my opinion, as the bias can be hidden or subconscious.
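A first-pass check for such correlated covariates could look like the sketch below. It assumes a numeric or 0/1-coded protected attribute and uses the Pearson correlation; the threshold of 0.7, the covariate names and the data are made up for illustration. A simple pairwise check like this only catches linear, single-variable proxies, so it is a starting point rather than a guarantee.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def flag_proxies(covariates, protected, threshold=0.7):
    """Names of covariates whose absolute correlation with the protected attribute is high."""
    return [
        name for name, values in covariates.items()
        if abs(pearson(values, protected)) >= threshold
    ]

# Made-up data: the protected attribute is held out for testing only, never used as a model input.
protected_attribute = [0, 0, 0, 0, 1, 1, 1, 1]
candidate_covariates = {
    "height_cm": [162, 165, 160, 168, 178, 181, 175, 183],  # closely tracks the attribute
    "age": [30, 45, 52, 38, 33, 47, 50, 36],                # essentially unrelated
}
flagged = flag_proxies(candidate_covariates, protected_attribute)
print(flagged)
```

Flagged covariates would then be reviewed, and possibly dropped, before any model is fitted, whether that model is a GBM, a Cox model or a linear regression.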
What is a good starting point (preferably open source) to get into machine learning?
I’d start with Andrew Ng’s Machine Learning course (https://www.coursera.org/learn/machine-learning).
What is the difference between decision trees and random forests?
Random forest is an ensemble method which combines many decision trees together (hence the name “forest”). It’s similar to GBM in the idea, but the aggregation of the decision trees is done differently.
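The difference in aggregation can be sketched in plain Python. In the forest style, each tree is trained independently on a bootstrap resample of the data and the predictions are averaged; in GBM, trees are instead added one after another, each fitted to the errors left by the previous ones. The sketch below shows only the averaging side, with single-split trees and one covariate. A real random forest also samples a random subset of covariates at each split, which is omitted here, so strictly speaking this is bagging. The names and data are made up.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split tree by minimising squared error; returns (threshold, left, right)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # the resample had a single distinct x: predict its mean everywhere
        return xs[0], mean(ys), mean(ys)
    return best[1:]

def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each tree independently on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        forest.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return forest

def predict_forest(forest, x):
    """Aggregate by plain averaging: every tree has an equal, independent vote."""
    return mean([left if x <= threshold else right for threshold, left, right in forest])

# Made-up example: age at entry vs. days until the event.
ages = [30, 40, 50, 60, 70, 80]
days_to_event = [900, 850, 700, 500, 300, 150]
forest = fit_bagged_stumps(ages, days_to_event)
print(predict_forest(forest, 45))
```

Because the trees do not depend on one another, they can be trained in parallel, whereas the sequential construction in GBM cannot be parallelised across trees in the same way.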
In which software (or package) have you fitted the GBM model?
Is the true curve generated from a probability distribution with no noise? If so it is hardly reflective of true life.
It did have some noise, but it’s possible that it was not sufficient to simulate real life. This is the reason I emphasized the importance of working with real data, which we will be doing shortly, and we will be announcing the results soon!
Regarding curves on the slide "simple simulation results": are the plots based on training data set or validation set? Have you checked whether the models overfit?
Validation data only. And yes, we definitely looked out for overfitting.
Which tools do you suggest to implement ML algorithms? (For example Excel, SAS, R, Python,...)
R and Python are the most common ones, and since they are open source they are also more frequently updated. Excel has very limited ability to deal with very big datasets so I wouldn’t use it. We’ve not really come across SAS in this context.
In the context of applying machine learning to Solvency II, has the regulator expressed any concern or challenged this approach which can be seen as a black box or brute force?
This will be a challenge. ML has areas and methods (e.g. neural networks) whose conclusions are, for the time being, difficult if not impossible to explain, and these methods will certainly not meet the regulatory standard for now. Then there are areas that can be reasonably well explained, and there should be less of an issue getting these types of methods and findings approved. Lastly, there is a grey area (possibly including the GBM method), which will require work to get approved. We are confident that regulators, as this field evolves and penetrates the actuarial field, will be open to considering explainable methods. We are already speaking to the PRA. The overall response is positive and we look forward to continuing this, and to engaging with other regulators, once we gather enough data.
What impact will GDPR have on machine learning?
GDPR governs the use of PII (Personally Identifiable Information) and will have a significant impact on sales and marketing in particular. It does not apply to data used by an actuary to assess risk, as long as that data contains no PII.
What are the common uses within P&C insurance?
How frequently should we update our trained ML model?
This will depend on the underlying data that built the model relative to additional data that accrues, as well as the stability of the specific parameter being estimated. My guess is you want to start annually and eventually filter out the parameters that require annual vs. less frequent updating.
What are the regulation implications? How easy would it be for companies to use machine learning?
It is a learning curve for the regulator, as well. We expect them to accept this as the field evolves.
Are the apparent deficiencies perhaps down to not refining the Cox model to allow for non-proportionality over time and/or interactions? This could have been done manually after a few tests.
In our view ML has the advantage here, as it makes far fewer assumptions. We are performing additional tests on more generally simulated data, which will be blindly tested by the ML algorithm, as well as on real data. What we have demonstrated is just an example, but we do have many other scenarios we can demonstrate to clients.
What is a weak learner?
It means a single tree, i.e. a more limited predictor, before applying iterative processes.