


Feeding the Beast: Three Data Value Considerations

Insurance carriers have traditionally used predictive models for pricing, relying on credit-based insurance scores and generalized linear model (GLM)-based pricing plans to determine rates. Now this sophistication has expanded into virtually all aspects of our business: creating underwriting and claims workflows, developing marketing programs, establishing pay-plan options, and even deciding which underwriting reports to buy.

These models are ravenous for data. The more data we feed them, the better the results, so carriers are looking outside their walls for more sources of data to fuel their models.  At the same time, expense pressure is pushing them to cut costs wherever possible. 

For the carrier, these needs are in opposition. How much data can you afford to buy (or not buy) and, more importantly, how can you best make that decision? Let’s examine the factors to consider in making data purchasing decisions and some best practices for building a cost-benefit analysis (CBA) to help drive these decisions.

Data cost is usually obvious: there is a price either for access to the data (subscription) or per data item purchased (transaction). Data value is harder to pin down, and it has three dimensions. First is completeness, generally measured in terms of hit rate: For a given inquiry, how often am I going to get information back? Second is accuracy: When I get a hit back, how certain am I that it is correct? This is generally measured in terms of either false positives (I received a result, but it was not for this inquiry subject) or false negatives (I show no result for this inquiry subject, but there actually was one). Third is compliance: How certain am I that the data I am purchasing meets the compliance requirements for my needs? Is it governed by FCRA, DPPA, or another regulation?

Considering Data Completeness

A hit on a given data requirement clearly has value; if it didn’t, we would not be using the data in our models or pricing. When evaluating a data source, hit rate is one of the key drivers of the buying decision. The critical question to keep in mind when considering hit rate is: What is the value of a hit?

Having a dollar figure predetermined for the value of a hit allows you to make an informed decision about whether a more expensive data source with a better hit rate provides a better return than a less expensive alternative with a lower hit rate. The flip side of this metric is the cost of a no-hit, which is not necessarily the same for every data application. For a credit-based insurance score, it is the cost of missed price segmentation for a risk; for an accident history or motor vehicle report (MVR), a no-hit could mean hundreds of dollars in lost premium. Establishing these metrics up front allows a robust cost-benefit analysis (CBA) when considering alternate data sources.
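To make the comparison concrete, here is a minimal sketch of the arithmetic. The class name, the function name, and every dollar figure and rate are hypothetical and chosen only for illustration. It also assumes the vendor charges per inquiry, whether or not a hit comes back.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    cost_per_inquiry: float  # dollars paid per inquiry (assumed charged hit or no-hit)
    hit_rate: float          # fraction of inquiries that return data


def net_value_per_inquiry(source: DataSource, value_of_hit: float) -> float:
    """Expected value of hits minus the price paid, per inquiry."""
    return source.hit_rate * value_of_hit - source.cost_per_inquiry


# Hypothetical figures, not taken from any real vendor or carrier.
VALUE_OF_HIT = 40.00
premium_source = DataSource("premium", cost_per_inquiry=5.00, hit_rate=0.90)
budget_source = DataSource("budget", cost_per_inquiry=2.00, hit_rate=0.70)

premium_net = net_value_per_inquiry(premium_source, VALUE_OF_HIT)
budget_net = net_value_per_inquiry(budget_source, VALUE_OF_HIT)

for source, net in ((premium_source, premium_net), (budget_source, budget_net)):
    print(f"{source.name}: net value per inquiry = ${net:.2f}")
```

Under these assumed numbers the more expensive source comes out ahead, at roughly $31 versus $26 per inquiry. With a lower value of a hit, the ranking could just as easily reverse, which is why the figure needs to be set before the comparison is made.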

Data Accuracy

Data accuracy is the second component in assessing data quality. It is usually measured in terms of false positives and false negatives (also called false clears). False positives, which typically occur in driver discovery-type solutions during the underwriting process, can be very costly in terms of customer relations. The real cost of driver discovery is not the data itself but the time and effort it takes to follow up on potential leads. A false positive creates unnecessary work for underwriters or rate pursuit teams and can ultimately upset potential customers.

In an accident history situation, a false positive can create artificial rate increases that result in lost risks (e.g., if the customer buys elsewhere), or a poor customer experience if the policyholder has to follow up to correct the error. False positives are especially important to weigh against hit rate. A high hit rate is generally preferable to a low one, but not if it brings a lot of false positives with it. Establishing the cost associated with a false positive result allows you to include this in your CBA as well.
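One way to fold false positives into the same arithmetic is sketched below. The function name, the rates, and the dollar amounts are hypothetical. The sketch also assumes that the false-positive rate is expressed as the share of returned hits that turn out to be wrong, and that each wrong hit carries a fixed follow-up and customer-relations cost.

```python
def net_value_with_false_positives(
    hit_rate: float,
    false_positive_share: float,    # share of hits that are wrong (assumed definition)
    value_of_hit: float,            # dollars gained from a correct hit
    cost_of_false_positive: float,  # dollars lost on a wrong hit
    cost_per_inquiry: float,        # dollars paid per inquiry
) -> float:
    """Net expected dollars per inquiry once wrong hits are penalized."""
    true_hits = hit_rate * (1 - false_positive_share)
    false_hits = hit_rate * false_positive_share
    return (
        true_hits * value_of_hit
        - false_hits * cost_of_false_positive
        - cost_per_inquiry
    )


# Hypothetical figures: same price, different hit rate and accuracy.
high_hit_net = net_value_with_false_positives(0.90, 0.20, 40.0, 60.0, 5.0)
low_hit_net = net_value_with_false_positives(0.70, 0.02, 40.0, 60.0, 5.0)

print(f"high hit rate, noisy: ${high_hit_net:.2f} per inquiry")
print(f"low hit rate, clean:  ${low_hit_net:.2f} per inquiry")
```

With these assumed inputs, the source with the lower hit rate returns about $21.60 per inquiry against about $13.00 for the noisier source, which illustrates why hit rate alone can mislead.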

False clears, or false negatives, are particularly difficult to assess. As carriers look to decrease expense loads, they consider ways to avoid ordering expensive underwriting reports, such as MVRs, whenever possible. The tradeoff is that any predictive model used to decide whether or not to order an MVR (or another report) will not be right 100 percent of the time. You will miss violations.

In assessing whether such a model is worthwhile, look back to the cost of a false clear. Typically this will be the rate load that a chargeable violation would have added to the premium charged (not usually a trivial number). False clears can be insidious because it is tempting to simply turn on a model that will reduce orders and start realizing expense savings. This can be misleading, because once you stop ordering the reports, you have no way of knowing how many violations the model is missing.

As a best practice in testing these kinds of predictors, establish the cost of a no-hit or the value of a hit (in this scenario they are the same) and keep ordering reports on 100 percent of risks for some period of time while you test the predictive model. Then you can determine whether the savings outweigh the potential premium loss.
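The test described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the record layout, the report cost, and the cost of a false clear are all hypothetical, and the sample records are synthetic. Each record pairs the model's recommendation with what the report actually showed during a period when every report was ordered.

```python
from typing import Dict, List, Tuple

# (model_says_order, violation_found): hypothetical record layout
Record = Tuple[bool, bool]


def evaluate_order_model(
    records: List[Record],
    report_cost: float,          # dollars per report ordered
    cost_of_false_clear: float,  # premium lost per missed violation
) -> Dict[str, float]:
    """Compare report savings with premium lost to false clears."""
    skipped = [r for r in records if not r[0]]
    false_clears = sum(1 for _, violation_found in skipped if violation_found)
    savings = len(skipped) * report_cost
    premium_loss = false_clears * cost_of_false_clear
    return {
        "reports_skipped": len(skipped),
        "false_clears": false_clears,
        "savings": savings,
        "premium_loss": premium_loss,
        "net_benefit": savings - premium_loss,
    }


# Synthetic test period of 1,000 risks with every report ordered.
test_period: List[Record] = (
    [(False, False)] * 600   # model would skip, report was clean
    + [(False, True)] * 30   # model would skip, violation found (false clear)
    + [(True, True)] * 120   # model would order, violation found
    + [(True, False)] * 250  # model would order, report was clean
)

result = evaluate_order_model(test_period, report_cost=10.0, cost_of_false_clear=150.0)
print(result)
```

In this synthetic case the model would skip 630 reports for $6,300 in savings and give up $4,500 in premium on 30 false clears, for a net benefit of $1,800. A slightly higher false-clear cost or miss rate would turn that negative, which is the reason to measure it before switching the model on.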

Compliance Issues

With the CFPB’s arrival, it is opportune to readdress compliance. Data sources we have historically used, such as credit-based insurance scores, claims histories, and MVRs, have well-established compliance rules that should not present any surprises. The past couple of years have introduced a lot of new data sources, such as court-based activity and event-driven triggers. The compliance specifics surrounding them, however, are not necessarily as well established.

Make sure you thoroughly review the compliance concerns not only with the sources of this data, but also with how you are using it. A new, non-FCRA data source can still be subject to FCRA compliance limitations depending on the actions you take with the data. Since you and your data vendors share a common bond in abiding by these compliance rules, make sure you are comfortable that your data vendor understands the rules around the data it is selling, has appropriate consumer disclosure processes in place to handle calls, and has guidelines in place regarding specific use cases for the data it provides.


If you feed the beast, you can get better results. But remember the cost balance: Are you paying less or more at the end of the day just to keep going? If you look at the three dimensions of data value—completeness, accuracy and compliance—there’s nothing but upside to your CBA.

(David Lukens is director, insurance, telematics, for the risk solutions business of LexisNexis. He is responsible for telematics and mobile solutions for the auto insurance market. Since joining LexisNexis in 2010, Lukens has also led several key data and analytics initiatives, including building out solutions for identity risk management, driver discovery and policyholder retention.)
