Warranty Claim Process Overview
For a simple warranty claim, a dealer submits the claim, Polaris reviews it and decides to authorize or deny it, and the claim process ends. More complex cases can involve back-and-forth between the dealer and Polaris, which can take time depending on the dealer's responsiveness and vehicle part availability. For this reason, we decided to evaluate and score warranty claim records every day that a claim is modified in the system, to identify safety cases as soon as possible (see process diagram below).
Data Prep for Modeling
The R-based predictive tools rely on categorical fields as model inputs. Any field that contains nulls is assigned a placeholder value so the model can run.
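The null handling can be sketched as follows. This is a minimal Python illustration, not the team's actual R/Alteryx logic, and the field names (`case_type_code`, `failed_part_code`) are hypothetical:

```python
# Placeholder assigned to null categorical inputs so every record can be scored.
MISSING = "UNKNOWN"

def fill_nulls(record, fields):
    """Return a copy of the record with None values in the given
    categorical fields replaced by a placeholder category."""
    return {f: (record[f] if record.get(f) is not None else MISSING)
            for f in fields}

claim = {"case_type_code": "C12", "failed_part_code": None}
clean = fill_nulls(claim, ["case_type_code", "failed_part_code"])
# clean["failed_part_code"] is now "UNKNOWN"
```

Treating "missing" as its own category, rather than dropping the record, keeps every claim scorable and lets the model learn whether missingness itself is predictive.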
The inherent difficulty in identifying which cases should be reviewed for further product safety consideration is that it's a true needle-in-a-haystack scenario. To overcome this, we oversample on the safety flag field (which indicates a potential safety record) to create a data set with predictive power. Without oversampling, we would end up with a model that predicts that no warranty claim should be reviewed, which would achieve nearly 100% accuracy while catching no safety cases.
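The oversampling idea can be sketched as below. This is an illustrative Python version (random oversampling with replacement), under the assumption that the minority class is duplicated up to parity with the majority class; the document does not state the exact ratio used:

```python
import random

def oversample(records, flag="safety_flag", ratio=1.0, seed=42):
    """Duplicate minority-class (safety-flagged) records with replacement
    until they number `ratio` times the majority-class count."""
    rng = random.Random(seed)
    pos = [r for r in records if r[flag]]
    neg = [r for r in records if not r[flag]]
    target = int(len(neg) * ratio)
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    return neg + pos + extra

# A 2%-positive data set becomes balanced for training.
data = [{"safety_flag": True}] * 2 + [{"safety_flag": False}] * 98
balanced = oversample(data)
```

After oversampling, the trained model can no longer "win" by predicting the majority class for every record.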
Developing Early Life and Late Life Models
Because record completeness varies widely between a warranty claim that was just created and one that was just completed, we developed two types of predictive models. The Early Life model uses fields a dealer enters when a warranty claim is created, and the Late Life model uses fields added throughout the process or at claim completion. This lets us more accurately identify potential safety scenarios at different points in the claim lifecycle.
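The split between the two model types amounts to two feature sets keyed to claim state. A minimal sketch, with entirely hypothetical field names (the actual inputs vary by model, as discussed in the next section):

```python
# Hypothetical field lists; actual model inputs differ per deployed model.
EARLY_LIFE_FIELDS = ["case_type_code", "product_line", "complaint_code"]
LATE_LIFE_FIELDS = EARLY_LIFE_FIELDS + ["repair_code", "failed_part_code",
                                        "claim_status"]

def select_features(claim, completed):
    """Pick the feature set matching the claim's lifecycle stage:
    early-life fields at creation, the fuller late-life set at completion."""
    fields = LATE_LIFE_FIELDS if completed else EARLY_LIFE_FIELDS
    return {f: claim.get(f) for f in fields}
```

A just-created claim is scored only on fields guaranteed to exist at creation time, so the Early Life model never sees fields that would still be null.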
Iteration upon Variable Selection
While building the models, we used the built-in variable importance plots to understand the relative contribution of each variable to overall predictive power. We ran dozens of iterations to determine which variables have significant predictive capability and to find the best-performing mix of variables. The mix and total number of variables included varies across the four deployed models. In the example below for one of the Early Life models, you can see that four variables are used to predict whether a warranty claim is safety-related, and the case type code field is the main driver in the model.
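The importance measures behind such plots can be approximated by permutation importance: shuffle one variable's values across records and measure how much accuracy drops. This is a stdlib Python sketch of that general idea, not the specific importance metric the R tooling reports:

```python
import random

def permutation_importance(model, records, label="is_safety", seed=0):
    """Estimate each field's importance as the accuracy drop observed
    when that field's values are shuffled across records."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == r[label] for r in rows) / len(rows)

    base = accuracy(records)
    scores = {}
    for f in (k for k in records[0] if k != label):
        vals = [r[f] for r in records]
        rng.shuffle(vals)
        shuffled = [dict(r, **{f: v}) for r, v in zip(records, vals)]
        scores[f] = base - accuracy(shuffled)  # bigger drop = more important
    return scores
```

Variables whose shuffling barely moves accuracy contribute little and are candidates to drop in the next iteration.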
Model Accuracy Testing
Testing model accuracy goes hand in hand with feature selection and the variable importance plot. Not only do we want to understand which variables are predictive, but we also want to test each model against data it hasn't seen before.
In our scenario, we want to pick a probability threshold for each model that minimizes the number of false negatives. This is a deviation from choosing the "most" accurate model: we want to ensure we don't miss safety-related claims, even at the expense of reviewing a higher number of warranty claims that aren't safety-related (false positives). In the example below, you can see that setting the probability threshold to 50% results in reviewing 55 warranty claims that aren't safety-related, but the model accurately catches all safety cases in the validation data set.
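That threshold choice can be sketched directly: on a validation set, the highest threshold with zero false negatives sits at the lowest probability the model assigned to any true safety case. A minimal Python illustration (the actual tuning was done in the R workflow):

```python
def pick_threshold(scored):
    """Given (probability, is_safety) pairs from a validation set, return
    the highest threshold that still catches every true safety case
    (zero false negatives), plus the false-positive count at that level."""
    # Zero false negatives means the threshold cannot exceed the lowest
    # probability assigned to any true safety case.
    threshold = min(p for p, is_safety in scored if is_safety)
    false_pos = sum(1 for p, is_safety in scored
                    if not is_safety and p >= threshold)
    return threshold, false_pos

validation = [(0.92, True), (0.61, True), (0.70, False), (0.20, False)]
t, fp = pick_threshold(validation)  # catches both safety cases, one extra review
```

Accepting those extra reviews is the deliberate trade: reviewer time is cheap relative to a missed safety case.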
Launching Models in Production
Each business day, the workflow rebuilds the models from historical records so they continuously learn from a dynamic data set. All newly created and modified warranty claims are scored against each of the four models, and records that meet or exceed a model's designated probability threshold are manually reviewed.
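The daily scoring step reduces to filtering claims against per-model thresholds. A sketch, assuming hypothetical model names and threshold values (the real values come from the validation tuning described above):

```python
# Hypothetical per-model thresholds; real values are tuned on validation data.
THRESHOLDS = {"early_1": 0.50, "early_2": 0.45, "late_1": 0.50, "late_2": 0.40}

def claims_to_review(claims, models):
    """Score each new or modified claim against every model; queue it for
    manual review if any model's probability meets that model's threshold."""
    flagged = []
    for claim in claims:
        for name, model in models.items():
            if model(claim) >= THRESHOLDS[name]:
                flagged.append((claim["claim_id"], name))
                break  # one model hit is enough to queue the claim
    return flagged
```

Queued claims feed the review email, and any confirmed safety records flow back into the training data for the next day's model build.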
Alteryx takes the records that need review and sends an email alert to the Polaris safety team, who decide whether each warranty claim is truly safety-related. A safety reviewer can then click through the case links in the email below to review the case in further detail. Any records flagged for safety become part of the data set used to rebuild the models in future workflow runs.