You are working on a project to replace a legacy fraud detection system. Your team of two data scientists (DS) has access to a dataset of financial transactions covering 12 months of labeled historical data. The data contains several fields, including the timestamp, card number, transaction location, amount, fraud score (from the legacy system), and a flag indicating whether the transaction was considered fraudulent. Your goal is to develop a predictive model that can be deployed in production with a detection rate (recall) above 70% while keeping the false positive rate (FPR) at or below 1%.
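To make the target concrete, "recall at 1% FPR" means fixing the alert threshold so that at most 1% of legitimate transactions are flagged, then measuring the share of fraudulent transactions that score above that threshold. The Python sketch below is one way to compute it, assuming that higher scores mean "more likely fraud" and that a transaction is flagged only when its score is strictly above the threshold; the function name `recall_at_fpr` and the toy data are hypothetical.

```python
import math


def recall_at_fpr(y_true, scores, max_fpr=0.01):
    """Highest recall reachable while the false positive rate stays <= max_fpr.

    Assumes higher scores mean "more likely fraud" and that a transaction is
    flagged when its score is strictly above the returned threshold.
    Returns (recall, fpr, threshold).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = sorted((s for y, s in zip(y_true, scores) if y == 0), reverse=True)
    if not pos or not neg:
        raise ValueError("need both fraudulent and legitimate transactions")
    # Largest number of legitimate transactions we are allowed to flag.
    k = math.floor(max_fpr * len(neg))
    # Thresholding at the (k+1)-th highest legitimate score flags at most k of
    # them (fewer when scores are tied); any lower threshold would flag k+1.
    threshold = neg[k] if k < len(neg) else -math.inf
    recall = sum(s > threshold for s in pos) / len(pos)
    fpr = sum(s > threshold for s in neg) / len(neg)
    return recall, fpr, threshold


# Hypothetical toy data: 100 legitimate and 10 fraudulent transactions.
legit = [i / 100 for i in range(100)]
fraud = [0.999, 0.995, 0.99, 0.985, 0.97, 0.9, 0.8, 0.6, 0.4, 0.2]
labels = [0] * len(legit) + [1] * len(fraud)
print(recall_at_fpr(labels, legit + fraud, max_fpr=0.01))  # (0.4, 0.01, 0.98)
```

Because the threshold is set from the legitimate transactions' score distribution, sweeping `max_fpr` (for example 0.01 versus 0.015) on the same scores traces the recall/FPR trade-off between operating points.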
How would you structure and plan the team’s work?
Based on the few fields listed above, which features would you suggest your team try?
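As an illustration of features that can be derived from the fields listed above, the following Python sketch computes per-card history features (transaction count and total amount in a rolling 24-hour window, amount relative to the card's past average, time since the card's previous transaction, and whether the location changed) alongside the raw amount, hour of day, and legacy score. The `Txn` record, the feature names, and the 24-hour window are hypothetical choices. The design point worth keeping is that each transaction only sees the same card's earlier transactions, so no information from the future leaks into its features.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Txn:
    # Hypothetical record holding the fields named in the brief.
    timestamp: datetime
    card: str
    location: str
    amount: float
    legacy_score: float


def card_history_features(txns, window=timedelta(hours=24)):
    """Return one feature dict per transaction, in timestamp order.

    Each transaction only sees earlier transactions on the same card.
    """
    recent_by_card = defaultdict(deque)  # card -> (timestamp, amount) inside the window
    totals = defaultdict(lambda: (0, 0.0))  # card -> (count, sum of amounts) so far
    previous = {}  # card -> (timestamp, location) of the previous transaction
    rows = []
    for txn in sorted(txns, key=lambda x: x.timestamp):
        recent = recent_by_card[txn.card]
        # Evict transactions older than the rolling window.
        while recent and txn.timestamp - recent[0][0] > window:
            recent.popleft()
        count, total = totals[txn.card]
        prev = previous.get(txn.card)
        card_mean = total / count if count else 0.0
        rows.append({
            "hour": txn.timestamp.hour,
            "amount": txn.amount,
            "legacy_score": txn.legacy_score,
            "txns_in_window": len(recent),
            "amount_in_window": sum(a for _, a in recent),
            "amount_vs_card_mean": txn.amount / card_mean if card_mean > 0 else None,
            "secs_since_prev": (txn.timestamp - prev[0]).total_seconds() if prev else None,
            "location_changed": (prev[1] != txn.location) if prev else None,
        })
        # Update the card's state only after its features are computed.
        recent.append((txn.timestamp, txn.amount))
        totals[txn.card] = (count + 1, total + txn.amount)
        previous[txn.card] = (txn.timestamp, txn.location)
    return rows
```

Features left as `None` mark a card's first transaction; how to encode them (for example a separate "first seen" flag) is a modeling choice for the team.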
The team has been working hard but seems unable to get above 50% recall at 1% FPR; it can only reach 70% recall at 1.5% FPR.
How would you handle the situation?
In a stroke of luck, after the team introduced one new feature to the model, performance jumped to 95% recall at 1% FPR. The project manager is now putting increasing pressure on the team to deploy a model to production as soon as possible.
How would you advise the DS team?
How would you advise the project manager?