"Formula"?
No such thing.
This sounds like a multi-class classification problem.
You should have a data set of the variables you listed, with one record per individual. Each row should have the red/yellow/green designation.
If the rows are not assigned a red/yellow/green status, you'll have to create one. In that case you should create a regression model that gives you the probability of repayment, from 0-100% or 0-1. You will then assign red/yellow/green risk as probability bins: 0-50% = red, 50-70% = yellow, 70-100% = green.
You can adjust the bin cutoffs to suit your appetite for risk. These are just my examples. A real application would look at repayment patterns and set the bins based on those.
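To make the binning concrete, here is a minimal sketch. The function name `risk_band` and its parameters are my own, and the cutoffs are only the example values from above. One assumption to flag: a probability sitting exactly on a cutoff goes to the safer band (0.50 counts as yellow, 0.70 as green), since the example ranges overlap at the edges.

```python
def risk_band(p_repay, yellow_from=0.50, green_from=0.70):
    """Map a repayment probability in [0, 1] to 'red', 'yellow' or 'green'.

    The default cutoffs are illustrative only; tune them to your risk appetite.
    A value exactly on a cutoff is assigned to the higher band (an assumption).
    """
    if not 0.0 <= p_repay <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_repay >= green_from:
        return "green"
    if p_repay >= yellow_from:
        return "yellow"
    return "red"


print([risk_band(p) for p in (0.10, 0.50, 0.65, 0.70, 0.95)])
# → ['red', 'yellow', 'yellow', 'green', 'green']
```

A more cautious lender could call `risk_band(p, yellow_from=0.60, green_from=0.85)` to shrink the green band.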
Your job is to divide the data into three parts: train/validation/test.
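A rough sketch of the split, using only the standard library. The 70/15/15 proportions, the fixed seed, and the name `train_val_test_split` are my assumptions, not a rule; pick fractions that leave enough rows in each part.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of `rows` and cut it into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # → 70 15 15
```

Each row lands in exactly one part, so the test set stays untouched until the very end.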
Create a model using your technique of choice (e.g. neural network, random forest, XGBoost, etc.) by training it on the train set. Tune and validate your models on the validation set. Once you have a final model, give it the test set and compare its predictions to the known test set results.
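Here is a sketch of that train / tune / test workflow. To keep it dependency-free I swapped in a tiny k-nearest-neighbours classifier as a stand-in for the model; it is not one of the techniques listed above, only the shortest thing that shows the pattern. The data is synthetic and entirely made up (two invented features, `income` and `debt`), and the candidate values of `k` are arbitrary.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training rows closest to x."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, rows, k):
    """Fraction of `rows` whose label the model fitted on `train` gets right."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)


# Synthetic, made-up data: two features and a noisy red/yellow/green label.
rng = random.Random(0)


def make_row():
    income, debt = rng.random(), rng.random()
    score = income - debt + rng.gauss(0, 0.15)
    label = "green" if score > 0.25 else "yellow" if score > -0.25 else "red"
    return (income, debt), label


rows = [make_row() for _ in range(300)]
# Rows are generated in random order, so plain slicing is a fair split here.
train, val, test = rows[:200], rows[200:250], rows[250:]

# Tune: choose the k that scores best on the validation set.
best_k = max((1, 3, 5, 9, 15), key=lambda k: accuracy(train, val, k))

# Final check: score the chosen model once on the held-out test set.
test_acc = accuracy(train, test, best_k)
print("chosen k:", best_k)
print("test accuracy:", test_acc)
```

The point is the order of operations: the validation set picks the setting, and the test set is used only once, after that choice is locked in.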
You might try creating multiple models to see how they do. Sometimes blending several models gives better results than just one.
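One simple way to blend, sketched below, is to average the class probabilities each model outputs and pick the highest. The function `blend`, the optional weights, and the three sets of probabilities are all made up for illustration; real blends are often tuned on the validation set.

```python
def blend(predictions, weights=None):
    """Weighted average of per-class probability dicts from several models."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weight by default
    total = sum(weights)
    classes = predictions[0].keys()
    return {
        c: sum(w * p[c] for w, p in zip(weights, predictions)) / total
        for c in classes
    }


# Hypothetical outputs of three models for the same applicant.
model_outputs = [
    {"red": 0.60, "yellow": 0.30, "green": 0.10},
    {"red": 0.20, "yellow": 0.50, "green": 0.30},
    {"red": 0.25, "yellow": 0.45, "green": 0.30},
]

blended = blend(model_outputs)
label = max(blended, key=blended.get)
print(label)  # → yellow
```

Here the first model on its own would say red, but the blend lands on yellow because the other two agree.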
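You need to trade off bias and variance for each model: one that is too simple underfits (high bias), while one that is too flexible fits the noise in the train set and does worse on new data (high variance).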
Good luck!