Generate and assemble model predictions for each stratified test split
I would like to generate multiple test data splits using stratified KFold (skf) and then generate and assemble predictions for each of these test splits (and hence for all of the data) using a sklearn model. I am at my wits' end on how to do this programmatically.
I have reproduced my code using a minimal data example below. Briefly, after importing the data, I have a function that fits the model and returns the predicted probabilities. I then try to call this function on each skf split of my data so as to generate and collate a predicted probability for each row. However, this step fails with a ValueError (boolean array expected). My code follows below:
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# load data, assemble dataframe
iris = datasets.load_iris()
X = pd.DataFrame(iris.data[51:150, :],
                 columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
y = pd.DataFrame(iris.target[51:150], columns=["target"])
df = pd.concat([X, y], axis=1)

# instantiate logistic regression
log = LogisticRegression()

# modelling function
def train_model(train, test, fold):
    X = df.drop("target", axis=1)
    y = df["target"]
    X_train = train[X]
    y_train = train[y]
    X_test = test[X]
    y_test = test[y]
    # generate probability of class 1 predictions from logistic regression model fit
    prob = log.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    return prob

# generate stratified k-fold splits (2 used as example here)
skf = StratifiedKFold(n_splits=2)

# generate and collate all predictions (for each row in df)
fold = 1
outputs = []
for train_index, test_index in skf.split(df, y):
    train_df = df.loc[train_index, :]
    test_df = df.loc[test_index, :]
    # generate model probabilities for X_test in this skf split
    output = train_model(train_df, test_df, fold)
    outputs.append(output)  # append all model probabilities
    fold = fold + 1
all_preds = pd.concat(outputs)
Can somebody please guide me to a solution that returns each row's index together with its predicted probability?
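For what it's worth, here is a minimal sketch of one way the loop described above could be made to work. It is not the only approach. The ValueError most likely comes from `train[X]`: `X` there is a whole DataFrame rather than a list of column names, and pandas treats a DataFrame used as an indexer as a boolean mask. Separately, `pd.concat` expects Series or DataFrames, not the NumPy arrays that `predict_proba` returns. In the sketch, the `feature_cols` list, the changed `train_model` signature, the `prob` column name and `max_iter=1000` are my own assumptions and not part of the original code. The `cross_val_predict` call at the end is only there as a cross-check against the manual loop.

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Same data as in the question: rows 51..149 of iris (classes 1 and 2 only).
iris = datasets.load_iris()
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame(iris.data[51:150, :], columns=feature_cols)
df["target"] = iris.target[51:150]


def train_model(train, test, feature_cols, target_col="target"):
    """Fit on `train`; return P(second class) for `test`, keeping test's row index."""
    # Assumption: a fresh model per fold, with max_iter raised to avoid
    # convergence warnings on unscaled features.
    model = LogisticRegression(max_iter=1000)
    model.fit(train[feature_cols], train[target_col])
    prob = model.predict_proba(test[feature_cols])[:, 1]
    # Wrapping in a Series keeps each probability attached to its row label.
    return pd.Series(prob, index=test.index, name="prob")


skf = StratifiedKFold(n_splits=2)
outputs = []
for train_pos, test_pos in skf.split(df[feature_cols], df["target"]):
    # split() yields positional indices, so iloc is the safe accessor here.
    outputs.append(train_model(df.iloc[train_pos], df.iloc[test_pos], feature_cols))

# One out-of-fold probability per row, back in the original row order.
all_preds = pd.concat(outputs).sort_index()
result = df.assign(prob=all_preds)
print(result.head())

# Cross-check: scikit-learn's built-in out-of-fold prediction helper.
cv_prob = cross_val_predict(
    LogisticRegression(max_iter=1000),
    df[feature_cols],
    df["target"],
    cv=skf,
    method="predict_proba",
)[:, 1]
```

Because every row lands in exactly one test fold, `all_preds` ends up with one entry per row of `df`, indexed by the original row labels. If you do not need a custom function per fold, `cross_val_predict` alone is probably enough.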