Generating and assembling model predictions for each stratified test split

Posted on 2025-02-11 12:38:46

I would like to generate multiple test data splits using stratified k-fold (StratifiedKFold, skf) and then generate and assemble predictions for each of these test splits (and hence for all of the data) using an sklearn model. I am at my wits' end as to how to do this programmatically.

I have reproduced my code with a minimal data example below. Briefly, after importing the data, I have a function that fits the model and generates predicted probabilities. I then try to apply this function to each skf split of my data, so as to generate and collate predicted probabilities for every row. However, this step fails with a ValueError (boolean array expected). My code follows:

import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# load data, assemble dataframe
iris = datasets.load_iris()
X = pd.DataFrame(iris.data[51:150, :],
                 columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
y = pd.DataFrame(iris.target[51:150], columns=["target"])
df = pd.concat([X, y], axis=1)

# instantiate logistic regression
log = LogisticRegression()

# modelling function
def train_model(train, test, fold):
    X = df.drop("target", axis=1)
    y = df["target"]

    X_train = train[X]
    y_train = train[y]
    X_test = test[X]
    y_test = test[y]

    # generate probability of class 1 predictions from logistic regression model fit
    prob = log.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    return prob

# generate stratified k-fold splits (2 used as example here)
skf = StratifiedKFold(n_splits=2)

# generate and collate all predictions (for each row in df)
fold = 1
outputs = []
for train_index, test_index in skf.split(df, y):
    train_df = df.loc[train_index, :]
    test_df = df.loc[test_index, :]
    # generate model probabilities for X_test in this skf split
    output = train_model(train_df, test_df, fold)
    outputs.append(output)  # append all model probabilities
    fold = fold + 1

all_preds = pd.concat(outputs)

Can somebody please guide me to the solution that includes row index and its predicted probability?
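
The ValueError most likely comes from train[X] and train[y] inside train_model: X and y there are a DataFrame and a Series, so pandas treats them as boolean masks instead of column selectors. Two further issues would show up once that is fixed: predict_proba returns a bare NumPy array (so pd.concat has no row index to work with), and skf.split yields integer positions, which only happen to match .loc labels because df has a default RangeIndex. Below is a minimal sketch of one possible fix, not a definitive implementation. The names feature_cols, train_pos, test_pos, result and cv_probs, the extra model argument to train_model, and max_iter=1000 are my own assumptions and are not in the original post.

```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Same data as in the question: rows 51..149 of iris (labels 1 and 2 only).
iris = datasets.load_iris()
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame(iris.data[51:150, :], columns=feature_cols)
df["target"] = iris.target[51:150]


def train_model(train, test, model):
    """Fit on one fold's training rows; return probabilities indexed like `test`."""
    # Select columns by name rather than passing whole DataFrames as keys.
    model.fit(train[feature_cols], train["target"])
    # Column 1 corresponds to model.classes_[1], which is label 2 in this subset.
    prob = model.predict_proba(test[feature_cols])[:, 1]
    # Wrap in a Series so each probability keeps its original row index.
    return pd.Series(prob, index=test.index, name="prob")


skf = StratifiedKFold(n_splits=2)
outputs = []
for train_pos, test_pos in skf.split(df[feature_cols], df["target"]):
    # split() yields integer positions, so index with iloc rather than loc.
    model = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
    outputs.append(train_model(df.iloc[train_pos], df.iloc[test_pos], model))

# One out-of-fold probability per row, keyed by row index.
all_preds = pd.concat(outputs).sort_index()
result = df.assign(prob=all_preds)

# Cross-check: cross_val_predict does the same bookkeeping in one call.
cv_probs = cross_val_predict(
    LogisticRegression(max_iter=1000),
    df[feature_cols],
    df["target"],
    cv=skf,
    method="predict_proba",
)[:, 1]
```

Here all_preds is a Series whose index is the row index of df, and result is df with an extra prob column. If the per-fold loop is not needed for anything else, the cross_val_predict call alone may be enough; its output is ordered like the input rows, so it can be wrapped as pd.Series(cv_probs, index=df.index).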
