Implementing K-fold with a train-test split

Posted on 2025-02-12 21:36:19

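I am trying to add K-fold cross-validation to my code because overfitting is an issue. Previously, I split my data into train and test sets. However, I am confused about where and how to apply K-fold, since my data is already split.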

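import torch
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Dataset

# Scale each feature column to unit norm (axis=0 normalizes per feature, not per sample)
x = preprocessing.normalize(x, axis=0)

# Hold out 20% as the test set, stratified by label
x_trainval, x_test, y_trainval, y_test = train_test_split(x, y, test_size=0.2, random_state=0, stratify=df["label"])
# y_trainval: labels for the remaining 80%

# Split the remaining 80% into train and validation sets
x_train, x_val, y_train, y_val = train_test_split(x_trainval, y_trainval, test_size=0.1, random_state=0)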

class classifierdataset(Dataset):
    def __init__(self,x_data,y_data):
        self.x_data = x_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]
    
    def __len__(self):
        return len(self.x_data)
    
# Wrap the NumPy arrays as tensors: float features, integer class labels
train_dataset = classifierdataset(torch.from_numpy(x_train).float(), torch.from_numpy(y_train).long())
val_dataset = classifierdataset(torch.from_numpy(x_val).float(), torch.from_numpy(y_val).long())
test_dataset = classifierdataset(torch.from_numpy(x_test).float(), torch.from_numpy(y_test).long())

EPOCHS = 10
BATCH_SIZE = 16
LEARNING_RATE = 0.0007
# 0.0009, 0.0007
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE)
val_loader = DataLoader(dataset=val_dataset, batch_size=1)
test_loader = DataLoader(dataset=test_dataset, batch_size=1)
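
One common way to fit K-fold around an existing train-test split, offered here only as a minimal sketch under assumptions: keep the 20% test set held out exactly as before, and let K-fold take the place of the single train/validation split on the remaining 80%. The data below is synthetic, `N_SPLITS = 5` is an arbitrary choice, and `TensorDataset` with `Subset` stands in for the `classifierdataset` class from the question; none of these come from the original post.

```python
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold, train_test_split
from torch.utils.data import DataLoader, Subset, TensorDataset

# Synthetic stand-in for the real x and y (assumption: 100 samples, 4 features, 2 classes).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = np.repeat([0, 1], 50)

# Hold out the test set once; it plays no part in cross-validation.
x_trainval, x_test, y_trainval, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# One dataset for the whole 80%; each fold views it through index subsets.
trainval_dataset = TensorDataset(
    torch.from_numpy(x_trainval).float(), torch.from_numpy(y_trainval).long()
)

BATCH_SIZE = 16
N_SPLITS = 5  # hypothetical choice, not from the original post
skf = StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=0)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(skf.split(x_trainval, y_trainval)):
    # Subset exposes only the rows selected for this fold.
    train_loader = DataLoader(
        Subset(trainval_dataset, train_idx.tolist()), batch_size=BATCH_SIZE, shuffle=True
    )
    val_loader = DataLoader(Subset(trainval_dataset, val_idx.tolist()), batch_size=1)
    fold_sizes.append((len(train_idx), len(val_idx)))
    # Build a fresh model and optimizer here, train on train_loader,
    # and record the validation metric measured on val_loader.
```

In this arrangement the validation metric would typically be averaged across the folds, and the held-out test set evaluated only once at the end. Re-creating the model inside the loop matters: reusing weights from a previous fold would let validation rows from one fold leak into training for the next.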
