PyTorch DataLoader is not splitting the dataset into batches

Posted on 2025-01-17 15:25:04

I am trying to load training data into a DataLoader with the following code:

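import numpy as np
import torch
from torch.utils.data import Dataset


class Dataset(Dataset):  # note: reuses (shadows) the imported Dataset name
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # Convert the indexed item to float32 tensors
        x = torch.Tensor(self.x[index])
        y = torch.Tensor(self.y[index])
        return (x, y)

    def __len__(self):
        # Number of items = size of the first axis
        count = self.x.shape[0]
        return count


# X_train and y_train are existing NumPy arrays (their definition is not shown)
X_train = np.reshape(X_train, (-1, 1, X_train.shape[0], X_train.shape[1]))
y_train = np.reshape(y_train, (-1, 1, y_train.shape[0], y_train.shape[1]))
train_dataset = Dataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)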

However, when I check the length of the DataLoader, I always get 1, as if the whole dataset were a single item. The loader is not splitting the dataset into batches. What am I doing wrong here?

×纯※雪 2025-01-24 15:25:04

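After testing your code, it seems to work correctly once the reshape steps are removed. The reshape puts new axes in front of the data: for a 2-D X_train with N rows and D columns, the target shape (-1, 1, N, D) forces the -1 to resolve to 1, so X_train ends up with shape (1, 1, N, D). __getitem__ indexes with self.x[index], which now selects along that leading axis of size 1 instead of along the sample axis, so index 0 returns the entire dataset. __len__ has the same problem: self.x.shape[0] is always 1, so the DataLoader sees a dataset with a single item and produces a single batch.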
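The effect of the reshape can be checked with NumPy alone. The following is a minimal sketch, assuming a 2-D input; the array sizes (12,000 samples, 8 features) and the name X_bad are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in data: 12,000 samples with 8 features each.
X_train = np.zeros((12_000, 8))

# The reshape from the question. The last three axes (1, 12000, 8) already
# account for every element, so the -1 axis can only resolve to 1.
X_bad = np.reshape(X_train, (-1, 1, X_train.shape[0], X_train.shape[1]))

print(X_bad.shape)     # (1, 1, 12000, 8)
print(X_bad.shape[0])  # 1, the value __len__ would return
print(X_bad[0].shape)  # (1, 12000, 8): index 0 holds the whole dataset
```

Printing the array shape right before building the dataset is a quick way to confirm that the first axis is the sample axis.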

Solution: do not reshape.

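# Reuses the Dataset class from the question; no reshape is applied.
X_train = np.random.rand(12_000, 1280)
y_train = np.random.rand(12_000, 1)
train_dataset = Dataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)

# Inspect the first batch only
for x, y in train_loader:
    print(x.shape)  # torch.Size([128, 1280])
    print(y.shape)  # torch.Size([128, 1])
    break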
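If the reshape was meant to add a channel axis for the model (an assumption; the question does not say why it reshapes), a possible variant keeps the sample axis first and inserts the extra axis after it. The sketch below uses hypothetical sizes and names (N, D, BATCH_SIZE, X_channels) and also computes how many batches a DataLoader with the default drop_last=False should report.

```python
import math

import numpy as np

# Hypothetical sizes, chosen only for illustration.
N, D, BATCH_SIZE = 12_000, 8, 128
X_train = np.zeros((N, D))

# Keep samples on axis 0 and insert a channel axis of size 1 after it.
# Equivalent to X_train[:, None, :].
X_channels = X_train.reshape(-1, 1, D)

print(X_channels.shape)     # (12000, 1, 8)
print(X_channels[0].shape)  # (1, 8): a single sample

# With drop_last=False, the number of batches is ceil(num_samples / batch_size).
expected_batches = math.ceil(len(X_channels) / BATCH_SIZE)
print(expected_batches)     # 94
```

With these sizes, len(train_loader) should report the same value, and the final batch holds the remaining 96 samples.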
