PyTorch DataLoader is not splitting the dataset into batches
I am trying to load training data into a DataLoader with the following code:
import numpy as np
import torch
from torch.utils.data import Dataset

class Dataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        x = torch.Tensor(self.x[index])
        y = torch.Tensor(self.y[index])
        return (x, y)

    def __len__(self):
        count = self.x.shape[0]
        return count

X_train = np.reshape(X_train, (-1, 1, X_train.shape[0], X_train.shape[1]))
y_train = np.reshape(y_train, (-1, 1, y_train.shape[0], y_train.shape[1]))

train_dataset = Dataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
Now, when I check the length of the DataLoader, I always get 1; the loader is not splitting the dataset into batches. What am I doing wrong here?
1 Answer
After testing your code, it seems to work perfectly if you remove the reshape steps. The reshape introduces a new leading dimension of size 1, but you index your items with self.x[index], so you are always indexing into that batch-like dimension. The same mistake affects the length of your dataset: self.x.shape[0] is always 1, so the DataLoader sees a dataset containing a single sample. Solution: do not reshape.