How to convert a postings file into PyTorch tensors

Posted on 2025-02-03 18:57:22

Is there a Python package that converts a postings file into PyTorch tensors? By a postings file, I mean a CSV file with the following format: "docid", "wordid", "count"

I also have a dictionary.txt that associates each wordid with a word.
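The question does not show the layout of dictionary.txt, so the following is only a sketch under an assumption: each line holds a wordid and its word separated by whitespace (for example `1 banana`). The helper name `load_dictionary` and the sample contents are hypothetical; adjust the parsing if the real file is laid out differently.

```python
import io
from typing import Dict, TextIO


def load_dictionary(handle: TextIO) -> Dict[int, str]:
    """Parse lines of the assumed form '<wordid> <word>' into a dict."""
    id_to_word: Dict[int, str] = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # Ignore blank lines
        word_id, word = line.split(maxsplit=1)
        id_to_word[int(word_id)] = word
    return id_to_word


# Demo on an in-memory sample; with a real file, pass
# open("dictionary.txt", encoding="utf-8") instead.
sample = io.StringIO("0 apple\n1 banana\n2 cherry\n")
id_to_word = load_dictionary(sample)
print(id_to_word[1])
```

The mapping is only needed to turn wordids back into readable words; the model itself can work directly on the integer ids.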

In short, my text data consists of a postings file and a dictionary, and I want to use it with a deep learning model that I implemented in PyTorch.


再可℃爱ぅ一点好了 2025-02-10 18:57:22

No, you have to do it yourself. You can simply convert each element into a PyTorch tensor, or use the PyTorch Dataset API like this:

import csv
import torch
from torch.utils.data import Dataset
from typing import List, NamedTuple

CsvRowItem = NamedTuple("CsvRowItem", [
    ("docId", int),
    ("wordId", int),
    ("count", int)
])

data: List[CsvRowItem] = []
with open("data.csv", mode='r', newline='') as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header row; remove this line if the file has none
    for row in reader:
        data.append(
            CsvRowItem(docId=int(row[0]),
                       wordId=int(row[1]),
                       count=int(row[2]))
        )



class YourDataset(Dataset):

    def __init__(self, data: List[CsvRowItem]):
        self.data = data
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index: int) -> torch.Tensor:
        item = self.data[index]
        # One int32 tensor per CSV row: [docId, wordId, count]
        return torch.tensor([item.docId, item.wordId, item.count], dtype=torch.int32)


dataset = YourDataset(data=data)
print(f"Length of data: {len(dataset):,}")
print(dataset[0])
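The Dataset above yields one (docId, wordId, count) triple per item. If the model expects one bag-of-words vector per document instead, the same triples can be assembled into a sparse document-term matrix. This is a minimal sketch, not part of the original answer: it assumes docids and wordids are zero-based and that the number of documents and the vocabulary size are known; `build_doc_term_matrix` and the sample rows are hypothetical.

```python
import torch


def build_doc_term_matrix(rows, num_docs, vocab_size):
    """Build a sparse (num_docs x vocab_size) count matrix from triples.

    Assumes each row is (doc_id, word_id, count) with zero-based ids.
    """
    doc_ids = [r[0] for r in rows]
    word_ids = [r[1] for r in rows]
    counts = [float(r[2]) for r in rows]
    indices = torch.tensor([doc_ids, word_ids], dtype=torch.long)
    values = torch.tensor(counts, dtype=torch.float32)
    matrix = torch.sparse_coo_tensor(indices, values, size=(num_docs, vocab_size))
    # coalesce() sums any duplicate (doc_id, word_id) entries
    return matrix.coalesce()


# Hypothetical sample: 2 documents, vocabulary of 3 words
rows = [(0, 0, 2), (0, 2, 1), (1, 1, 3)]
matrix = build_doc_term_matrix(rows, num_docs=2, vocab_size=3)
dense = matrix.to_dense()
print(dense)
```

Row i of the matrix is the word-count vector of document i. For a large vocabulary, keeping the matrix sparse and densifying only the rows of the current batch avoids holding the full dense matrix in memory.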

