How to convert a postings file to a PyTorch tensor

Posted on 2025-02-03 18:57:22

Is there a Python package that converts a postings file into a PyTorch tensor?
By a postings file I mean a CSV file with the following format:
"docID" ,"wordID" ,"count"

I also have a dictionary.txt that maps each wordID to a word.

In the end, my text data consists of a postings file and a dictionary, and I want to use it with a deep learning model that I have implemented in PyTorch.


Comments (1)

再可℃爱ぅ一点好了 2025-02-10 18:57:22

No, you have to do it yourself. You can either convert each element into a PyTorch tensor directly, or use the PyTorch Dataset API like this:

import csv
from typing import List, NamedTuple

import torch
from torch.utils.data import Dataset


class CsvRowItem(NamedTuple):
    """One row of the postings file."""
    docId: int
    wordId: int
    count: int


# Read every posting into memory as a typed row.
data: List[CsvRowItem] = []
with open("data.csv", mode="r", newline="") as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header
    for row in reader:
        data.append(
            CsvRowItem(docId=int(row[0]),
                       wordId=int(row[1]),
                       count=int(row[2]))
        )


class YourDataset(Dataset):
    """Exposes each posting as a tensor [docId, wordId, count]."""

    def __init__(self, data: List[CsvRowItem]):
        self.data = data

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> torch.Tensor:
        item = self.data[index]
        # int32 keeps the same dtype as the legacy torch.IntTensor constructor
        return torch.tensor([item.docId, item.wordId, item.count], dtype=torch.int32)


dataset = YourDataset(data=data)
print(f"Length of data: {len(dataset):,}")
print(dataset[0])
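
If the model expects one bag-of-words vector per document rather than one tensor per posting, the same rows can also be assembled into a single sparse document-term matrix. The following is only a minimal sketch under some assumptions that are not in the question: the helper name postings_to_sparse and the inline sample data are made up for illustration, and docID and wordID are assumed to be 0-based integers (subtract 1 first if yours start at 1).

```python
import csv
import io

import torch

# Hypothetical sample in the "docID","wordID","count" format from the question.
SAMPLE_CSV = '''"docID","wordID","count"
0,1,2
0,3,1
1,0,4
2,3,5
'''


def postings_to_sparse(file, num_docs=None, vocab_size=None):
    """Build a sparse (num_docs x vocab_size) count matrix from postings rows.

    Assumes 0-based integer IDs and at least one data row.
    """
    reader = csv.reader(file, skipinitialspace=True)
    next(reader)  # skip the header row
    rows = [[int(v) for v in row[:3]] for row in reader if row]
    triples = torch.tensor(rows, dtype=torch.long)  # shape: (nnz, 3)

    indices = triples[:, :2].t()              # shape: (2, nnz), rows are docID and wordID
    values = triples[:, 2].to(torch.float32)  # counts as floats, ready for a model

    # Infer the matrix size from the largest IDs unless it is given explicitly.
    if num_docs is None:
        num_docs = int(indices[0].max()) + 1
    if vocab_size is None:
        vocab_size = int(indices[1].max()) + 1

    sparse = torch.sparse_coo_tensor(indices, values, size=(num_docs, vocab_size))
    return sparse.coalesce()  # sums counts of any duplicated (docID, wordID) pairs


bow = postings_to_sparse(io.StringIO(SAMPLE_CSV))
dense = bow.to_dense()  # only sensible when the vocabulary is small
print(bow.shape)
print(dense)
```

For a real file, pass an open file handle (for example open("data.csv", newline="")) instead of the io.StringIO sample. Passing vocab_size explicitly, for instance the number of entries in dictionary.txt, keeps the matrix width stable even when the highest word IDs never occur in the postings.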