TorchText Vocab TypeError: Vocab.__init__() got an unexpected keyword argument 'min_freq'

Posted 2025-01-17 13:33:44

I am working on a CNN sentiment-analysis model that uses the IMDb dataset provided by the TorchText library. On one line of my code:

vocab = Vocab(counter, min_freq=1, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))

I get a TypeError for the min_freq argument, even though I am certain it is one of the function's accepted arguments. I also get this UserWarning: "Lambda function is not supported for pickle, please use regular python function or functools partial instead." Full code:

from torchtext.datasets import IMDB
from collections import Counter
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import Vocab
tokenizer = get_tokenizer('basic_english')
train_iter = IMDB(split='train')
test_iter = IMDB(split='test')
counter = Counter()
for (label, line) in train_iter:
    counter.update(tokenizer(line))
vocab = Vocab(counter, min_freq=1, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))
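
The UserWarning is a separate issue from the TypeError. The code shown never defines a lambda, so the warning most likely comes from inside the dataset pipeline rather than from the Vocab call. The standard-library sketch below shows what the message means: the standard `pickle` module cannot serialize a lambda, but it can serialize a `functools.partial` wrapped around a module-level function. `lowercase_and_split` is a hypothetical stand-in for a tokenizer, not a torchtext function.

```python
import functools
import pickle


def lowercase_and_split(text, sep=" "):
    """Hypothetical tokenizer defined at module level, so pickle can
    store it by its qualified name and look it up again on load."""
    return text.lower().split(sep)


def can_pickle(obj):
    """Return True if `obj` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda's qualified name is "<lambda>", which pickle cannot look up.
# (Assigning a lambda to a name is done here only to illustrate.)
lambda_tokenizer = lambda text: text.lower().split(" ")

# functools.partial over a module-level function pickles by reference.
partial_tokenizer = functools.partial(lowercase_and_split, sep=" ")

print(can_pickle(lambda_tokenizer))      # False
print(can_pickle(partial_tokenizer))     # True
print(partial_tokenizer("Great Movie"))  # ['great', 'movie']
```

This is why the warning recommends a regular function or `functools.partial`: both can be found again by name when the object is unpickled, for example in a DataLoader worker process.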

Source links: towardsdatascience, github Legacy to new

I tried removing the min_freq argument and using the function's default, as follows:

vocab = Vocab(counter, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))

However, I end up getting the same TypeError, this time for the specials argument instead of min_freq.
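
Removing min_freq only moving the error to specials is consistent with the constructor no longer accepting either keyword. As far as the torchtext 0.10+ documentation goes, `torchtext.vocab.Vocab` is constructed from a single wrapped vocab object, and the counter-style options now live on the lowercase `torchtext.vocab.vocab` factory function. Python reports the first unexpected keyword it meets, which the pure-Python sketch below reproduces with a hypothetical `WrapperVocab` class (an illustration, not the real torchtext class).

```python
class WrapperVocab:
    """Hypothetical stand-in whose constructor, like the newer torchtext
    Vocab class, accepts exactly one argument besides self."""

    def __init__(self, vocab):
        self.vocab = vocab


def constructor_error(*args, **kwargs):
    """Call WrapperVocab and return the TypeError message, or None on success."""
    try:
        WrapperVocab(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


counter = {"great": 3, "movie": 2}
specials = ("<unk>", "<BOS>", "<EOS>", "<PAD>")

# Python stops at the first keyword it cannot match: min_freq.
print(constructor_error(counter, min_freq=1, specials=specials))
# Dropping min_freq just moves the complaint to specials.
print(constructor_error(counter, specials=specials))
```

The error therefore points at whichever unsupported keyword comes first, not at a problem with that particular argument.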

Any help would be much appreciated.

Thank you.

千柳 2025-01-24 13:33:44

As https://github.com/pytorch/text/issues/1445 mentions, you should change "Vocab" to "vocab". I think they made a typo in the legacy-to-new notebook.

Corrected code:

from torchtext.datasets import IMDB
from collections import Counter
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import vocab  # lowercase factory function, not the Vocab class

tokenizer = get_tokenizer('basic_english')
train_iter = IMDB(split='train')
test_iter = IMDB(split='test')

# Count token frequencies over the training split.
counter = Counter()
for (label, line) in train_iter:
    counter.update(tokenizer(line))

# The factory accepts min_freq and specials; a new name avoids shadowing vocab().
imdb_vocab = vocab(counter, min_freq=1, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))

My environment:

python 3.9.12
torchtext 0.12.0
pytorch 1.11.0
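
For reference, the torchtext 0.12 documentation describes `torchtext.vocab.vocab(ordered_dict, min_freq=1, specials=None, special_first=True)`. Tokens are kept in the order of the mapping you pass in and filtered by `min_freq`, and the specials are placed at the front by default. Out-of-vocabulary lookups raise an error unless you call `set_default_index`, for example `v.set_default_index(v['<unk>'])`. Because a plain `Counter` iterates in first-seen order rather than by frequency, the docs sort the counts into an `OrderedDict` first. The sketch below mimics that behaviour in plain Python so it runs without torchtext installed. `build_vocab` and `lookup` are hypothetical helpers, not torchtext functions.

```python
from collections import Counter, OrderedDict


def build_vocab(freqs, min_freq=1, specials=(), special_first=True):
    """Sketch of the filtering torchtext's vocab() factory is documented to do:
    keep tokens with count >= min_freq in the mapping's iteration order,
    then put the special tokens first (or last). Returns token -> index."""
    specials = list(specials)
    tokens = [tok for tok, freq in freqs.items()
              if freq >= min_freq and tok not in specials]
    itos = specials + tokens if special_first else tokens + specials
    return {tok: idx for idx, tok in enumerate(itos)}


def lookup(stoi, token, default_token="<unk>"):
    """Hypothetical stand-in for set_default_index: unknown tokens map to <unk>."""
    return stoi.get(token, stoi[default_token])


counter = Counter("the movie was great and the cast was great".split())
# Order by frequency first, as the torchtext docs suggest for vocab().
ordered = OrderedDict(counter.most_common())
stoi = build_vocab(ordered, min_freq=2,
                   specials=("<unk>", "<BOS>", "<EOS>", "<PAD>"))

print(stoi)
# {'<unk>': 0, '<BOS>': 1, '<EOS>': 2, '<PAD>': 3, 'the': 4, 'was': 5, 'great': 6}
print(lookup(stoi, "great"), lookup(stoi, "terrible"))  # 6 0
```

With `min_freq=2`, the tokens seen only once ("movie", "and", "cast") are dropped, while the specials are always kept regardless of their counts.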

傻比既视感 2025-01-24 13:33:44

You can try torchtext.legacy.vocab instead of torchtext.vocab, which might solve the issue. This worked for me:

from torchtext.datasets import IMDB
from collections import Counter
from torchtext.data.utils import get_tokenizer
from torchtext.legacy.vocab import vocab

她比我温柔 2025-01-24 13:33:44

Sorry, it doesn't work for me. :(
Vocab is the correct name of the object, and vocab is not.

The simple solution I found: the "specials" tuple was removed from the experimental Vocab and is no longer used. That's all.

https://github.com/pytorch/text/issues/890

My environment:

python 3.8.16
torchtext 0.15.1
pytorch 2.0.0

About the author: 倾城泪