How to load NLTK stopwords or words from local disk

Posted on 2025-02-09 23:26:52

I need to load the NLTK 'words' data from local disk. In my notebook the code looks like this:

import nltk
nltk.data.path.append("/data") # Setting path here
nltk.corpus.words.words()

But I get the following error:

LookupError                               Traceback (most recent call last)
/anaconda3/lib/python3.8/site-packages/nltk/corpus/util.py in __load(self)
     83                 try:
---> 84                     root = nltk.data.find(f"{self.subdir}/{zip_name}")
     85                 except LookupError:

/anaconda3/lib/python3.8/site-packages/nltk/data.py in find(resource_name, paths)
    582     resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
--> 583     raise LookupError(resource_not_found)
    584 

LookupError: 
**********************************************************************
  Resource words not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('words')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/words.zip/words/

  Searched in:
    - '/home/my_user_name/nltk_data'
    - '/anaconda3/nltk_data'
    - '/anaconda3/share/nltk_data'
    - '/anaconda3/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/data'

I followed the manual installation section at https://www.nltk.org/data.html
However, instead of setting the NLTK_DATA environment variable, I want to set the path from within the notebook.

Any help? Thanks in advance.

Comments (1)

七分※倦醒 2025-02-16 23:26:52

It is completely frustrating that a package as popular as NLTK gives misleading messages.

The following message is just wrong:

Attempted to load corpora/words.zip/words/

It is not trying to load the corpus from a words folder inside words.zip. It was attempting to load from nltk_data/corpora/words, so the message should have read:

Attempted to load nltk_data/corpora/words/

So the solution is to add the correct path manually, as follows:
nltk.data.path.append("./data/nltk_data")

Then put the unzipped files inside it under a corpora subfolder, for example ./data/nltk_data/corpora/words. After that, access the words with:

nltk.corpus.words.words()
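
Putting the steps above together, here is a minimal sketch rather than a definitive recipe. The helper names corpus_location and load_words are made up for illustration and are not part of NLTK. The sketch assumes the corpus sits at <data_dir>/corpora/words (or corpora/words.zip), which is the layout the search described above expects.

```python
from pathlib import Path


def corpus_location(data_dir, name="words", subdir="corpora"):
    """Return the folder or zip file for the corpus under data_dir, or None.

    Hypothetical helper: it only checks the directory layout on disk and
    does not call NLTK at all.
    """
    base = Path(data_dir) / subdir
    for candidate in (base / name, base / f"{name}.zip"):
        if candidate.exists():
            return candidate
    return None


def load_words(data_dir):
    """Append data_dir to NLTK's search path and return the word list.

    Hypothetical helper: NLTK is imported here so that the layout check
    above can be used even when NLTK is not installed.
    """
    import nltk

    if corpus_location(data_dir) is None:
        raise FileNotFoundError(
            f"expected {Path(data_dir) / 'corpora' / 'words'} (or words.zip)"
        )
    # nltk.data.path is a plain list of directory strings.
    if str(data_dir) not in nltk.data.path:
        nltk.data.path.append(str(data_dir))
    return nltk.corpus.words.words()
```

With the data unzipped as described, calling load_words("./data/nltk_data") should return the same list as nltk.corpus.words.words(). If corpus_location returns None, the directory appended to nltk.data.path is probably one level too high or too low.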