Issue with spaCy model en_core_web_lg: how to prevent the package from being downloaded every time the code runs
I am using spaCy and its model en_core_web_lg to perform summarisation in Python. The code runs without any errors. However, I am trying to find a way to make sure that en_core_web_lg is not downloaded again in an environment that already has it. I have googled a lot for a solution, and I list what I found below, but none of it does what I am trying to achieve.
This code will be packaged and used by multiple people, and I want to make sure that however often they run it, en_core_web_lg is not downloaded if it already exists. Below is the spaCy excerpt of my code and the solutions I tried:
# Importing necessary libraries
from heapq import nlargest
from string import punctuation
import nltk
import spacy
from spacy.cli.download import download
from spacy.lang.en.stop_words import STOP_WORDS

nltk.download('punkt')
download(model="en_core_web_lg")
nlp_g = spacy.load('en_core_web_lg')  # downloads every time the code is run even if the model is present in the environment


def spacy_summarize(text):
    """
    Returns the summary for an input string text

    Parameters:
    :param text: Input String
    :type text: str

    Returns:
    :return: The summary for the input text
    :rtype: String
    """
    nlp = nlp_g
    doc = nlp(text)
    word_frequencies = {}
    for word in doc:
        # Count only tokens that are neither stop words nor punctuation
        if word.text.lower() not in STOP_WORDS and word.text.lower() not in punctuation:
            if word.text not in word_frequencies:
                word_frequencies[word.text] = 1
            else:
                word_frequencies[word.text] += 1
    max_frequency = max(word_frequencies.values())
    # Normalise the counts in place (values only, so iterating the dict is safe)
    for word in word_frequencies:
        word_frequencies[word] = word_frequencies[word] / max_frequency
    sentence_tokens = [sent for sent in doc.sents]
    sentence_scores = {}
    spacy_frequencies(word_frequencies, sentence_tokens, sentence_scores)
    # Keep roughly 5% of the sentences, but at least one
    select_length = max(1, int(len(sentence_tokens) * 0.05))
    summary = nlargest(select_length, sentence_scores, key=sentence_scores.get)
    final_summary = [sent.text for sent in summary]
    # Join with a space so the selected sentences do not run together
    summary = ' '.join(final_summary)
    return summary


def spacy_frequencies(word_frequencies, sentence_tokens, sentence_scores):
    """
    Child function for spacy function for calculating sentence scores

    Parameters:
    :param: word frequency, sentence tokens and sentence scores, which
    are provided through the parent function
    """
    for sent in sentence_tokens:
        for word in sent:
            if word.text.lower() in word_frequencies:
                if sent not in sentence_scores:
                    sentence_scores[sent] = word_frequencies[word.text.lower()]
                else:
                    sentence_scores[sent] += word_frequencies[word.text.lower()]
Things Tried:
# Attempt 1
import sys
import subprocess
import pkg_resources

required = {'en_core_web_lg'}
installed = {pkg.key for pkg in pkg_resources.working_set}
missing = required - installed
if missing:
    python = sys.executable
    subprocess.check_call([python, '-m', 'spacy', 'download', *missing], stdout=subprocess.DEVNULL)

# Attempt 2
try:
    nlp_lg = spacy.load("en_core_web_lg")
except ModuleNotFoundError:
    download(model="en_core_web_lg")
    nlp_lg = spacy.load("en_core_web_lg")
Neither solution gave a satisfactory result: the package was downloaded again. I would appreciate it if someone could help me with this.
Thank you so much!
Comments (1)
spaCy doesn't automatically download models at all, so this must be a bug in your code that checks whether the model is already installed.
Looking at this code:
try:
    nlp_lg = spacy.load("en_core_web_lg")
except ModuleNotFoundError:
    download(model="en_core_web_lg")
    nlp_lg = spacy.load("en_core_web_lg")
The issue is that if the model is not installed, the error raised is an OSError, not a ModuleNotFoundError. First you need to fix that.
This approach seems like it should work, except that loading models in the same process you installed them in doesn't work very reliably: the list of installed packages is not updated while Python is running. So even after fixing the above issue, it may not work as intended.
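As a minimal sketch of the first fix, assuming the only change needed is the exception type: the helper name load_model and its loader/downloader parameters are hypothetical (they are not part of spaCy) and exist only so the logic can be tried with stand-ins. By default it falls back to spacy.load and spacy.cli.download. The caveat above still applies: the second load happens in the same process as the install, so it may fail even though the download succeeded.

```python
def load_model(name="en_core_web_lg", loader=None, downloader=None):
    """Load a spaCy pipeline, downloading it only when loading fails.

    `loader` and `downloader` are hypothetical hooks for illustration;
    when omitted, spacy.load and spacy.cli.download are used.
    """
    if loader is None or downloader is None:
        # Imported lazily so the helper can be exercised without spaCy.
        import spacy
        from spacy.cli import download

        loader = loader or spacy.load
        downloader = downloader or download
    try:
        return loader(name)
    except OSError:
        # A missing model surfaces as OSError, not ModuleNotFoundError.
        downloader(name)
        # May still fail in this same process; re-running the script
        # after the install is the more reliable path.
        return loader(name)
```

With spaCy installed, nlp_g = load_model() would replace the unconditional download(...) call from the question, so the download step is only reached when the load fails.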
I would recommend either:
checking the output of pip list to see if the model is installed, and installing it if not
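The pip list check could be sketched roughly as follows. The function names normalize, is_listed and ensure_model are hypothetical, and the sketch assumes pip is available as python -m pip in the same environment. Names are normalised before comparing because the model may be listed as en-core-web-lg rather than en_core_web_lg; as far as I can tell, the same normalisation is why the pkg_resources check in the question never matches, since pkg.key uses dashes.

```python
import json
import re
import subprocess
import sys


def normalize(name):
    """Lower-case a distribution name and collapse -, _ and . into a dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_listed(package, pip_list_json):
    """Return True if `package` appears in `pip list --format=json` output."""
    installed = {normalize(entry["name"]) for entry in json.loads(pip_list_json)}
    return normalize(package) in installed


def ensure_model(package="en_core_web_lg"):
    """Download the model only when pip does not already list it.

    Returns True if a download was started, False otherwise.
    """
    listing = subprocess.check_output(
        [sys.executable, "-m", "pip", "list", "--format=json"], text=True
    )
    if is_listed(package, listing):
        return False  # already installed, nothing to download
    subprocess.check_call([sys.executable, "-m", "spacy", "download", package])
    return True
```

Running ensure_model() as a separate setup step, before the script that calls spacy.load, avoids loading the model in the same process that installed it.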