Error when converting a pandas dataframe to corpus files via a function with parameters
I want to prepare my text data that is in a pandas dataframe for sentiment analysis with nltk. For that, I'm using code for a function that converts each row of a pandas dataframe into a corpus.
import nltk

# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        date=r['Date']
        tweet=r['Text']
        place=r['Place']
        fname=str(date)+'_'+'.txt'
        corpusfile=open(corpusfolder+'/'+fname,'a')
        corpusfile.write(str(tweet) +" " +str(date))
        corpusfile.close()

CreateCorpusFromDataFrame(myfolder,mydf)
The problem is that I keep getting this error:
NameError: name 'myfolder' is not defined
This happens even though I have a folder called 'myfolder' in the same directory as the Jupyter notebook that contains my code.
UPDATE:
I can see now that the issue was simply that I needed to pass the folder name as a string. I've done that and amended my code. The problem I have now is that the contents of the text file created by the function are not being written into a corpus, and the type of the variable being created is 'NoneType'.
import nltk

# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        id=r['Date']
        tweet=r['Text']
        #place=r['Place']
        #fname=str(date)+'_'+'.txt'
        fname='tweets'+'.txt'
        corpusfile=open(corpusfolder+'/'+fname,'a')
        corpusfile.write(str(tweet) +" ")
        corpusfile.close()

corpusdf = CreateCorpusFromDataFrame('myfolder',mydf)
type(corpusdf)
NoneType
Problem
You are passing myfolder as a variable to your function, but you have not defined it anywhere in your code, so Python raises a NameError.

Solution
Just replace it with 'myfolder' (pass it as a string).
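
As for the NoneType in the update: a Python function with no return statement returns None, so assigning the result of the call to a variable gives you None even though the files were written. Here is a minimal sketch of one way around that, not a definitive implementation. The name create_corpus_from_rows, the file naming scheme, and the sample rows are made up for illustration. To keep the example runnable without pandas it takes a list of dicts; with a real dataframe you could probably pass mydf.to_dict('records') instead.

```python
import tempfile
from pathlib import Path


def create_corpus_from_rows(corpus_folder, rows):
    """Write one text file per row and return the list of files written."""
    folder = Path(corpus_folder)
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for i, row in enumerate(rows):
        # Hypothetical naming scheme: row index plus date keeps names unique.
        path = folder / f"{i}_{row['Date']}.txt"
        with open(path, "w", encoding="utf-8") as corpus_file:
            corpus_file.write(str(row["Text"]))
        written.append(path)
    # Without this return, the call below would evaluate to None.
    return written


# Stand-in for mydf.to_dict('records'); the values are invented.
rows = [
    {"Date": "2020-01-01", "Text": "first tweet"},
    {"Date": "2020-01-02", "Text": "second tweet"},
]

with tempfile.TemporaryDirectory() as tmp:
    # The folder is passed as a string or Path, never as an undefined bare name.
    files = create_corpus_from_rows(Path(tmp) / "myfolder", rows)
    file_names = [p.name for p in files]
    contents = [p.read_text(encoding="utf-8") for p in files]

print(file_names)
print(contents)
```

Writing text files only puts them on disk. If you want an NLTK corpus object, you would still need to load the folder afterwards, for example with nltk.corpus.PlaintextCorpusReader('myfolder', r'.*\.txt').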