Error when converting a pandas DataFrame to corpus files via a function with parameters

Posted 2025-02-11 19:13:21

I want to prepare my text data that is in a pandas dataframe for sentiment analysis with nltk. For that, I'm using code for a function that converts each row of a pandas dataframe into a corpus.

import nltk
# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        date=r['Date']
        tweet=r['Text']
        place=r['Place']
        fname=str(date)+'_'+'.txt'
        corpusfile=open(corpusfolder+'/'+fname,'a')
        corpusfile.write(str(tweet) +" " +str(date))
        corpusfile.close()
CreateCorpusFromDataFrame(myfolder,mydf)

The problem is that I keep getting this error:

NameError: name 'myfolder' is not defined

This happens even though I have a folder called 'myfolder' in the same directory as the Jupyter notebook that contains my code.

UPDATE:

I can see now that the issue was simply that I needed to pass the folder name as a string. I've done that and amended my code. The problem I have now is that the contents of the text file created by the function are not being loaded into a corpus, and the variable I get back is of type 'NoneType'.

import nltk
# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        id=r['Date']
        tweet=r['Text']
        #place=r['Place']
        #fname=str(date)+'_'+'.txt'
        fname='tweets'+'.txt'
        corpusfile=open(corpusfolder+'/'+fname,'a')
        corpusfile.write(str(tweet) +" ")
        corpusfile.close()
corpusdf = CreateCorpusFromDataFrame('myfolder',mydf)
type(corpusdf)
NoneType

Comments (1)

倾城花音 2025-02-18 19:13:21

Problem

You are passing myfolder to your function as a variable name, but no variable called myfolder is defined in your code, so Python raises a NameError. Having a folder with that name on disk does not create a Python variable.
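
A minimal sketch of the difference between a bare name and a string literal. The helper show_folder is hypothetical and only builds a path; it stands in for the function from the question.

```python
# A bare name is looked up as a variable; a quoted name is a string literal.
def show_folder(corpusfolder):
    # Hypothetical helper: builds the path the question's function would open.
    return corpusfolder + '/tweets.txt'

try:
    show_folder(myfolder)  # no variable called myfolder has been defined
except NameError as err:
    message = str(err)

path = show_folder('myfolder')  # the folder name passed as a string

print(message)
print(path)
```

The first call fails before the function body even runs, because Python cannot evaluate the argument.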

Solution

Replace it with 'myfolder', that is, pass the folder name as a string.

CreateCorpusFromDataFrame('myfolder',mydf)
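
On the NoneType in the question's update: a Python function that has no return statement returns None, so assigning its result gives a NoneType variable even when the file was written correctly. The sketch below is one possible rework under stated assumptions, not the original function: the name create_corpus_files, the sample rows, and the temporary folder are all hypothetical, and plain dictionaries stand in for the DataFrame rows (with pandas, mydf.to_dict('records') would produce rows of this shape).

```python
import os
import tempfile

def create_corpus_files(corpusfolder, rows):
    """Append each row's text to tweets.txt and return the file's path."""
    os.makedirs(corpusfolder, exist_ok=True)
    path = os.path.join(corpusfolder, 'tweets.txt')
    # The with-block closes the file even if a write fails.
    with open(path, 'a', encoding='utf-8') as corpusfile:
        for r in rows:
            corpusfile.write(str(r['Text']) + ' ')
    return path  # without this line the call would evaluate to None

# Hypothetical sample rows standing in for the tweets DataFrame.
sample_rows = [
    {'Date': '2020-01-01', 'Text': 'first tweet'},
    {'Date': '2020-01-02', 'Text': 'second tweet'},
]

folder = tempfile.mkdtemp()  # temporary folder used instead of 'myfolder'
corpus_path = create_corpus_files(folder, sample_rows)

with open(corpus_path, encoding='utf-8') as f:
    contents = f.read()

print(type(corpus_path).__name__)
print(contents)
```

Writing text files does not by itself produce an NLTK corpus object. A further step, not shown here, would be to point a corpus reader such as NLTK's PlaintextCorpusReader at the folder.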