Getting "AttributeError: 'NoneType' object has no attribute 'count'" when expanding my dataset
I'm analyzing a Twitter dataset in Python and trying to find every quote. The code is supposed to give me a .csv file with a list of all tweets and their quotes.
I found some code on GitHub where someone did the same thing, but with data from a website. I adapted my dataset to work with that code.
The tweets are all in an .xml file like this:
<articles>
  <article>
    <paragraph>Tweet text is here.</paragraph>
    <paragraph>Tweet text is here.</paragraph>
  </article>
</articles>
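For reference, here is a minimal sketch of how xml.etree.ElementTree exposes this structure. The SAMPLE string is made up to match the layout above, and the empty <paragraph/> in the second article is a hypothetical addition showing that an element with no text gets None as its .text, not an empty string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same shape as tweets.xml; the second
# article holds an empty <paragraph/> element for illustration.
SAMPLE = """<articles>
  <article>
    <paragraph>Tweet text is here.</paragraph>
    <paragraph>Tweet text is here.</paragraph>
  </article>
  <article>
    <paragraph/>
  </article>
</articles>"""

root = ET.fromstring(SAMPLE)  # root is the <articles> element

# Iterating over root yields each <article>; findall() returns its
# direct <paragraph> children. An empty element's .text is None.
texts = [p.text for article in root for p in article.findall('paragraph')]
print(texts)  # ['Tweet text is here.', 'Tweet text is here.', None]
```

Calling texts[2].count("'") on that last entry raises the same AttributeError as in the traceback below.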
My dataset has 1,000,000 tweets. When I analyze a sample of 50,000 tweets, everything works as expected.
When I analyze the full dataset, I get this error:
Traceback (most recent call last):
  File "C:/xxx.py", line 16, in <module>
    count = text.count("\'")
AttributeError: 'NoneType' object has no attribute 'count'
Why do I get this when I analyze the whole dataset but not when I analyze the sample?
Here's my code:
import xml.etree.ElementTree as ET
import pandas as pd
import numpy as np

tree = ET.parse('tweets.xml')
articles = tree.getroot()

paragraphs_with_quotes = []
paragraphs_with_double_quotes = []
quotes = []
extracted_paragraphs = []

for article in articles:
    for paragraph in article.findall('paragraph'):
        text = paragraph.text
        count = text.count("\'")
        indexes = []
        if count > 1:
            paragraphs_with_quotes.append(text)
            index = text.index("\'")
            while count > 0:
                if text[index - 1] == " " or index == len(text) - 1 or text[index + 1] in " .,":
                    indexes.append(index)
                if count > 1:
                    index = text.index("\'", index + 1)
                count -= 1
            for i in range(0, len(indexes), 2):
                start = indexes[i]
                end = indexes[min(len(indexes) - 1, i + 1)]
                print(text)
                quotes.append(text[indexes[i]:indexes[min(len(indexes) - 1, i + 1)] + 1])
                extracted_paragraphs.append(text)
                print("Quote:" + quotes[len(quotes) - 1])
                print()

d = {'Paragraph:': extracted_paragraphs, 'Quote:': quotes}
quote_data = pd.DataFrame(d)
quote_data.to_csv('quote_data.csv')

for i in range(1):
    print()
print(len(paragraphs_with_quotes))
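The index bookkeeping in the inner loop can be pulled out into a standalone function, which makes it easier to test on individual tweets. extract_quotes is a hypothetical name, and this is a sketch of roughly the same rule rather than a drop-in copy: an apostrophe is a quote boundary if it is preceded by a space, is the last character, or is followed by a space, "." or ","; boundaries are then paired in order. There are two deliberate differences. A None text returns an empty list instead of raising, and an apostrophe at position 0 counts as a boundary (in the code above, text[index - 1] wraps around to the last character).

```python
def extract_quotes(text):
    """Return the quoted spans in text, using the apostrophe rule above.

    Hypothetical helper: a sketch of the original loop, not a library API.
    """
    if text is None:  # empty <paragraph/> elements have .text == None
        return []
    positions = [i for i, ch in enumerate(text) if ch == "'"]
    if len(positions) < 2:  # the original only searches when there are 2+ apostrophes
        return []
    # Keep apostrophes that look like quote boundaries rather than
    # contractions such as "It's".
    boundaries = [
        i for i in positions
        if i == 0
        or text[i - 1] == " "
        or i == len(text) - 1
        or text[i + 1] in " .,"
    ]
    quotes = []
    for k in range(0, len(boundaries), 2):
        start = boundaries[k]
        # An unpaired final boundary is paired with itself, as in the original.
        end = boundaries[min(len(boundaries) - 1, k + 1)]
        quotes.append(text[start:end + 1])
    return quotes


print(extract_quotes("It's 'fine', 'ok'."))  # ["'fine'", "'ok'"]
```

Inside the loop, each paragraph then becomes a call such as extract_quotes(paragraph.text), with one output row per returned quote.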
Thank you!
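My guess would be that you have a paragraph element with no text in it (for example <paragraph></paragraph> or <paragraph/>). ElementTree sets .text to None for such an element, so text.count fails. You need to be able to handle the case where text is None. Your 50,000-tweet sample presumably just doesn't contain such a paragraph.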
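A minimal sketch of that check, assuming you simply want to skip tweets without text. The SAMPLE input is made up; the code also records where each skipped paragraph sits, which is one way to confirm that the full file contains empty tweets that the sample does not.

```python
import xml.etree.ElementTree as ET

# Made-up input: the second paragraph is empty and the third starts
# with a child element, so both have .text == None.
SAMPLE = """<articles>
  <article>
    <paragraph>He said 'hello there' and left.</paragraph>
    <paragraph></paragraph>
    <paragraph><b>Bold</b> start</paragraph>
  </article>
</articles>"""

articles = ET.fromstring(SAMPLE)

texts = []
skipped = []  # (article index, paragraph index) of paragraphs with no text
for a_idx, article in enumerate(articles):
    for p_idx, paragraph in enumerate(article.findall('paragraph')):
        text = paragraph.text
        if text is None:
            # Empty paragraph, or text only inside child elements:
            # note its position and move on instead of crashing.
            skipped.append((a_idx, p_idx))
            continue
        texts.append(text)  # safe to call text.count("'") from here on

print(texts)    # ["He said 'hello there' and left."]
print(skipped)  # [(0, 1), (0, 2)]
```

If some tweets keep their text inside child elements, "".join(paragraph.itertext()) collects all text in the element (and returns an empty string for an empty one), which sidesteps the None case instead of skipping it.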