List problem when extracting data from a Twitter XML page
With my function I can extract usernames from a Twitter XML search page for a friend-finder app I am building as a project. The problem, though, is that when I grab the usernames and put them into a list, something strange happens: instead of each username being a separate element within one list, each username ends up in its own list.
So I get 20 or so lists instead. Here is an example of what my code produces:
list = ["twitter.com/username"], ["twitter.com/username1"],["twitter.com/username2"]
So you see, every single username is its own list. Instead of having one list with three values, I have three lists with one value each. This is an absolute nightmare to iterate through. How can I make it so I have one list with three elements?
Code is here:
def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    data = soup.findAll("uri")
    for uri in soup.findAll('uri'):
        data = []
        uri = str(uri.extract())
        data.append(uri[5:-6]
        print data
Comments (2)
You're making a new list, called data, for each URI. If you move the data = [] line out of the for uri in soup.findAll('uri'): loop, you should end up with one list instead of a list of lists.
In addition, you've got some other problems.
There is a syntax error on your next-to-last line: you're missing a close parenthesis at the end of the line.
You've got duplicate lines. Try removing the first data = [] line, as well as the data = soup.findAll("uri") line, as you're just doing findAll again for the for loop.
In addition, you shouldn't put raw_input in the function signature, because that means it gets called when you define the function, not when you call the function.
Try this:
Edit: You also don't need the ''.join(doc), since read() returns a single string, not a sequence; data can be assembled with a list comprehension.
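A minimal sketch of what these changes add up to, written for Python 3 rather than the Python 2 of the question (so input replaces raw_input). It is not the code originally posted with this answer. To keep it runnable with only the standard library, it parses the feed with xml.etree.ElementTree instead of BeautifulStoneSoup and assumes the <uri> elements live in the standard Atom namespace; extract_names is a hypothetical helper name, and the search URL is copied from the question and may no longer respond.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus
from urllib.request import urlopen

# Assumption: the feed declares the standard Atom namespace.
ATOM_URI_TAG = "{http://www.w3.org/2005/Atom}uri"


def extract_names(feed_xml):
    """Return one flat list holding the text of every <uri> element."""
    root = ET.fromstring(feed_xml)
    # The list is built once by a list comprehension, so nothing
    # resets it to [] on each pass through the elements.
    return [uri.text.strip() for uri in root.iter(ATOM_URI_TAG) if uri.text]


def get_names(search_term=None):
    """Fetch the search feed for search_term and return the <uri> values."""
    # Prompt when the function is called, not when it is defined.
    if search_term is None:
        search_term = input("What term do you want to search for? ")
    # URL taken from the question; quote_plus escapes spaces and symbols.
    search_page = "http://search.twitter.com/search.atom?q=" + quote_plus(search_term)
    with urlopen(search_page) as response:
        # read() already returns the whole body, so no ''.join is needed.
        return extract_names(response.read())
```

Calling extract_names on a saved copy of the feed returns one flat list, so the result can be iterated over directly.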
The problem is that your assignments to data are all over the place.
I'd suggest changing that code to:
(untested, since I don't know what BeautifulStoneSoup refers to)
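As a minimal sketch of that restructuring (not the answerer's original code): data is created once before the loop and only appended to inside it. To stay runnable without BeautifulSoup, the hypothetical helper collect_names takes the already-found <uri> tags as plain strings, standing in for the result of soup.findAll('uri'); the [5:-6] slice is the question's own way of stripping <uri> and </uri>.

```python
def collect_names(uri_tags):
    """Collect the text of each <uri> tag into a single flat list.

    collect_names is a hypothetical helper; uri_tags stands in for the
    result of soup.findAll('uri') in the question.
    """
    data = []  # created once, before the loop, so it is never reset
    for tag in uri_tags:
        text = str(tag)
        data.append(text[5:-6])  # drop the leading "<uri>" and trailing "</uri>"
    return data


tags = [
    "<uri>twitter.com/username</uri>",
    "<uri>twitter.com/username1</uri>",
    "<uri>twitter.com/username2</uri>",
]
print(collect_names(tags))
# prints ['twitter.com/username', 'twitter.com/username1', 'twitter.com/username2']
```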
HTH
Pacific