List problem when extracting data from a Twitter XML page

Posted 2024-11-25 02:58:50

With my function I can extract usernames from a Twitter XML search page for a friend-finder app I am building as a project. The problem, though, is that when I grab the usernames and put them into a list, something strange happens. Instead of each username being a separate element within one list, each username ends up as its own list.

So I instead get 20 or so lists. Here is an example of what my code produces:

list = ["twitter.com/username"], ["twitter.com/username1"], ["twitter.com/username2"]

So you see, every single username is its own list. Instead of one list with three values, I have three lists with one value each. This is an absolute nightmare to iterate through. How can I make it so I have one list with three elements?

Code is here:

def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    data = soup.findAll("uri")
    for uri in soup.findAll('uri'):
        data = []
        uri = str(uri.extract())
        data.append(uri[5:-6] 
        print data


2 Comments

唠甜嗑 2024-12-02 02:58:50

You're making a new list, called data, for each URI. If you move the data = [] line out of the for uri in soup.findAll('uri'): loop, you should end up with one list instead of a list of lists.
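The effect of moving that line can be seen in miniature with a made-up list of usernames standing in for the scraped values:

```python
usernames = ["twitter.com/username", "twitter.com/username1", "twitter.com/username2"]

# Re-creating the accumulator inside the loop: every iteration starts over,
# so each print shows a fresh one-element list.
for name in usernames:
    data = []          # reset on every pass -- this is the bug
    data.append(name)
    print(data)

# Initializing the accumulator once, outside the loop: one flat list.
data = []
for name in usernames:
    data.append(name)
print(data)  # ['twitter.com/username', 'twitter.com/username1', 'twitter.com/username2']
```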

In addition, you've got some other problems.
There is a syntax error on your next to last line: you're missing a close-parenthesis at the end of the line.
You've got duplicate lines. Try removing the first data = [] line, as well as the data = soup.findAll("uri") line, as you're just doing the findAll again for the for loop.
In addition, you shouldn't put raw_input in the function signature, because that means it gets called when you define the function, not when you call the function.
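That pitfall can be demonstrated without raw_input; here a stand-in function (fake_prompt, made up for illustration) records each time it is actually invoked:

```python
calls = []

def fake_prompt():
    # Stand-in for raw_input: records every time it actually runs.
    calls.append("prompted")
    return "python"

# The default value is evaluated ONCE, at definition time...
def get_names(search_term=fake_prompt()):
    return search_term

print(calls)  # already ["prompted"] before get_names is ever called

# ...so subsequent calls reuse the stored default and never prompt again.
get_names()
get_names()
print(calls)  # still ["prompted"]
```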

Try this:

def get_names():
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += raw_input("What term do you want to search for?")
    response = urllib.urlopen(search_page)
    doc = response.read()
    response.close()
    soup = BeautifulStoneSoup(doc)
    data = [str(uri.extract())[5:-6] for uri in soup.findAll('uri')]
    return data

names = get_names()
print(names)

Edit: You also don't need ''.join(doc); read() returns a single string, not a sequence, and data can be assembled with a list comprehension.
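Since BeautifulStoneSoup (BeautifulSoup 3) and the search.twitter.com Atom endpoint are both long gone, the same extraction can be sketched today with the stdlib's xml.etree.ElementTree; the feed fragment below is made up to mimic the shape of the old search results:

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the shape of the old Atom search feed.
feed = """<feed>
  <entry><uri>twitter.com/username</uri></entry>
  <entry><uri>twitter.com/username1</uri></entry>
  <entry><uri>twitter.com/username2</uri></entry>
</feed>"""

root = ET.fromstring(feed)
# One flat list via a list comprehension; .text already excludes the
# surrounding <uri> tags, so no slicing is needed.
data = [uri.text for uri in root.iter("uri")]
print(data)  # ['twitter.com/username', 'twitter.com/username1', 'twitter.com/username2']
```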

紫﹏色ふ单纯 2024-12-02 02:58:50


The problem is you're sort of all over the place in your assignments to data;
I'd suggest changing that code to:

def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    for uri in soup.findAll('uri'):
        uri = str(uri.extract())
        data.append(uri[5:-6])
    print data
    return data

(untested, since I don't know what BeautifulStoneSoup is referring to)

HTH

Pacific
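Both answers keep the asker's uri[5:-6] slice; it works because str() of the extracted element includes the markup, and the slice trims the 5-character "<uri>" prefix and the 6-character "</uri>" suffix:

```python
# str() of the extracted element includes the tags themselves.
raw = "<uri>twitter.com/username</uri>"

# len("<uri>") == 5 and len("</uri>") == 6, so [5:-6] strips both tags.
print(raw[5:-6])  # twitter.com/username
```

Note that this only holds while the element is literally named uri; a renamed tag would need different slice offsets, which is one reason the .text accessor of a real XML parser is less brittle.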
