Finding a specific link in a web page with BeautifulSoup

Posted on 2024-12-21 07:21:54

from BeautifulSoup import BeautifulSoup
import urllib2
import re


user = raw_input('begin here!: ')
base = ("http://1337x.org/search/")
print (base + user)
add_on = "/0/"
total_link = (base + user + add_on)
html_data = urllib2.urlopen(total_link, 'r').read()
soup = BeautifulSoup(html_data)
announce = soup.find('a', attrs={'href': re.compile("^/announcelist")})
print announce

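I am trying to retrieve a torrent link from a page, preferably the first non-sponsored link, and then print it. I am fairly new to coding, so as much detail as you can give would be perfect. Thank you so much for the help!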

灯下孤影 2024-12-28 07:21:55

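The problem is in your regular expression. You tried to use the ^ character to negate the regex, but that does not work in your case. ^ negates only a character set (a group of characters inside []), and even then only when it is the first character in the set. For example, [^aeiou] means "any character except a, e, i, o and u".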

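When you use ^ outside a character set, it anchors the match to the start of the string (or of each line in multiline mode). For example, ^aeiou matches text that begins with the string aeiou.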
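
The two meanings of ^ can be checked directly with Python's re module. This is only a minimal sketch, and the sample strings are made up for illustration:

```python
import re

# Inside a character set, a leading ^ negates the set:
# [^aeiou] matches any single character that is not a, e, i, o or u.
non_vowels = re.findall(r"[^aeiou]", "regex")

# Outside a character set, ^ anchors the match to the start of the string
# (or of each line when re.MULTILINE is set).
starts_with = re.search(r"^aeiou", "aeiou and more") is not None
in_middle = re.search(r"^aeiou", "more aeiou") is not None

print(non_vowels)              # the non-vowel characters of "regex"
print(starts_with, in_middle)  # anchored match at the start vs. in the middle
```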

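So how do you negate a regular expression? I think the best way is a negative lookahead, which is a group that starts with (?! and ends with ). Anchored to the start of the string, it is very simple for your case:

^(?!/announcelist)

So replace re.compile("^/announcelist") with re.compile("^(?!/announcelist)") and it should work. The leading ^ matters: BeautifulSoup tests a compiled pattern with search(), so an unanchored (?!/announcelist) can succeed at a later position in the string and ends up matching every href.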
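
To see how the lookahead behaves, here is a small sketch using only the re module. The href values are invented samples, and first_matching is a hypothetical helper that imitates the way BeautifulSoup tests a compiled pattern (with search() rather than match()). It suggests that the lookahead needs a leading ^ before it excludes anything:

```python
import re

# Invented sample hrefs in page order (not taken from the real site).
hrefs = ["/announcelist/123/", "/torrent/456/example/", "/user/bob/"]


def first_matching(pattern, candidates):
    """Return the first candidate in which the pattern finds a match, else None.

    Hypothetical helper: it uses search(), mirroring how BeautifulSoup
    applies a compiled pattern passed in attrs.
    """
    for href in candidates:
        if pattern.search(href):
            return href
    return None


unanchored = re.compile(r"(?!/announcelist)")
anchored = re.compile(r"^(?!/announcelist)")

# Unanchored: search() may start at any position, so the lookahead
# succeeds at index 1 of "/announcelist/123/" and nothing is excluded.
print(first_matching(unanchored, hrefs))

# Anchored: the lookahead is only tested at index 0, so hrefs that
# begin with /announcelist are skipped.
print(first_matching(anchored, hrefs))
```

Under these assumptions, the equivalent call in the question's code would be soup.find('a', attrs={'href': re.compile("^(?!/announcelist)")}).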