Using BeautifulSoup to find a specific link in a web page

Posted 2024-12-21 07:21:54 · 495 characters · 1 view · 0 comments

from BeautifulSoup import BeautifulSoup
import urllib2
import re


user = raw_input('begin here!: ')
base = ("http://1337x.org/search/")
print (base + user)
add_on = "/0/"
total_link = (base + user + add_on)
html_data = urllib2.urlopen(total_link).read()
soup = BeautifulSoup(html_data)
announce = soup.find('a', attrs={'href': re.compile("^/announcelist")})
print announce

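I am trying to retrieve a torrent link from the page, preferably the first non-sponsored one, and then print it. I am fairly new to coding, so as much detail as you can give would be perfect. Thank you so much for the help!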

Comments (1)

灯下孤影 2024-12-28 07:21:55

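The problem is in your regular expression. You are trying to use the ^ character to negate the regex, but it does not work that way here. The ^ only negates a character set (the characters inside []), and even then only when it is the first character in the set. For example, [^aeiou] means "any character except a, e, i, o and u".

When you use ^ outside a character set, it matches the beginning of a line. For example, ^aeiou matches a line that starts with the string aeiou.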
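The two meanings of ^ can be checked with Python's standard re module. This is a minimal sketch in Python 3 syntax (the question's code is Python 2), and the sample strings are made up for illustration.

```python
import re

# Inside a character set, a leading ^ negates the set:
# [^aeiou] matches any single character that is not a, e, i, o or u.
non_vowels = re.findall(r"[^aeiou]", "announce")

# Outside a character set, ^ anchors the pattern to the start of the string.
starts_with = re.search(r"^/announcelist", "/announcelist/123") is not None
elsewhere = re.search(r"^/announcelist", "/torrent/announcelist") is not None

print(non_vowels)   # ['n', 'n', 'n', 'c']
print(starts_with)  # True
print(elsewhere)    # False
```

So the original pattern ^/announcelist selects hrefs that begin with /announcelist rather than excluding them.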

So, how would you negate a regex? The best way I see is a negative lookahead, which is a group that starts with (?! and ends with ). For your case, it is pretty easy:

(?!/announcelist)

So, replace re.compile("^/announcelist") with re.compile("(?!/announcelist)") and it should work. At least it worked here :)
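One caveat worth checking: as far as I know, Beautiful Soup applies a compiled regex to an attribute value with search(), which can match at any offset, so an unanchored lookahead may succeed partway through a string. The sketch below uses only the re module, with hypothetical href values and a hypothetical helper named first_non_announce, to compare the unanchored pattern with an anchored variant, ^(?!/announcelist). The anchored form is a suggestion under that assumption, not part of the answer above. Python 3 syntax.

```python
import re

# Hypothetical href values, in page order (not taken from the real site).
hrefs = ["/announcelist/42", "/torrent/1001/example/", "/user/someone/"]

unanchored = re.compile(r"(?!/announcelist)")
anchored = re.compile(r"^(?!/announcelist)")

def first_non_announce(pattern, candidates):
    """Return the first candidate in which pattern.search() finds a match."""
    for href in candidates:
        if pattern.search(href):
            return href
    return None

# search() may start at any offset, so the unanchored lookahead succeeds
# at offset 1 of "/announcelist/42" and does not filter that href out.
print(first_non_announce(unanchored, hrefs))  # /announcelist/42

# Anchoring with ^ forces the lookahead to be tested at offset 0 only.
print(first_non_announce(anchored, hrefs))    # /torrent/1001/example/
```

If the unanchored pattern appears to work on a real page, that may simply be because the first link with a non-empty href happens to be the one you wanted.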
