Python href and save to .txt (don't worry, not another regex question)
I am currently working on a Python script that lets a user enter a torrent's hash (via the terminal) and checks for more trackers via a website. However, I am at a loss and was hoping for some advice, since I'm new to Python programming. My trouble is that the page I get back in html_page contains another link I need to follow. My program assigns html_page the contents of "http://torrentz.eu/*******", but now I am trying to get it to follow another link on that page to arrive at http://torrentz.eu/announcelist_* ... That being said, I have found that the link can be retrieved from this element (as seen when viewing the page source):
<a href="/announcelist_********" rel="e">µTorrent compatible list here</a>
or possibly from here, since the value is the same as the one that appears in /announcelist_**:
<a name="post-comment"></a>
<input type="hidden" name="torrent" value="******" />
Since /announcelist_** is served as plain text, I was also wondering how I could save the resulting tracker list to a .txt file. That being said, this is my progress on the Python script so far:
from BeautifulSoup import BeautifulSoup
import urllib2
import re
var = raw_input("Enter hash:")
html_page = urllib2.urlopen("http://torrentz.eu/" +var)
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print link.get('href')
I'd also like to thank all of y'all in advance for your support, knowledge, advice, and skills.
Edit: I've altered the code to appear as follows:
from BeautifulSoup import BeautifulSoup
import urllib2
import re
hsh = raw_input("Enter Hash:")
html_data = urllib2.urlopen("http://torrentz.eu/" +hsh, 'r').read()
soup = BeautifulSoup(html_data)
announce = soup.find('a', attrs={'href': re.compile("^/announcelist")})
print announce
Which results in:
<a href="/announcelist_00000" rel="e">µTorrent compatible list here</a>
So, now I'm just looking for a way to get the /announcelist_00000 portion of output only.
Comments (2)
Once you have opened the URL, you can find the href, as you point out. Now open that href using urlopen. When you encounter the file that you want to copy over, open it like so:
Here's how you should probably go about doing this:
I haven't tested this code, but I think it should work.
Hope this helps
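The code from this answer is not shown above, so what follows is only a sketch of the steps it describes: resolve the relative href against the site, open it, and write the text to a file. It assumes Python 3, where `urllib.request.urlopen` takes the place of `urllib2.urlopen`. The helper names (`announce_url`, `fetch_text`, `save_trackers`), the UTF-8 decoding, and the example tracker URLs are all hypothetical and not taken from the thread.

```python
import os
import tempfile
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://torrentz.eu/"  # site named in the question


def announce_url(href, base=BASE_URL):
    """Turn a relative href such as /announcelist_00000 into an absolute URL."""
    return urljoin(base, href)


def fetch_text(url):
    """Download url and decode the body as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def save_trackers(text, path):
    """Write each non-empty line of text to path; return how many were written."""
    trackers = [line.strip() for line in text.splitlines() if line.strip()]
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(trackers) + "\n")
    return len(trackers)


# Offline demonstration with made-up tracker URLs (no network access needed).
demo_path = os.path.join(tempfile.mkdtemp(), "trackers.txt")
demo_text = "udp://a.example:80/announce\n\nhttp://b.example/announce\n"
count = save_trackers(demo_text, demo_path)
print(announce_url("/announcelist_00000"), count)
```

Against the live page, the same pieces would be chained as `save_trackers(fetch_text(announce_url(href)), "trackers.txt")`, where `href` is the value taken from the announce link.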
If what you are looking for is the value of the href attribute, then see what you get if you add the line:
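The line this answer refers to is not shown. With BeautifulSoup it was probably something like `print announce['href']` or `print announce.get('href')`, since the question's own loop already reads the attribute with `link.get('href')`. As a self-contained illustration of the same idea, here is a minimal sketch that pulls the href value out using only the Python 3 standard library. The class and function names are hypothetical, and this is a swapped-in technique (`html.parser`), not the BeautifulSoup call from the thread.

```python
from html.parser import HTMLParser


class AnnounceLinkFinder(HTMLParser):
    """Record the href of the first <a> whose href starts with /announcelist."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a" and self.href is None:
            href = dict(attrs).get("href")
            if href and href.startswith("/announcelist"):
                self.href = href


def find_announce_href(html):
    """Return the matching href value, or None if no such link exists."""
    finder = AnnounceLinkFinder()
    finder.feed(html)
    return finder.href


# The anchor tag printed in the question's edit.
sample = '<a href="/announcelist_00000" rel="e">µTorrent compatible list here</a>'
print(find_announce_href(sample))  # prints /announcelist_00000
```

Either way, the point is to read the attribute value from the tag rather than print the whole tag.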