Python: follow an href and save to .txt (don't worry, it's not another regex question)

Posted 2024-11-07 16:06:10


I am currently working on a Python script that lets a user enter a torrent's hash (via the terminal) and checks a website for more trackers. However, I'm at a loss and was hoping for some advice, since I'm new to Python programming. I'm running into trouble because the page my script fetches contains another link it needs to follow. My program fetches http://torrentz.eu/*******, but now I need it to follow a second link on that page to arrive at http://torrentz.eu/announcelist_* ... That said, I have found that the link can be retrieved (as it appears when viewing the page source):

    <a href="/announcelist_********" rel="e">µTorrent compatible list here</a> 

or possibly retrieved from here, since the values are the same as those that appear in /announcelist_**:

    <a name="post-comment"></a>
    <input type="hidden" name="torrent" value="******" /> 

Since /announcelist_** is served as plain text, I was also wondering how I might save the resulting tracker list to a .txt file. That said, here is my progress so far on the Python script:

    from BeautifulSoup import BeautifulSoup
    import urllib2
    import re
    var = raw_input("Enter hash:")
    html_page = urllib2.urlopen("http://torrentz.eu/" +var)
    soup = BeautifulSoup(html_page)
    for link in soup.findAll('a'):
            print link.get('href')

I'd also like to thank all of y'all in advance for your support, knowledge, advice, and skills.

Edit: I've altered the code to appear as follows:

    from BeautifulSoup import BeautifulSoup
    import urllib2
    import re
    hsh = raw_input("Enter Hash:")
    html_data = urllib2.urlopen("http://torrentz.eu/" + hsh).read()
    soup = BeautifulSoup(html_data)
    announce = soup.find('a', attrs={'href': re.compile("^/announcelist")})
    print announce

Which results in:

    <a href="/announcelist_00000" rel="e">µTorrent compatible list here</a>

So now I'm just looking for a way to get only the /announcelist_00000 portion of the output.
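As a minimal sketch of that extraction step, the href value can also be captured with only the standard library's re module (an alternative to the BeautifulSoup route used above, not a replacement for it); the sample tag below is copied from the result shown above:

```python
import re

# Sample markup copied from the result shown above
html_data = '<a href="/announcelist_00000" rel="e">&#181;Torrent compatible list here</a>'

# Capture just the href value of the announcelist link
match = re.search(r'href="(/announcelist[^"]*)"', html_data)
if match:
    announce_path = match.group(1)
    print(announce_path)  # /announcelist_00000
```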

2 Comments

茶色山野 2024-11-14 16:06:10


Once you have opened the URL, you can find the href, as you point out. Now open that href using urlopen. When you reach the file you want to copy over, open it like so:

# use urlopen for the remote resource; plain open() only works on local paths
remote_file = urllib2.urlopen(remote_url)
local_file = open(path_to_local_file, 'w')

local_file.write(remote_file.read())
local_file.close()
remote_file.close()

Here's how you should probably go about doing this:

# insert code that you've already written
for link in soup.findAll('a'):
    print link.get('href')
    # the href is relative, so prepend the site's base URL before opening it
    remote_file = urllib2.urlopen('http://torrentz.eu' + link.get('href'))
    local_file = open(path_to_local_file, 'w')
    local_file.write(remote_file.read())
    local_file.close()
    remote_file.close()

I haven't tested this code, but I think it should work.

Hope this helps
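The copy pattern in this answer can be exercised without the network: below, io.StringIO stands in for the urlopen response (both expose .read()), and tracker_list.txt and the tracker URL are hypothetical names chosen for the sketch:

```python
import io
import os
import tempfile

# Stand-in for the urlopen response object; both expose .read()
remote_file = io.StringIO("udp://tracker.example:80/announce\n")

# Hypothetical local path for the saved tracker list
path_to_local_file = os.path.join(tempfile.gettempdir(), "tracker_list.txt")

local_file = open(path_to_local_file, 'w')
local_file.write(remote_file.read())
local_file.close()
remote_file.close()

# The tracker list is now on disk
print(open(path_to_local_file).read())
```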

紫南 2024-11-14 16:06:10


If what you are looking for is the value of the href attribute, then see what you get if you add the line:

print announce['href']
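Note that announce['href'] yields a relative path, which still has to be joined to the site's base URL before it can be passed to urlopen. urljoin handles this (it lives in urlparse on Python 2 and urllib.parse on Python 3); a sketch using the result from the edit above, with "somehash" as a placeholder for the entered hash:

```python
# urljoin lives in urlparse on Python 2 and urllib.parse on Python 3
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

announce_href = "/announcelist_00000"  # what announce['href'] returned above
full_url = urljoin("http://torrentz.eu/" + "somehash", announce_href)
print(full_url)  # http://torrentz.eu/announcelist_00000
```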