Appending a string to the end of a URL

Posted on 2024-08-31 02:26:20

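To practise some more Python, I've been having a go at the challenges on pythonchallenge.com.

In brief, the first step of this challenge is to load an HTML page from a URL with a number at the end. The page contains a single line of text with a number in it. That number replaces the existing one in the URL, which takes you to the next page in the sequence. Apparently this continues for some time... (there is more to the challenge, but getting this part working is the first step).

My code for doing so is below (limited, for the time being, to what should be the first four pages in the sequence). For some reason it works the first time: it gets the second page in the sequence, reads the number, goes to the third, and reads the number there. But then it gets stuck on the third. I don't understand why, though I think it might have something to do with my attempt to turn the number into a string before putting it on the end of the URL. To answer the obvious question: yes, I know that pythonchallenge is working OK. You can do the URL-numbers thing manually for as long as you have the patience, to confirm, if you like :p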

import httplib2
import re

counter = 0
new = '12345' #the number for the initial page in the sequence, as a string

while True:
    counter = counter + 1
    if counter == 5:
        break

    original = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
    nextpage = original+new     #each page in the sequence is visited by adding 
                                #the number after 'nothing='
    print(nextpage)

    h = httplib2.Http('.cache')
    response, content = h.request(nextpage, "GET")  #get the content of the page, 
                                                    #which includes the number for the 
                                                    #*next* page in the sequence

    p = re.compile(r'\d{4,5}$')     #regex to find a 4 to 5 digit number at the end of
                                    #the content

    new = str((p.findall(content)))     #make the regex result a string - is this
                                            #where the problem lies?

    print('cached?', response.fromcache)    #I was worried my requests were somehow
                                            #being cached not actually sent afresh to
                                            #pythonchallenge. But it seems they aren't.

    print(content)
    print(new)

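The output of the above is shown below. It seems to work fine on the first pass (adding 92512 to the URL, successfully getting the next page and finding the next value), but after that it gets stuck and doesn't seem to load the following page in the sequence. Changing the URL manually in a browser confirms that the number is correct and that pythonchallenge is working OK.

It looks to me as though something is going wrong when I turn my regex search result into a string to add onto the end of the URL, but I don't know why it should work the first time and not the second. I was also concerned that my requests might only be getting as far as a cache (I'm new to httplib2 and not confident about how it does caching), but that seems not to be the case. I also added a no-cache argument to the request just to be sure (not shown in this code), but it didn't help.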

http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345
('cached?', False)
and the next nothing is 92512
['92512']
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=['92512']
('cached?', False)
and the next nothing is 72758
['72758']
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=['72758']
('cached?', False)
and the next nothing is 72758
['72758']
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=['72758']
('cached?', False)
and the next nothing is 72758
['72758']

I would be grateful to anyone who can point out where I am going wrong, and for any relevant tips.

Thanks in advance...



明媚如初 2024-09-07 02:26:20

http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=['72758']
                                                             ^^     ^^

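I think the problem is here. findall() returns a list:

re.findall(pattern, string[, flags])
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
-- Python doc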
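
To make the point concrete, here is a minimal sketch of one possible fix, not necessarily what the answerer had in mind. It assumes the page content is available as text (on Python 3, httplib2 hands back bytes, which would need decoding first). The helper names extract_next and follow_chain, and the fetch callable, are hypothetical and only there for illustration.

```python
import re

# Same pattern as in the question: a 4- to 5-digit number at the very end of the text.
NEXT_NUMBER = re.compile(r'\d{4,5}$')


def extract_next(content):
    """Return the trailing number as a plain string, or None if there is none."""
    matches = NEXT_NUMBER.findall(content)  # findall() returns a list of strings
    return matches[0] if matches else None  # take the element, not str(list)


def follow_chain(fetch, start, max_steps=4):
    """Visit up to max_steps pages and return the numbers visited, in order.

    fetch is a hypothetical callable: given a number string, it returns the
    page text for that number.
    """
    visited = []
    current = start
    for _ in range(max_steps):
        visited.append(current)
        current = extract_next(fetch(current))
        if current is None:  # no trailing number found, so stop
            break
    return visited


page = 'and the next nothing is 92512'
print(str(NEXT_NUMBER.findall(page)))  # prints ['92512'] (brackets and quotes included)
print(extract_next(page))              # prints 92512
```

With httplib2, fetch could be a small wrapper around h.request(original + number, "GET") that returns the content part of the result. The important change is indexing into the list (or using p.search(content).group()) rather than calling str() on the list, so the brackets and quotes never reach the URL.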

