Is there an elegant way to keep my program from skipping a decade?
I am writing a web scraper that grabs content from decade articles on Wikipedia (e.g. the articles on the 10s, the 1970s, the 1670s BC, and so on).
I am using logic that resembles this to grab the pages.
    for (i = -1690; i <= 2010; i += 10)
        if (i < 0)
            page = (-i) + "s_BC"
        else
            page = i + "s"
        GrabContentFromURL("http://en.wikipedia.org/wiki/" + page)
This is working, except for one little detail that I hadn't considered.
The problem is that there are two 0s decades. There is a 0s AD and a 0s BC. With the way my loop currently works, the program only grabs the content from the 0s AD page.
This is a pretty simple problem, but I'm having a hard time coming up with a very nice way to fix it. I know I can extract the body of the loop to a separate function and use two separate loops, but I feel like there's a more elegant way to do this that I'm missing.
How can I fix this problem without introducing too much complexity?
Comments (4)
Do you mind hitting a few 404 pages along the way?

If the answer to that question was "yes, I mind", then you can still toss in some ifs:
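The snippet that went with this answer did not survive, so the following is only a minimal sketch of one way to read "toss in some ifs": keep the single loop from the question and add one extra check so that the zero decade produces both titles. The helper name `decade_pages` and its parameters are hypothetical, and the function returns page titles instead of calling `GrabContentFromURL`.

```python
def decade_pages(first=-1690, last=2010, step=10):
    """Return decade page titles, emitting both 0s_BC and 0s for i == 0."""
    pages = []
    for i in range(first, last + 1, step):
        if i < 0:
            pages.append(f"{-i}s_BC")
        else:
            if i == 0:
                # The loop reaches 0 only once, so add the BC page here.
                pages.append("0s_BC")
            pages.append(f"{i}s")
    return pages
```

Each title in the returned list can then be appended to "http://en.wikipedia.org/wiki/" and fetched as in the question.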
If you only want one function call, how about something like:
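The original snippet is missing here. One possible shape, offered as an assumption and not as this answer's actual code, is to start the counter one decade earlier and shift the negative values, so that -10 maps to the 0s BC and every iteration needs exactly one fetch. The names `decade_page`, `decade_urls` and `BASE_URL` are made up for illustration.

```python
BASE_URL = "http://en.wikipedia.org/wiki/"

def decade_page(i):
    """Map a loop counter to a page title; negative counters are shifted by 10."""
    if i < 0:
        return f"{-i - 10}s_BC"  # -10 gives "0s_BC", -1700 gives "1690s_BC"
    return f"{i}s"

def decade_urls(first=-1700, last=2010, step=10):
    """Build one URL per counter value, covering 1690s BC through 2010s."""
    return [BASE_URL + decade_page(i) for i in range(first, last + 1, step)]
```

The trade-off is that the loop bound (-1700) no longer matches the earliest decade (1690s BC), which may be worth a comment in real code.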
There is a logical problem: when i = 0, the "BC" branch is never run. I'd change it like so:

Another approach is to use two loops, one from [-1690, 0] by 10 (or [1690, 0] by -10) and then one from [0, 2010] by 10. (For languages with nice sequence support, this is a doozy in one loop.)

Happy coding.
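As a rough sketch of the two-range idea in a language with sequence support, the two ranges can be chained so that a single loop visits 0 twice, once per era. This assumes Python and `itertools.chain`; the function name `decade_pages_two_ranges` and its parameters are hypothetical.

```python
from itertools import chain

def decade_pages_two_ranges(oldest_bc=1690, newest_ad=2010, step=10):
    """Chain a descending BC range and an ascending AD range; both include 0."""
    bc = (f"{year}s_BC" for year in range(oldest_bc, -1, -step))
    ad = (f"{year}s" for year in range(0, newest_ad + 1, step))
    return list(chain(bc, ad))
```

Because each range carries its own era suffix, no sign check is needed inside the loop.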
In Python (this could also be translated to CoffeeScript):