Web scraping with BeautifulSoup or lxml.html
I have watched some webcasts and need help with the following. I have been using lxml.html, but Yahoo recently changed its page structure.
Target page:
http://finance.yahoo.com/quote/IBM/options?date=1469750400&straddle=true
In Chrome's inspector, I see the data in
//*[@id="main-0-Quote-Proxy"]/section/section/div[2]/section/section/table
followed by some more markup.
How do I get this data out into a list?
How do I change to another stock, for example from "LLY" to "MSFT"?
How do I switch between dates and get all months?
Comments (4)
I know you said you can't use lxml.html, but here is how to do it with that library, because it is a very good one. I provide the lxml.html version for completeness, since I no longer use BeautifulSoup: in my view it is unmaintained, slow, and has an ugly API. The code below parses the page and writes the results to a CSV file.
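A minimal sketch of that approach, not the original answer's code. It assumes the page has already been downloaded into a string; `SAMPLE_HTML`, its column names, and its made-up numbers are hypothetical stand-ins, since the real Yahoo markup differs and changes over time. The helper names `table_rows` and `write_csv` are also illustrative.

```python
import csv
import io

import lxml.html

# Hypothetical stand-in for the downloaded page (made-up numbers).
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Strike</th><th>Last</th><th>Bid</th><th>Ask</th></tr>
  <tr><td>150.00</td><td>11.20</td><td>11.05</td><td>11.30</td></tr>
  <tr><td>155.00</td><td>6.75</td><td>6.70</td><td>6.85</td></tr>
</table>
</body></html>
"""


def table_rows(html, table_xpath="//table"):
    """Return the rows of the first matching table as lists of cell text."""
    doc = lxml.html.fromstring(html)
    tables = doc.xpath(table_xpath)
    if not tables:
        return []
    rows = []
    for tr in tables[0].xpath(".//tr"):
        # Collect header and data cells alike, in document order.
        cells = [cell.text_content().strip() for cell in tr.xpath("./th|./td")]
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows, fileobj):
    """Write the rows to an open text file object in CSV format."""
    csv.writer(fileobj).writerows(rows)


rows = table_rows(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To produce a file like results.csv, pass `open("results.csv", "w", newline="")` instead of the in-memory buffer. The XPath from the question could be passed as `table_xpath`, assuming it still matches the live page.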
That's it! lxml.html is so simple and nice! Too bad you can't use it. Here are some lines from the generated results.csv file:
Here is a simple example to extract all data from the stock tables:
Then, to extract data for different stocks and dates, you need to change the URL. Here is MSFT for the previous day:
http://finance.yahoo.com/q/op?s=msft&m=2014-11-14
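Changing the URL can be scripted rather than edited by hand. The sketch below rebuilds the URL shown above from a ticker symbol and a date; the function name `options_url` is illustrative, and it assumes the `s` and `m` query parameters behave as that example URL suggests.

```python
from urllib.parse import urlencode

BASE_URL = "http://finance.yahoo.com/q/op"


def options_url(symbol, expiry):
    """Build the options URL for a ticker symbol and a YYYY-MM-DD date."""
    return BASE_URL + "?" + urlencode({"s": symbol, "m": expiry})


# Same date, different tickers.
for symbol in ("msft", "ibm"):
    print(options_url(symbol, "2014-11-14"))
```

Looping over a list of dates in the same way would cover several expirations for one ticker.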
If you'd like raw JSON, try MSN.
You can also specify an expiration date:
?date=11/14/2014
If you prefer Yahoo's JSON, you have to extract it from the HTML.
Find the expirations, convert each ISO date to a unix timestamp, then re-request the other expirations with that timestamp.
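The conversion step might look like the sketch below. It assumes expirations arrive as plain YYYY-MM-DD strings and that Yahoo expects midnight UTC, which matches the question's URL (date=1469750400 is 2016-07-29 00:00 UTC). The `expirations` list is hypothetical, and the URL pattern is taken from the question rather than from this answer.

```python
from datetime import datetime, timezone


def iso_to_unix(iso_date):
    """Convert a YYYY-MM-DD date to the unix timestamp of midnight UTC."""
    day = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def options_url(ticker, iso_date):
    """Build the options URL for one ticker and one expiration date."""
    return "http://finance.yahoo.com/quote/%s/options?date=%d&straddle=true" % (
        ticker,
        iso_to_unix(iso_date),
    )


# Hypothetical expirations; the real list has to come from the page data.
expirations = ["2016-07-29", "2016-08-05"]
for expiry in expirations:
    print(options_url("IBM", expiry))
```

Each resulting URL can then be requested in turn to collect every expiration.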
Building on @hoju's answer: