Exporting an entire HTML table
Basically all I would like to do is export a whole html table to a .txt file (notepad document).
So far I have learnt how to instruct the browser to find the html page with the table.
require 'rubygems'
require 'hpricot'
require "watir-webdriver"
url = "http://www.example.com"
browser = Watir::Browser.new
browser.goto url
After running the above in cmd I can now see the html table in the browser.
This is where I am stuck. How do I use Watir to
- Find the <table> tag
- Collect everything (i.e. the HTML tags and the text) between <table> and </table>
- Extract those results to a .txt file (notepad document) and save it in a specific folder.
FYI the html table looks like this...
<table border="1" cellpadding="2">
<tr>
<th> Address </th>
<th> Council tax band </th>
<th> Annual council tax </th>
</tr>
<tr>
<td> 2, STONELEIGH AVENUE, COVENTRY, CV5 6BZ </td>
<td align="center"> F </td>
<td align="center"> £2125 </td>
</tr>
....... The above row is repeated many times ......
</table>
Then the table is closed.
So, to recap my situation: I can use Watir to navigate the browser to the page containing the HTML table, but I am unsure how to extract the results (everything within the <table> tag, including the HTML) to a .txt file and then save that .txt file onto my computer.
I would prefer to take smaller steps with Watir. I am new to it, so I would just like to learn how to extract the table and save everything I have extracted into a .txt file. I have seen a couple of examples online using hpricot. However, most of them leave out the code showing how the array (if that is the correct approach) is written to a .txt file.
Could you help by demonstrating a simple piece of code that extracts the HTML table (the <table> element and everything inside it) to a .txt notepad file?
Many thanks for your time.
Comments (2)
To get HTML of the entire table (if it is the only table on the page):
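A minimal sketch of that call, assuming `browser` is the `Watir::Browser` from the question and has already loaded the page. The helper name `table_html` is hypothetical; it relies only on Watir's `browser.table` (the first table on the page) and the element's `html` method.

```ruby
# Hypothetical helper: returns the markup of the first <table> on the page.
# Assumption: `browser` is a Watir::Browser that has already navigated to the page.
def table_html(browser)
  browser.table.html
end
```

With the browser from the question, `puts table_html(browser)` would print the markup to the console.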
The result is the table's markup as a single string, from the opening <table> tag to the closing </table>.
To get HTML of each row and put it in an array:
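One way this might look, again assuming `browser` is the `Watir::Browser` from the question. `rows_html` is a hypothetical name; `rows`, `collect` and `html` are the calls doing the work.

```ruby
# Hypothetical helper: returns an array with one HTML string per table row.
# Assumption: `browser` is a Watir::Browser showing the page with the table.
def rows_html(browser)
  browser.table.rows.collect { |row| row.html }
end
```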
To get the text of each cell and put it in an array of arrays (one inner array per row):
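A sketch of the nested collection, assuming `browser` is the `Watir::Browser` from the question. The helper name `cells_text` is hypothetical. Each inner array holds the text of one row's cells.

```ruby
# Hypothetical helper: returns the table as nested arrays of cell text,
# e.g. [["Address", "Council tax band", ...], ["2, STONELEIGH AVENUE, ...", "F", ...]].
# Assumption: `browser` is a Watir::Browser showing the page with the table.
def cells_text(browser)
  browser.table.rows.collect do |row|
    row.cells.collect { |cell| cell.text }
  end
end
```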
To write text of each cell to file:
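A minimal sketch of the writing step using only Ruby's standard `File` API. It assumes `rows` is an array of arrays of cell text, such as `browser.table.rows.collect { |row| row.cells.collect { |cell| cell.text } }` would produce. The helper name `write_rows` and the tab separator are assumptions. A tab is used because the addresses in the question's table already contain commas.

```ruby
# Hypothetical helper: writes one line per row, cells separated by tabs.
# `rows` is assumed to be an array of arrays of strings; `path` is the target file.
def write_rows(rows, path)
  File.open(path, "w") do |file|
    rows.each { |cells| file.puts cells.join("\t") }
  end
end
```

For example, `write_rows(rows, "table.txt")` would create `table.txt` in the current working directory.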
The resulting file contains the text of the cells rather than the HTML.
There are a lot of ways to approach this. If we knew a bit more about what you are specifically trying to accomplish, we could give you more specific answers instead of general ones.
You can use .collect, as Zeljko has shown, if you want to convert stuff to arrays. If you just want to work with the data or iterate over the rows and cells in the table, then .each or .each_with_index may be what you want. I suspect you really want the text from the table, not the HTML. So here's something to try (untested, but it should work):
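A sketch of that kind of iteration, assuming `table` is the element returned by `browser.table` in Watir. The helper name `print_table` and the output format are hypothetical. The `out` parameter defaults to the screen.

```ruby
# Hypothetical helper: prints the text of every cell, labelled with its row index.
# Assumption: `table` responds to `rows`, and each row to `cells`, as a Watir table does.
def print_table(table, out = $stdout)
  table.rows.each_with_index do |row, index|
    row.cells.each do |cell|
      out.puts "row #{index}: #{cell.text}"
    end
  end
end
```

With the browser from the question this would be called as `print_table(browser.table)`.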
If .rows or .cells does not work (unknown method) in the above, try replacing them with .trs and .tds respectively (not all versions of Watir have the friendly aliases for those methods).
See if that spits out what you are interested in. If so, you should be able to easily modify to write what you want to a file instead of putting it to the screen.
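One possible modification for writing into a specific folder, sketched with Ruby's standard `FileUtils` and `File` APIs. The helper name `save_table_text`, the " | " separator, and the example folder are assumptions. `table` is assumed to be a Watir table element such as `browser.table`.

```ruby
require "fileutils"

# Hypothetical helper: writes one line per table row into folder/filename
# and returns the full path of the file it wrote.
def save_table_text(table, folder, filename)
  FileUtils.mkdir_p(folder)                 # create the target folder if it is missing
  path = File.join(folder, filename)
  File.open(path, "w") do |file|
    table.rows.each do |row|
      file.puts row.cells.map { |cell| cell.text.strip }.join(" | ")
    end
  end
  path
end
```

A call such as `save_table_text(browser.table, "C:/exports", "table.txt")` would then place the file in that folder. The folder name here is only an illustration.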
However, if verification is your goal, then it might be easier to have the automation code look things up in the database and do the comparison for you.