Removing nodes from an HTML table with Nokogiri

Posted on 2024-11-30 00:01:34

I have been scratching my head over this for a while, so please help me out before I lose my mind.

I have an HTML document containing an events table that has 'In' and 'Out' among its columns. A record can be either an In or an Out event. I want to get only the rows with a value in the 'In' column and then save their text in an event model with the same attributes. The code below is what I have so far; it prints '0'.

#!/usr/bin/env ruby

require 'rubygems'
require 'nokogiri'


doc = Nokogiri::HTML <<-EOS
  <table><thead><th>Reference</th><th>Event Date</th><th>Event Details</th><th>In</th><th>Out</th></thead><tbody><tr><td>BCE16</td><td>2011-08-16 11:14:52</td><td>Received from Arap Moi</td><td>30.00</td><td></td></tr><tr><td>B07K2</td><td>2011-08-16 11:10:06</td><td>Sent out to John Doe.</td><td>&nbsp;</td><td>-50.00</td></tr></tbody><tfoot></tfoot></table>
EOS


minus_received = doc.xpath('//td[contains(text(), "Received from")]').each do |node| 
  node.parent.remove
end

p minus_received.to_s

Human-readable markup:

<table>
  <thead>
    <th>Reference</th>
    <th>Event Date</th>
    <th>Event Details</th>
    <th>In</th>
    <th>Out</th>
  </thead>

  <tbody>
  <tr>
    <td>BCE16</td>
    <td>2011-08-16 11:14:52</td>
    <td>Received from Arap Moi.</td>
    <td>30.00</td>
    <td></td>
  </tr>
  <tr>
    <td>B07K2</td>
    <td>2011-08-16 11:10:06</td>
    <td>Sent out to John Doe.</td>
    <td>&nbsp;</td>
    <td>-50.00</td>
  </tr>
  </tbody>
  <tfoot></tfoot>
</table>

I appreciate your help.

Comments (1)

上课铃就是安魂曲 2024-12-07 00:01:34

You're outputting the return value of .each (the '0' you see), not the modified document. If you look at doc after your each call finishes, the HTML contains only the header and the John Doe row.