How do I scrape data from specific tags with Enlive?
Could someone explain to me how to scrape the content of the <td> tags in the row whose <th> has the content value "Row1 title" (in this case I actually need the content of the <b> tag for the matching operation), but without scraping the <th> tag (or any of its content) in the process? Here is my test HTML:
```html
<table class="table_class">
  <tbody>
    <tr>
      <th>
        <b>
          Row1 title
        </b>
      </th>
      <td>2.660.784</td>
      <td>2.944.552</td>
      <td>Correct, has 3 td elements</td>
    </tr>
    <tr>
      <th>
        Row2 title
      </th>
      <td>2.660.784</td>
      <td>2.944.552</td>
      <td>Correct, has 3 td elements</td>
    </tr>
  </tbody>
</table>
```
The data I want to extract should come from these tags:
```html
<td>2.660.784</td>
<td>2.944.552</td>
<td>Correct, has 3 td elements</td>
```
I have managed to create a function which returns the entire content of the table, but I would like to exclude the <th> node from the result and return only the data from the <td> nodes, whose content I can use for further parsing. Can anyone help me with this?
Answer:
With Enlive, something like this
should give you a sequence of all the td nodes, each of the form {:tag :td :attrs {...} :content (...)}. I am not aware of a way to get the content of those nodes directly with Enlive, but I could be wrong. You could then extract the content of the sequence with something along the lines of
```clojure
(for [line ws-content]
  (apply str (:content line)))
```
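A minimal sketch of that expression, assuming the td nodes have the {:tag :attrs :content} shape described above. Here ws-content is a hand-written stand-in for what an Enlive select over the "Row1 title" row would return, not output captured from a real page.

```clojure
(ns example.td-content)

;; Hand-written stand-in for the seq of <td> nodes an Enlive select would
;; return for the "Row1 title" row; each node is a {:tag :attrs :content} map.
(def ws-content
  [{:tag :td :attrs nil :content ["2.660.784"]}
   {:tag :td :attrs nil :content ["2.944.552"]}
   {:tag :td :attrs nil :content ["Correct, has 3 td elements"]}])

;; The expression from the answer: concatenate each node's :content into a
;; single string. This is only right when :content holds plain strings; a
;; nested element (e.g. a <b> inside the cell) would be printed as a map.
(def td-strings
  (for [line ws-content]
    (apply str (:content line))))
```

If a cell can contain nested markup, Enlive's html/text function (in net.cgrand.enlive-html) returns the full text content of a node and can be mapped over the sequence instead.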
Regarding the question you posted yesterday (I am assuming you are still working with that page): the solution I gave there was a little complex, but it is also flexible. For example, if you change the tag-type function like this (so that it returns ::IgnoreNode for every node except :td)
then it just gives you a sequence of the content of the :td nodes, which is probably close to what you want. Let me know if you need more help.
EDIT (in reply to the comments below):
I don't think selecting nodes based on their :content is possible with Enlive alone, but you can certainly do it with Clojure. For example, something like
could work (you might have to tweak the (:content line) form a little).
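As a minimal sketch of that idea (plain Clojure filtering on node content, not the code from the edit), the following assumes the page has already been parsed into Enlive's {:tag :attrs :content} node maps, for example with net.cgrand.enlive-html/html-snippet. The question's table is written out by hand in that shape so the sketch runs without Enlive, and the names node-text, elements and tds-for-title are made up for this example. It walks the tree with tree-seq, keeps the <tr> rows whose <th> text (including a nested <b>) trims to the wanted title, and returns only the trimmed text of that row's <td> cells, so the <th> never ends up in the result.

```clojure
(ns example.row-filter
  (:require [clojure.string :as str]))

(defn node-text
  "All text inside a node, recursing into nested elements such as <b>."
  [node]
  (if (string? node)
    node
    (apply str (map node-text (:content node)))))

(defn elements
  "Every element map in the tree rooted at nodes (a seq of nodes),
  in document order."
  [nodes]
  (filter map?
          (tree-seq #(or (map? %) (sequential? %))
                    #(if (map? %) (:content %) %)
                    nodes)))

(defn tds-for-title
  "Trimmed text of the <td> cells of every <tr> whose <th> text, trimmed,
  equals title. The <th> cell itself is never part of the result."
  [nodes title]
  (for [tr (filter #(= :tr (:tag %)) (elements nodes))
        :when (some #(and (= :th (:tag %))
                          (= title (str/trim (node-text %))))
                    (:content tr))
        td (:content tr)
        :when (= :td (:tag td))]
    (str/trim (node-text td))))

;; The question's table written by hand in Enlive's node format, including
;; the whitespace around the <th> titles.
(def table
  [{:tag :table :attrs {:class "table_class"}
    :content
    [{:tag :tbody :attrs nil
      :content
      [{:tag :tr :attrs nil
        :content [{:tag :th :attrs nil
                   :content ["\n" {:tag :b :attrs nil :content ["\nRow1 title\n"]} "\n"]}
                  {:tag :td :attrs nil :content ["2.660.784"]}
                  {:tag :td :attrs nil :content ["2.944.552"]}
                  {:tag :td :attrs nil :content ["Correct, has 3 td elements"]}]}
       {:tag :tr :attrs nil
        :content [{:tag :th :attrs nil :content ["\nRow2 title\n"]}
                  {:tag :td :attrs nil :content ["2.660.784"]}
                  {:tag :td :attrs nil :content ["2.944.552"]}
                  {:tag :td :attrs nil :content ["Correct, has 3 td elements"]}]}]}]}])
```

With Enlive itself, node-text could be replaced by html/text, and the rows could be obtained with (html/select nodes [:tr]) instead of the tree-seq walk.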