What is the best way to take a string of HTML, chop it up, and put each piece into an array?
I have a general idea of how I can do this, but can't pinpoint how exactly to get it done. I am sure it can be done with a regex of some sort. Wondering if anyone here can point me in the right direction.
If I have a string of html such as this
some_html = '<div><b>This is some BOLD text</b></div>'
I want to divide it into logical pieces, and then put those pieces into an array so I end up with a result like this
html_array = ["<div>", "<b>", "This is some BOLD text", "</b>", "</div>"]
Comments (3)
Rather than use a regex, I'd use the nokogiri gem (a gem for parsing HTML written by Aaron Patterson, a contributor to Rails and Ruby). Here's a sample of how to use it:
You can then call
html_doc.children
to get a NodeSet and work your way from there.
Use an HTML parser, for instance, Nokogiri. Using SAX you can add tags/elements to the array as events are triggered.
It's not a good idea to try to parse HTML with a regex, unless you're planning to handle only a small, well-defined subset of it.
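For that small-subset case, a hedged sketch using only Ruby's built-in `String#scan`. It assumes the markup never has `<` or `>` inside attribute values, comments, or scripts, which real-world HTML does not guarantee.

```ruby
some_html = '<div><b>This is some BOLD text</b></div>'

# Match either one whole tag (<...>) or one run of text between tags.
# Assumption: '<' and '>' appear only as tag delimiters in the input.
html_array = some_html.scan(/<[^>]+>|[^<]+/)
# => ["<div>", "<b>", "This is some BOLD text", "</b>", "</div>"]
```

This reproduces the array from the question for that exact input, but it is a tokenizer for a restricted subset, not an HTML parser.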