Manipulating HTML
I need to read an HTML file and search it for certain tags. Depending on the results, some tags need to be removed, others changed, and some attributes possibly adjusted; the file is then written back.
Is NSXMLDocument the way to go? I don't think a parser is really needed in this case, and it could even mean more work. I don't want to touch the entire file; all I need to do is load the file into memory, change a few things, and save it again.
Note that I'll be dealing with HTML, not XHTML. Could that be a problem for NSXMLDocument? Unmatched or unclosed tags might make it stop working.
Comments (3)
NSXMLDocument is the way to go. That way you can use XPath/XQuery to find the tags you want. Bad HTML might be a problem, but you can set NSXMLDocumentTidyHTML and it should be OK unless it's really bad.
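A minimal sketch of that approach, not the answerer's own code: it loads the file with NSXMLDocumentTidyHTML, finds tags with XPath, removes some, renames others, adjusts an attribute, and writes the result back. The function name `RewriteHTML`, the tag names, and the XPath queries are all hypothetical and chosen only for illustration. It also assumes that unprefixed element names in the XPath match the tidied document, which is worth checking against your own files.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: loads an HTML file, edits a few tags, writes it back.
// The tags and queries below are illustrative assumptions, not requirements.
static BOOL RewriteHTML(NSURL *inputURL, NSURL *outputURL, NSError **error) {
    // NSXMLDocumentTidyHTML asks the parser to repair sloppy HTML before parsing.
    NSXMLDocument *doc =
        [[NSXMLDocument alloc] initWithContentsOfURL:inputURL
                                             options:NSXMLDocumentTidyHTML
                                               error:error];
    if (doc == nil) return NO;

    // Remove every <font> element, together with its contents.
    NSArray<NSXMLNode *> *fonts = [doc nodesForXPath:@"//font" error:error];
    if (fonts == nil) return NO;
    for (NSXMLNode *node in fonts) {
        [node detach];
    }

    // Change a tag: rename <b> to <strong>.
    NSArray<NSXMLNode *> *bolds = [doc nodesForXPath:@"//b" error:error];
    if (bolds == nil) return NO;
    for (NSXMLNode *node in bolds) {
        node.name = @"strong";
    }

    // Adjust an attribute: set rel="noopener" on links that have a target.
    NSArray<NSXMLNode *> *links = [doc nodesForXPath:@"//a[@target]" error:error];
    if (links == nil) return NO;
    for (NSXMLNode *node in links) {
        if (node.kind != NSXMLElementKind) continue;
        NSXMLElement *link = (NSXMLElement *)node;
        [link removeAttributeForName:@"rel"];
        [link addAttribute:[NSXMLNode attributeWithName:@"rel"
                                            stringValue:@"noopener"]];
    }

    // Ask for HTML-style output rather than XML-style output.
    doc.documentContentKind = NSXMLDocumentHTMLKind;
    NSData *data = [doc XMLData];
    return [data writeToURL:outputURL options:NSDataWritingAtomic error:error];
}
```

Because the document is tidied and re-serialized, the output will generally not be byte-for-byte identical to the input outside the edited tags.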
and then write finalstr to the file.
This is what I would do. Note that I don't know exactly what the advantages of using NSXMLDocument would be; this should do the job perfectly.
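The code this answer refers to is not shown above. As a rough sketch of a plain string-replacement approach (an assumption about what was meant, not the answerer's actual code), one could read the file into an NSString, replace the text in question, and write `finalstr` back. The helper name `ReplaceInHTMLFile` is hypothetical, and the file is assumed to be UTF-8 encoded.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: literal search-and-replace on an HTML file, no parsing.
// Assumes the file is UTF-8; other encodings would need a different constant.
static BOOL ReplaceInHTMLFile(NSString *path,
                              NSString *target,
                              NSString *replacement,
                              NSError **error) {
    NSString *html = [NSString stringWithContentsOfFile:path
                                               encoding:NSUTF8StringEncoding
                                                  error:error];
    if (html == nil) return NO;

    // Replace every literal occurrence of `target`.
    NSString *finalstr =
        [html stringByReplacingOccurrencesOfString:target
                                        withString:replacement];

    // Write the modified string back over the original file.
    return [finalstr writeToFile:path
                      atomically:YES
                        encoding:NSUTF8StringEncoding
                           error:error];
}
```

This leaves the rest of the file untouched, but it only matches exact text, so variations in spacing, case, or attribute order within a tag would be missed.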
NSXMLDocument will possibly fail, because HTML pages are often not well formed, but you can try NSXMLDocumentTidyHTML / NSXMLDocumentTidyXML (you can use both to improve results) as outlined here, and also have a look at this for an approach to modifying the HTML.
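Passing both options together might look like the following sketch. The helper name `ParseLenientHTML` is hypothetical, and how much broken markup the tidy step can actually repair is something to verify on your own input.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: parse an HTML string with both tidy options enabled.
// Returns nil and fills in *error if the markup cannot be repaired.
static NSXMLDocument *ParseLenientHTML(NSString *html, NSError **error) {
    NSXMLNodeOptions options = NSXMLDocumentTidyHTML | NSXMLDocumentTidyXML;
    return [[NSXMLDocument alloc] initWithXMLString:html
                                            options:options
                                              error:error];
}
```

A caller would check the returned document for nil, rather than the error alone, before running any queries on it.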