Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 10 years ago.
The community reviewed whether to reopen this question 2 years ago and left it closed:
Original close reason(s) were not resolved
Convert from HTML to XML with HTML Tidy
Downloadable Binaries
JRoppert, for your needs, I guess you might want to look at the Sources.
You can use the HTML Agility Pack. It's an open-source project from CodePlex.
The Validator.nu HTML Parser comes with an HTML2XML sample program that does the conversion using the HTML5 parsing algorithm and infoset coercion rules.
Use Html2Xhtml for .NET 4.0:
In-memory string-to-string conversion:
In-memory string-to-XDocument conversion:
See http://corsis.sourceforge.net/index.php/Html2Xhtml for more information.
http://corsis.sourceforge.net/index.php/Html2Xhtml
Html2Xhtml is a .NET 4.0 library for converting HTML to XHTML licensed under GPLv2 or above.
I tested Html2Xhtml in the local reconstruction of a large online database of the European Union. Tidy/Tidy.NET would not even produce valid output most of the time, and Chilkat's HTML-to-XML was a bit slow and produced strange results (misplaced, missing, or unexplainable elements). In an attempt to find a free, fast and reliable conversion tool, I created this library. It converts 2 to 4 times faster than all the other libraries I tested.
Html2Xhtml, combined with the power of LINQ to XML, is an excellent tool for all large-scale data extraction and web crawling scenarios.
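To illustrate why well-formed XHTML makes extraction easier, here is a minimal Python sketch. It does not use Html2Xhtml or LINQ to XML; it swaps in Python's standard xml.etree.ElementTree to query a small XHTML string. The sample markup and the extract_links name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so queries must be namespace-aware.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample input: the kind of output an HTML-to-XHTML converter produces.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <p>See <a href="http://example.com/a">A</a> and
       <a href="http://example.com/b">B</a>.</p>
  </body>
</html>"""


def extract_links(xhtml: str) -> list[tuple[str, str]]:
    """Return (href, link text) pairs for every <a> element in the document."""
    root = ET.fromstring(xhtml)
    return [
        (a.get("href", ""), "".join(a.itertext()))
        for a in root.iterfind(".//x:a", XHTML_NS)
    ]


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

Once the input is well-formed, any standard XML tool can run this kind of query, which is the main reason to convert before extracting.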
You can convert HTML to XHTML with the tidy executable:
tidy -asxhtml -numeric < index.html > index.xhtml
You can check the C# implementation here.
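If you want to drive the same tidy invocation from a script, a minimal Python sketch could look like the following. It assumes the tidy binary is installed and on the PATH; the function names are hypothetical, and the exit-status handling relies on tidy's convention of 0 for success, 1 for warnings and 2 for errors.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command() -> list[str]:
    """Return the same tidy arguments as the shell command above."""
    return ["tidy", "-asxhtml", "-numeric"]


def convert_to_xhtml(source: Path, target: Path) -> int:
    """Pipe `source` through tidy and write the XHTML result to `target`.

    Returns tidy's exit status (0 = clean, 1 = warnings); raises on errors.
    """
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    # Mirror the shell redirections: file -> stdin, stdout -> file.
    with source.open("rb") as src, target.open("wb") as dst:
        result = subprocess.run(
            tidy_command(), stdin=src, stdout=dst, stderr=subprocess.PIPE
        )
    if result.returncode >= 2:
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.returncode
```

A call such as convert_to_xhtml(Path("index.html"), Path("index.xhtml")) would then correspond to the one-line command. Warnings are tolerated here because real-world HTML almost always triggers some.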
The easiest way is to set your Visual Studio IDE to identify the changes you need to make.
You can do this in Visual Studio 2008 by going to:
Tools, Options, Text Editor, HTML, Validation and choosing the appropriate target.
Possibly XHTML 1.1 or XHTML 1.0 Transitional.
For some information on the different types, read:
http://msdn.microsoft.com/en-us/library/aa479043.aspx
Then you need to work through the points highlighted on your page.