Is there a master list of the tags in MHTML files and their meanings?
I am trying to read and extract data from .xls files that are actually Single File Web Pages (see below):
This document is a Single File Web Page, also known as a Web Archive file.
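A Single File Web Page (.mht/.mhtml) is a MIME `multipart/related` message (MHTML, RFC 2557). Assuming that, one option is to let Python's standard-library `email` package split the archive into parts and undo the transfer encoding before lxml ever sees the HTML. The helper name `html_parts` and the `SAMPLE` archive are hypothetical. Real Excel exports also carry CSS, images and XML parts, which this sketch simply skips. It also assumes a genuine binary .xls starts with the OLE2 compound-file signature, and rejects those files.

```python
import email
from email import policy

# Signature of an OLE2 compound file, i.e. a genuine binary .xls workbook.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def html_parts(mhtml_bytes):
    """Return (Content-Location, decoded HTML) for every text/html part.

    Hypothetical helper: assumes the input is an MHTML (MIME) archive.
    """
    if mhtml_bytes.startswith(OLE2_MAGIC):
        raise ValueError("binary .xls workbook, not a Single File Web Page")
    msg = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and applies the charset.
            location = part.get("Content-Location")
            parts.append((str(location) if location else None, part.get_content()))
    return parts


# Hypothetical, heavily trimmed archive in the shape of an Excel export.
SAMPLE = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="----=_NextPart_sample"\r\n'
    b'\r\n'
    b'This document is a Single File Web Page, also known as a Web Archive file.\r\n'
    b'\r\n'
    b'------=_NextPart_sample\r\n'
    b'Content-Location: file:///C:/sheet001.htm\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'Content-Type: text/html; charset="us-ascii"\r\n'
    b'\r\n'
    b'<table><tr><th class=3Dtl colspan=3D1 rowspan=3D2>Name</th></tr></table>\r\n'
    b'------=_NextPart_sample--\r\n'
)

for location, html in html_parts(SAMPLE):
    print(location, html.strip())
```

Each decoded HTML string can then be handed to lxml (for example `lxml.html.fromstring`) with the transfer encoding already removed.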
I am trying to figure out the meaning of all of the tags so I can make sure I parse them correctly using lxml.
For example, here is one of the tags:
<th class=3Dtl colspan=3D1 rowspan=3D2
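The `=3D` sequences are not part of the HTML. They come from quoted-printable transfer encoding, where `=3D` stands for `=` (hex 3D) and a line ending in `=` is a soft line break. Once decoded, the example is an ordinary HTML table header cell, and `class`, `colspan` and `rowspan` are standard HTML attributes. This is a minimal sketch using the standard-library `quopri` module. `decode_qp` is a hypothetical helper name, and the `ascii` default charset is an assumption; use the charset declared in the part's `Content-Type` header.

```python
import quopri


def decode_qp(raw, charset="ascii"):
    """Decode a quoted-printable byte string to text (hypothetical helper)."""
    # quopri turns =3D into '=' and removes '=' soft line breaks.
    return quopri.decodestring(raw).decode(charset)


print(decode_qp(b"<th class=3Dtl colspan=3D1 rowspan=3D2"))
# <th class=tl colspan=1 rowspan=2
```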
While I am having success with the few files I am experimenting with, I want to figure out whether I am making assumptions that will come back to haunt me later. A list of these tags and their meanings would therefore be very helpful.
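Short of a master list, one way to test those assumptions is to inventory which tags and attributes actually occur across your files, then look up each one. The sketch below uses the standard-library `html.parser` instead of lxml so it runs anywhere. `TagInventory` and `inventory` are hypothetical names, and the sketch assumes the HTML has already been decoded from quoted-printable.

```python
from collections import Counter
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Count every tag name and every (tag, attribute) pair in a document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.attrs = Counter()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; unquoted values are accepted.
        self.tags[tag] += 1
        for name, _value in attrs:
            self.attrs[(tag, name)] += 1
    # Self-closing tags like <br/> reach handle_starttag through the default
    # handle_startendtag, so they are counted too.


def inventory(html_text):
    parser = TagInventory()
    parser.feed(html_text)
    parser.close()
    return parser.tags, parser.attrs


tags, attrs = inventory(
    "<table><tr><th class=tl colspan=1 rowspan=2>A</th><td class=tl>B</td></tr></table>"
)
print(tags.most_common())
print(attrs.most_common())
```

Summing the counters over every file gives a short, concrete list to check against the HTML specification. Anything that appears rarely, or only in some files, is where a hidden assumption is most likely to break.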
Answer:
If the MHTML is generated from Microsoft Word, it's probably a combination of WordprocessingML and HTML4 tags.
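To see how much of a file falls outside plain HTML4, a rough sketch can count opening tags that carry a namespace prefix. This assumes Office writes its own markup as prefixed tags, such as `o:p`, or the `x:` tags that Excel exports typically place inside `<xml>` blocks. The regex is a heuristic, not a parser, and `office_prefixes` is a hypothetical name.

```python
import re
from collections import Counter

# Opening tag whose name has a namespace prefix, e.g. <o:p> or <x:ExcelWorkbook>.
# Closing tags start with "</" and are therefore not matched.
PREFIXED_TAG = re.compile(r"<([A-Za-z][\w.-]*):([A-Za-z][\w.-]*)")


def office_prefixes(html_text):
    """Count opening tags by namespace prefix (o, x, w, v, ...)."""
    return Counter(prefix.lower() for prefix, _local in PREFIXED_TAG.findall(html_text))


sample = (
    "<p>Total<o:p></o:p></p>"
    "<xml><x:ExcelWorkbook><x:ExcelWorksheets></x:ExcelWorksheets></x:ExcelWorkbook></xml>"
)
print(office_prefixes(sample))
```

Tags without a prefix can be checked against the HTML 4 element list. The prefixed ones belong to Microsoft's Office XML vocabularies and need Office documentation instead.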