Parsing HTML pages from Tcl
I am using tDOM version 0.8.2 to parse HTML pages.
From the help pages, I found the following command to get an element by its ID:
Tcl code:
set html {<html>
<head>
</head>
<body>
<div id="m">
</div>
</body>
</html>
}
package require tdom
set doc [ dom parse -html $html ]
set node [ $doc getElementById m]
But when I execute the second set command (the getElementById call), I get an empty string, even though the tag clearly has an id of m.
Can someone tell me where I am going wrong?
Regards,
Mithun
Comments (1)
The problem is that your document lacks a <!DOCTYPE> declaration, so tDOM doesn't know that the id attribute is to be interpreted as an ID.
If we add a DOCTYPE, it all works...
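A minimal sketch of that change, assuming an HTML5-style `<!DOCTYPE html>` declaration (the exact declaration is an assumption here, not something stated in the answer). The guard around the result is also an addition, since getElementById returns an empty string when no ID matches:

```tcl
package require tdom

# Same markup as in the question, with a DOCTYPE added on the first line.
# The HTML5-style declaration is an assumption made for illustration.
set html {<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div id="m">
</div>
</body>
</html>
}

set doc [dom parse -html $html]
set node [$doc getElementById m]

# getElementById yields an empty string when nothing is registered under
# that ID, so check before calling a method on the result.
if {$node eq ""} {
    puts "no element registered under ID m"
} else {
    puts [$node asXML]
}
```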
Produces this output for me:
You could have checked that the document was being parsed at all by doing a search to see if the element is findable using XPath, like this:
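One way such a check might look, as a sketch: it uses the markup from the question unchanged and tDOM's selectNodes method with an XPath expression matching on the id attribute value. The variable names are illustrative only.

```tcl
package require tdom

# Markup exactly as in the question (no DOCTYPE).
set html {<html>
<head>
</head>
<body>
<div id="m">
</div>
</body>
</html>
}

set doc [dom parse -html $html]

# Braces keep Tcl from treating the XPath brackets as command substitution.
# The expression matches any element whose id attribute equals "m",
# without relying on the attribute being registered as an ID.
set nodes [$doc selectNodes {//*[@id='m']}]

foreach node $nodes {
    puts [$node asXML]
}
```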
Since that did produce the right output (as above) it was clear that the problem had to be in the interpretation of the attribute, which in turn pointed straight at the missing DOCTYPE.
Update
It's actually a bug that was fixed in tDOM 0.8.3.
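To see which side of that fix an installation is on, a small sketch using Tcl's standard package commands may help; the messages printed are illustrative only.

```tcl
# package require returns the version string of the package it loaded.
set version [package require tdom]

# package vcompare returns -1, 0, or 1 when the first version is
# older than, equal to, or newer than the second.
if {[package vcompare $version 0.8.3] < 0} {
    puts "tDOM $version predates 0.8.3"
} else {
    puts "tDOM $version is 0.8.3 or later"
}
```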