How can I determine in Java whether an HTML document is well formed?
Hey guys, I need to determine whether a given HTML document is well formed or not.
I just need a simple implementation using only Java core API classes, i.e. no third-party stuff like JTidy. Thanks.
Actually, what I need is an algorithm that scans a list of tags. If it finds an open tag and the next tag isn't its corresponding close tag, then the next tag should be another open tag, which in turn should be followed either by its own close tag or by yet another open tag, and so on. After the innermost tag is closed, the close tags of the earlier open tags should follow in reverse order. I've already written a method to convert a tag to its close tag. If the list conforms to this order, the method should return true, otherwise false.
Here is the skeleton code of what I've started working on already. It's not too neat, but it should give you guys a basic idea of what I'm trying to do.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

public boolean validateHtml() {
    // fetchTags returns this [<html>, <head>, <title>, </title>, </head>, <body>, <h1>, </h1>, </body>, </html>]
    ArrayList<String> tags = fetchTags();
    // Stack of open tags whose corresponding close tag has not been found yet
    Deque<String> unclosedTags = new ArrayDeque<String>();
    for (String tag : tags) {
        if (!tag.startsWith("</")) {
            // Open tag: remember it until its close tag shows up
            unclosedTags.push(tag);
        } else if (unclosedTags.isEmpty()
                || !tag.equals(TagOperations.convertToCloseTag(unclosedTags.pop()))) {
            // Close tag with nothing open, or one that does not match the most recent open tag
            return false;
        }
    }
    // Well formed only if every open tag has been closed
    return unclosedTags.isEmpty();
}
Comments (3)
Yeah, string manipulation can seem like a pickle sometimes.
You need to do something like this:
First copy the HTML into an array.
Something like this should get you started. You should end up with an array of tags. This is only pseudocode, so it won't compile.
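A compilable take on that idea, using only java.util.regex from the core API, might look like the sketch below. The class name TagExtractor and the method extractTags are made up for illustration and do not come from the thread. The pattern is an assumption as well: it drops attributes, lower-cases tag names, skips <!DOCTYPE ...>, and does not understand comments, scripts, CDATA, or a '>' inside an attribute value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
final class TagExtractor {
    // Group 1 is the optional slash of a close tag, group 2 is the tag name.
    // Anything after the name up to the next '>' (attributes, a trailing '/') is ignored.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Returns the tags of the given markup in document order, without attributes.
    static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<String>();
        Matcher matcher = TAG.matcher(html);
        while (matcher.find()) {
            String slash = matcher.group(1);
            String name = matcher.group(2).toLowerCase(Locale.ROOT);
            // Rebuild a normalized tag, e.g. <H1 class="x"> becomes <h1>
            tags.add("<" + slash + name + ">");
        }
        return tags;
    }
}
```

The resulting list has the same shape as the one fetchTags is described as returning in the question, so it could be fed to a tag-matching loop. Note that a self-closing tag such as <br/> comes out as <br> and would need separate handling there.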
I don't think you can do this without undertaking a huge amount of work. It would be much easier to use a third-party package.
Try validating against the HTML 4, HTML 4.01, or XHTML 1 DTD,
which might help!
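For input that is meant to be XHTML, a narrower check than DTD validation is possible with the XML parser that ships with the JDK: it rejects markup whose tags are not properly nested and closed. The sketch below is only that, a well-formedness check under the assumption that the document is XML-compatible. It does not validate against any DTD, it will reject ordinary HTML 4 constructs such as an unclosed <br>, and the class name XhtmlWellFormedness is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original post.
final class XhtmlWellFormedness {
    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException e) {
            // Mismatched, unclosed, or otherwise malformed markup
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Not a verdict on the document: the parser itself could not run
            throw new IllegalStateException(e);
        }
    }
}
```

A document that carries a DOCTYPE pointing at an external DTD may make the parser try to load that DTD, so this sketch is best tried first on markup without a DOCTYPE.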