How can I determine whether an HTML document is well formed in Java?

Posted on 2024-10-20 00:51:27

Hey guys, I need to determine whether a given HTML document is well formed.
I just need a simple implementation using only Java core API classes, i.e. no third-party libraries such as JTidy. Thanks.

What I actually need is an algorithm that scans a list of tags. If it finds an open tag and the next tag isn't its corresponding close tag, then the next tag should be another open tag, which in turn should be followed either by its own close tag or by yet another open tag, and so on. Once the innermost tag is closed, the close tags of the earlier open tags should follow in reverse order. I've already written a method that converts an open tag to its close tag. If the list conforms to this order the method should return true, otherwise false.

Here is the skeleton code I've started working on. It's not too neat, but it should give you a basic idea of what I'm trying to do.

public boolean validateHtml(){

    ArrayList<String> tags = fetchTags();
    //fetchTags returns this [<html>, <head>, <title>, </title>, </head>, <body>, <h1>, </h1>, </body>, </html>]

    //I create another ArrayList to store tags that I haven't found its corresponding close tag yet
    ArrayList<String> unclosedTags = new ArrayList<String>();

    String temp;

    for (int i = 0; i < tags.size(); i++) {

        temp = tags.get(i);

        if(!tags.get(i+1).equals(TagOperations.convertToCloseTag(tags.get(i)))){
            unclosedTags.add(tags.get(i));
            if(){

            }

        }else{
            return true;//well formed html
        }
    }

    return true;
}
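
The ordering rule described above (close tags must arrive in the reverse order of their open tags) is what a stack enforces, so one possible way to finish the skeleton is sketched below. This is only a sketch under stated assumptions: the class name `TagBalanceChecker` and the helper `toCloseTag` are hypothetical (the helper stands in for `TagOperations.convertToCloseTag`, whose real behavior is not shown), tags are assumed to carry no attributes, and void elements such as `<br>` are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class TagBalanceChecker {

    // Returns true when every open tag is closed in reverse (last-in, first-out) order.
    static boolean isBalanced(List<String> tags) {
        Deque<String> unclosed = new ArrayDeque<>();
        for (String tag : tags) {
            if (tag.startsWith("</")) {
                // A close tag must match the most recently opened, still-unclosed tag.
                if (unclosed.isEmpty() || !tag.equals(toCloseTag(unclosed.pop()))) {
                    return false;
                }
            } else {
                // An open tag waits on the stack until its close tag shows up.
                unclosed.push(tag);
            }
        }
        // Anything still on the stack was never closed.
        return unclosed.isEmpty();
    }

    // Hypothetical stand-in for TagOperations.convertToCloseTag: "<h1>" becomes "</h1>".
    // Assumes the tag has no attributes.
    static String toCloseTag(String openTag) {
        return "</" + openTag.substring(1);
    }
}
```

Calling `TagBalanceChecker.isBalanced(tags)` on the list returned by `fetchTags()` in the skeleton would return true for the sample list, while a list such as `[<b>, <i>, </b>, </i>]` is rejected because `</b>` arrives while `<i>` is still open.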


Comments (3)

書生途 2024-10-27 00:51:27

Yeah, string manipulation can seem like a pickle sometimes.
You need to do something like this.

First copy the HTML into a char array (for example with String.toCharArray()):

boolean inTag = false;
StringBuilder str = new StringBuilder();
List<String> htmlTags = new ArrayList<>();

for (int i = 0; i < array.length; i++) {
  // Check for the start of a tag
  if (array[i] == '<') {
    inTag = true;
  }

  // If the current char is part of a tag, copy it
  if (inTag) {
    str.append(array[i]);
  }

  // When a tag ends, add the tag to the tag list and reset the buffer
  if (inTag && array[i] == '>') {
    htmlTags.add(str.toString());
    str.setLength(0);
    inTag = false;
  }
}

Something like this should get you started; you should end up with a list of tags. It is only a rough fragment, so it won't compile on its own.

唱一曲作罢 2024-10-27 00:51:27

I don't think you can do this without a huge amount of work; it would be much easier to use a third-party package.

冰魂雪魄 2024-10-27 00:51:27

Try validating against one of the HTML 4, HTML 4.01, or XHTML 1 DTDs:

"strict.dtd"
"loose.dtd"
"frameset.dtd"

That might help!
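
If the document is XHTML (that is, written as XML), the XML parser bundled with the JDK can at least check well-formedness without any third-party library. The sketch below is an illustration, not a full DTD validation: the class name `XhtmlWellFormedness` is hypothetical, the input is assumed to have no DOCTYPE (so no DTD is fetched), and ordinary non-XML HTML such as an unclosed `<br>` will be reported as malformed. Validating against a DTD would additionally need `DocumentBuilderFactory.setValidating(true)` and a DTD the parser can resolve.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XhtmlWellFormedness {

    // Returns true when the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse problems instead of letting the default handler print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignore warnings */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException e) {
            // Mismatched, unclosed, or badly nested tags end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not a property of the input: the parser could not be set up or read.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `XhtmlWellFormedness.isWellFormed("<html><body><p>x</body></html>")` returns false because `<p>` is never closed.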
