How can I change the content of an HTML tag in Java?

Posted on 2024-08-14 23:17:57

How can I change the HTML content of a tag in Java? For example:

before:

<html>
    <head>
    </head>
    <body>
        <div>text<div>**text**</div>text</div>
    </body>
</html>

after:

<html>
    <head>
    </head>
    <body>
        <div>text<div>**new text**</div>text</div>
    </body>
</html>

I tried JTidy, but it doesn't support getTextContent. Is there any other solution?


Thanks. I want to parse HTML that is not well-formed. I tried TagSoup, but when I have this code:

<body>
sometext <div>text</div>
</body>

and I want to change "sometext" to "someAnotherText", then {bodyNode}.getTextContent() gives me "sometext text". When I call setTextContent("someAnotherText" + {bodyNode}.getTextContent()) and serialize the structure, the result is <body>someAnotherText sometext text</body>, without the <div> tags. This is a problem for me.

Comments (4)

水染的天色ゝ 2024-08-21 23:17:57

Unless you are absolutely sure that the HTML will be valid and well formed, I'd strongly recommend using an HTML parser such as TagSoup, Jericho, NekoHTML, or HTML Parser. The first two are especially good at parsing any kind of crap :)

For example, with HTML Parser (because the implementation is very easy), use a visitor and provide your own NodeVisitor:

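import org.htmlparser.Text;
import org.htmlparser.visitors.NodeVisitor;

public class MyNodeVisitor extends NodeVisitor {
    // Called once for every text node encountered during the traversal.
    @Override
    public void visitStringNode(Text string) {
        if (string.getText().equals("**text**")) {
            string.setText("**new text**");
        }
    }
}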

Then, create a Parser, parse the HTML string and visit the returned node list:

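Parser parser = new Parser(htmlString);
// A null filter returns all top-level nodes.
NodeList nl = parser.parse(null);
// Apply the visitor to every node in the tree, then print the modified HTML.
nl.visitAllNodesWith(new MyNodeVisitor());
System.out.println(nl.toHtml());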

This is just one way to implement this, and it's pretty straightforward.
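The same "visit every text node" idea can also be sketched with the DOM classes that ship with the JDK. This is not HTML Parser's API: it is a stand-in that only works on well-formed markup, and the class name TextNodeWalker and its two methods are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: walks a W3C DOM tree and rewrites matching text nodes.
final class TextNodeWalker {

    /** Parses a well-formed XML/XHTML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    /**
     * Recursively visits node and all of its descendants. Every text node
     * whose value equals oldText is set to newText.
     * Returns the number of text nodes that were changed.
     */
    static int replaceText(Node node, String oldText, String newText) {
        int changed = 0;
        if (node.getNodeType() == Node.TEXT_NODE && oldText.equals(node.getNodeValue())) {
            node.setNodeValue(newText);
            changed++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            changed += replaceText(child, oldText, newText);
        }
        return changed;
    }

    public static void main(String[] args) {
        Document doc = parse("<div>text<div>**text**</div>text</div>");
        int changed = replaceText(doc, "**text**", "**new text**");
        System.out.println(changed + " node(s) changed: "
                + doc.getDocumentElement().getTextContent());
    }
}
```

Because only the matching text node is touched, the surrounding elements and sibling text stay exactly where they were.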

想你的星星会说话 2024-08-21 23:17:57

Provided that your HTML is well-formed XML (if it is not, you can use JTidy to tidy it up first), you can parse it with a DOM or SAX parser. DOM is probably easier if your document is not huge.

Something like this will do the trick if your text is the only child of a node with id="id":

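// Parse the (well-formed) document into a DOM tree.
Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
// Note: getElementById returns null unless the "id" attribute is declared as type ID (for example by a DTD).
Element e = d.getElementById("id");
Node text = e.getFirstChild();
// process(...) stands for whatever transformation you want to apply to the text.
text.setNodeValue(process(text.getNodeValue()));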

You may save d afterwards to a file.
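Here is a minimal, self-contained sketch of that approach using only JDK classes. It reuses the asker's "sometext <div>text</div>" case and assumes the input is already well formed. The class DomTextEdit and its methods are hypothetical names. The element is located by scanning for an id attribute rather than with getElementById, because no DTD declares the attribute as an ID here. Only the first text node is changed with setNodeValue, whereas calling setTextContent on the element would replace all of its children, including the nested <div>.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class DomTextEdit {

    /** Returns the first element whose "id" attribute equals id, or null. */
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (id.equals(e.getAttribute("id"))) {
                return e;
            }
        }
        return null;
    }

    /** Replaces the leading text of the element with the given id and serializes the document. */
    static String replaceFirstText(String xml, String id, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element target = findById(doc, id);
            if (target == null) {
                throw new IllegalArgumentException("no element with id " + id);
            }
            Node first = target.getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                first.setNodeValue(newText);   // child elements are left untouched
            } else {
                target.insertBefore(doc.createTextNode(newText), first);
            }
            // Serialize the tree; use new StreamResult(new File(...)) to write to disk instead.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("parse or serialize failed", e);
        }
    }

    public static void main(String[] args) {
        String in = "<body><div id=\"main\">sometext <div>text</div></div></body>";
        System.out.println(replaceFirstText(in, "main", "someAnotherText "));
    }
}
```

For markup that is not well formed, the same node-level edit applies once a tolerant parser has produced the tree.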

二智少女 2024-08-21 23:17:57

There are a bunch of open-source Java HTML parsers listed here.

I'm not sure what's most commonly used, but this one (just called HTML parser) will probably do what you want. It has functions to modify your tree and write it back out.

糖粟与秋泊 2024-08-21 23:17:57

In general, you have an HTML document that you want to extract data from, and you know roughly how it is structured.

There are several parser libraries, but the best one is Jsoup. You can use its DOM methods to navigate your document and update values. In your case you need to read your file and use a setter method such as text(String) on the element.

Sample XHTML file:

<?xml version="1.0" encoding="UTF-8"?>
<!--
To change this license header, choose License Headers in Project Properties.
To change this template file, choose Tools | Templates
and open the template in the editor.
-->
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Example</title>
    </head>
    <body>
        <p id="content">Hello World</p>

    </body>
</html>

Java code:

File input = new File("D:\\Projects\\Odata Project\\Odata\\src\\web\\html\\inscription_template.xhtml");
// A null charset lets Jsoup detect the encoding from the document, falling back to UTF-8.
org.jsoup.nodes.Document doc = Jsoup.parse(input, null);
org.jsoup.nodes.Element content = doc.getElementById("content");
// text(String) replaces the element's text and returns the element itself.
System.out.println(content.text("Hi How are you ?"));
System.out.println(content.text());
System.out.println(doc);

Output after execution:

<p id="content">Hi How are you ?</p>
Hi How are you ?
<!--?xml version="1.0" encoding="UTF-8"?-->
<!--
To change this license header, choose License Headers in Project Properties.
To change this template file, choose Tools | Templates
and open the template in the editor.
--><!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml">
 <head> 
  <title>Example</title> 
 </head> 
 <body> 
  <p id="content">Hi How are you ?</p>   
 </body>
</html>