jTidy returns nothing after tidying HTML
I have come across a very annoying problem when using jTidy (on Android). jTidy works on every HTML document I have tested it against, except the following:
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8" />
        <!-- Always force latest IE rendering engine & Chrome Frame
             Remove this if you use the .htaccess -->
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
        <title>templates</title>
        <meta name="description" content="" />
        <meta name="author" content="" />
        <meta name="viewport" content="width=device-width; initial-scale=1.0" />
        <!-- Replace favicon.ico & apple-touch-icon.png in the root of your domain and delete these references -->
        <link rel="shortcut icon" href="/favicon.ico" />
        <link rel="apple-touch-icon" href="/apple-touch-icon.png" />
      </head>
      <body>
        <div>
          <header>
            <h1>Page Heading</h1>
          </header>
          <nav>
            <p><a href="/">Home</a></p>
            <p><a href="/contact">Contact</a></p>
          </nav>
          <div>
          </div>
          <footer>
            <p>© Copyright</p>
          </footer>
        </div>
      </body>
    </html>
But after tidying it, jTidy returns nothing (that is, if the String containing the tidied HTML is called result, result.equals("") == true).
I have noticed something very interesting, though: if I remove everything in the body part of the HTML, jTidy works perfectly. Is there something in the <body></body> that jTidy doesn't like?
Here is the Java code I am using:
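    import java.io.ByteArrayOutputStream;
    import java.io.StringReader;
    import java.io.UnsupportedEncodingException;
    import org.w3c.tidy.Tidy;

    // mEncoding is a field of the enclosing class (its declaration is not shown here).
    public String tidy(String sourceHTML) {
        StringReader reader = new StringReader(sourceHTML);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Tidy tidy = new Tidy();
        tidy.setMakeClean(true);     // replace presentational markup with style rules
        tidy.setQuiet(false);        // keep jTidy's diagnostic messages
        tidy.setIndentContent(true);
        tidy.setSmartIndent(true);
        tidy.parse(reader, baos);    // tidied markup is written to baos
        try {
            return baos.toString(mEncoding);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }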
Is there something wrong with my Java? Is this an error with jTidy? Is there any way I can make jTidy not do this? (I cannot change the HTML). If this absolutely cannot be fixed, are there any other good HTML Tidiers? Thanks very much!
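Independently of the cause, the symptom described here (an empty result, or null when the encoding is unsupported) can be guarded against so the app never ends up with blank markup. The following is only a minimal sketch under stated assumptions: `TidyFallback` and `tidyOrOriginal` are hypothetical names, the tidier is passed in as a function so the example runs without jTidy, and it assumes Java 8 functional interfaces are available on the target platform.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of jTidy.
final class TidyFallback {

    /**
     * Runs the given tidier and returns its output, or the untouched input
     * when the tidier produced null or only whitespace.
     */
    static String tidyOrOriginal(String sourceHtml, UnaryOperator<String> tidier) {
        String result = tidier.apply(sourceHtml);
        boolean empty = (result == null) || result.trim().isEmpty();
        return empty ? sourceHtml : result;
    }

    public static void main(String[] args) {
        String html = "<p>hello</p>";
        // Stand-in tidiers; in the app this would be the tidy(String) method above.
        System.out.println(tidyOrOriginal(html, s -> ""));              // tidier returned nothing
        System.out.println(tidyOrOriginal(html, s -> s.toUpperCase())); // tidier returned markup
    }
}
```

This does not make jTidy produce output; it only keeps the original HTML usable when tidying fails.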
Comments (2)
Try this:
There are probably parse errors.
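One possible reading of the parse-error theory, offered as an assumption rather than something stated in this thread: jTidy is a port of the pre-HTML5 HTML Tidy, so elements such as <header>, <nav> and <footer> in the body may be reported as unknown tags, and Tidy-style tools generally withhold output once errors are reported. The sketch below uses only the Java standard library to list such elements in a document. `Html5TagScan`, `findHtml5Tags` and the tag list are hypothetical, the list is an illustrative subset rather than jTidy's real tag table, and the regex is a rough scan, not an HTML parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jTidy.
final class Html5TagScan {

    // Assumption: illustrative subset of elements introduced by HTML5.
    private static final Set<String> HTML5_TAGS = new HashSet<>(Arrays.asList(
            "article", "aside", "figure", "figcaption", "footer",
            "header", "main", "nav", "section"));

    // Matches the name of an opening tag; closing tags, comments and the doctype are skipped.
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    /** Returns the listed HTML5 element names found in the markup, in order of first appearance. */
    static Set<String> findHtml5Tags(String html) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (HTML5_TAGS.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<body><div><header><h1>Page Heading</h1></header>"
                + "<nav><p><a href=\"/\">Home</a></p></nav>"
                + "<footer><p>Copyright</p></footer></div></body>";
        System.out.println(findHtml5Tags(html)); // prints [header, nav, footer]
    }
}
```

If the scan does report such elements, jTidy's `setForceOutput(true)` setter, which asks it to emit markup even when errors were found, may be worth trying. Whether that resolves this particular document has not been verified here.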
Check out Jsoup; it's my recommendation for any kind of Java HTML processing (I've used HtmlCleaner too, but then switched to jsoup).
Cleaning HTML with Jsoup:
That's all!
Or, if you want to change / remove / parse / ... something: