Android regex encoding

Posted 2024-12-12 07:45:31

I'm downloading a website's source code using HttpClient, and then I want to extract some data using regular expressions. Unfortunately, the website is encoded in iso-8859-1, which seems to be causing problems. Here's the sample code that downloads the page:

HttpGet query = new HttpGet(url);
HttpResponse queryResponse = httpClient.execute(query);
String queryText = EntityUtils.toString(queryResponse.getEntity()).replaceAll("\r", " ").replaceAll("\n", " ");

And then the expression:

Pattern pattern = Pattern.compile("<p class=\"qt\">(.*?)</p>");
Matcher matcher = pattern.matcher(queryText);
while (matcher.find()) { /* do something */ }

The problem is that it misses some occurrences when they contain special iso-8859-1 characters; (.*?) doesn't seem to match them. What causes this, and how do I fix it?

Comments (1)

美人如玉 2024-12-19 07:45:31

Are you sure this has to do with "special iso-8859-1 characters" and not newlines? The dot (.) does not match line terminators by default. You can pass the DOTALL flag to make it match line terminators as well. For example:

Pattern pattern = Pattern.compile("<p class=\"qt\">(.*?)</p>", Pattern.DOTALL);
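
To see the difference the flag makes, here is a minimal, self-contained sketch. The class name DotAllDemo, the helper extract, and the sample HTML are made up for illustration and are not from the original code. The third paragraph contains U+0085 (NEL), which java.util.regex also treats as a line terminator and which is what byte 0x85 decodes to under iso-8859-1. Since the question's code already replaces \r and \n, a character like this is one possible reason some paragraphs are skipped, but that is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotAllDemo {
    // Same expression as in the question.
    static final String REGEX = "<p class=\"qt\">(.*?)</p>";

    // Hypothetical helper: returns group 1 of every match, using the given flags.
    static List<String> extract(String html, int flags) {
        Pattern pattern = Pattern.compile(REGEX, flags);
        Matcher matcher = pattern.matcher(html);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample input: one plain paragraph, one containing \n,
        // and one containing U+0085 (NEL), also a line terminator for '.'.
        String html = "<p class=\"qt\">first</p>"
                + " <p class=\"qt\">line one\nline two</p>"
                + " <p class=\"qt\">wait\u0085</p>";

        // Without DOTALL, '.' stops at line terminators, so only "first" matches.
        System.out.println(extract(html, 0).size());
        // With DOTALL, '.' matches every character, so all three paragraphs match.
        System.out.println(extract(html, Pattern.DOTALL).size());
    }
}
```

Running main prints 1 and then 3: only the paragraph without a line terminator is found by default, while all three are found with Pattern.DOTALL.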