Is there a better way to sanitize hyperlinks out of text input?
I'm trying to remove hyperlinks from a given text by overwriting any fragment that looks like part of a hyperlink with the text [LINK REMOVED FROM EVIDENCE AT REQUEST OF TRIAL JUDGE]:
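import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public String overwriteLinks(String text) {
    final String OVERWRITE_WITH = "[LINK REMOVED FROM EVIDENCE AT REQUEST OF TRIAL JUDGE]";
    // Literal fragments that are treated as evidence of a link
    List<String> checkForPatterns = Arrays.asList(
            "http://", "www", ".com", ".net",
            ".org", "dot com", "dot net");
    // Join the quoted fragments into a single alternation: \Qhttp://\E|\Qwww\E|...
    StringBuilder re = new StringBuilder();
    for (String checkForPattern : checkForPatterns) {
        if (re.length() > 0)
            re.append("|");
        String quotedSite = Pattern.quote(checkForPattern);
        re.append(quotedSite);
    }
    Pattern p = Pattern.compile(re.toString(), Pattern.CASE_INSENSITIVE);
    // Only the matched fragment is overwritten, not the rest of the link around it
    text = p.matcher(text).replaceAll(OVERWRITE_WITH);
    return text;
}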
Is there a better way to do this to maximize the number of links that are removed? My regex skills are iffy at best.
Comments (2)
Try using this regex:
I seem to remember importing/creating that one in our codebase from the URI RFC years ago. That should match all URIs in your string and allow easy replacement.
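For illustration only, here is a much simpler pattern than an RFC-derived one. It is not the regex this answer refers to. It is a rough sketch that assumes a link either starts with a scheme followed by "://" or starts with "www.", and that it runs until the next whitespace, angle bracket, or double quote. The class name UriScrubber is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the answer above.
final class UriScrubber {
    static final String OVERWRITE_WITH =
            "[LINK REMOVED FROM EVIDENCE AT REQUEST OF TRIAL JUDGE]";

    // Assumption: a link is "<scheme>://..." or "www. ..." up to the next
    // whitespace, '<', '>' or '"'. This is far looser than the URI grammar.
    private static final Pattern URI_LIKE = Pattern.compile(
            "\\b(?:[a-z][a-z0-9+.-]*://|www\\.)[^\\s<>\"]+",
            Pattern.CASE_INSENSITIVE);

    static String overwriteLinks(String text) {
        // quoteReplacement keeps '$' and '\' in the replacement text literal
        return URI_LIKE.matcher(text)
                .replaceAll(Matcher.quoteReplacement(OVERWRITE_WITH));
    }
}
```

Unlike the fragment list in the question, this overwrites the whole link rather than just the "http://" or ".com" part. One known gap: punctuation directly after a link, such as a sentence-ending period, is swallowed along with it.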
You could create a class following the example given by Lars Vogel in his Java Regex Tutorial ("6.4. Building a link checker") and then enhance it with a method to replace any of the links found with your OVERWRITE_WITH String.
You would have to tweak the example that Lars provides for your particular needs but then you would have a link processing class that you could use in other parts of your application as needed.
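A link processing class along those lines might look roughly like the sketch below. This is not the code from the tutorial. The class name LinkProcessor, its two methods, and the pattern itself are assumptions made for the example. The pattern covers three cases: links with a scheme or a leading "www.", bare domains ending in .com, .net or .org, and spelled-out forms such as "example dot com".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reusable class; names and pattern are illustrative only.
final class LinkProcessor {
    private static final Pattern LINK = Pattern.compile(
            // scheme://... or www. ... up to whitespace, '<', '>' or '"'
            "\\b(?:[a-z][a-z0-9+.-]*://|www\\.)[^\\s<>\"]+"
            // bare domain such as example.com, with an optional path
            + "|\\b[a-z0-9-]+(?:\\.[a-z0-9-]+)*\\.(?:com|net|org)\\b(?:/[^\\s<>\"]*)?"
            // spelled-out form such as "example dot com"
            + "|\\b[a-z0-9-]+\\s+dot\\s+(?:com|net|org)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every link-like fragment, in order of appearance.
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    // Replaces every link-like fragment with the given literal text.
    static String replaceLinks(String text, String replacement) {
        return LINK.matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

Calling LinkProcessor.replaceLinks(text, OVERWRITE_WITH) would then stand in for overwriteLinks. The third alternative deliberately errs on the side of removing too much: a phrase like "the dot com bubble" is also overwritten, which may or may not be acceptable for your use.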