Is there a better way than this to clean hyperlinks out of text input?

Posted on 2024-12-27 21:15:45

I'm trying to remove any hyperlinks from a given text by overwriting every fragment that appears to be a hyperlink with the text [LINK REMOVED FROM EVIDENCE AT REQUEST OF TRIAL JUDGE]

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public String overwriteLinks(String text){
    final String OVERWRITE_WITH = "[LINK REMOVED FROM EVIDENCE AT REQUEST OF TRIAL JUDGE]";

    // Literal fragments that suggest a hyperlink
    List<String> checkForPatterns = Arrays.asList(
        "http://", "www", ".com", ".net", 
        ".org", "dot com", "dot net");

    // Build one alternation, quoting each fragment so "." is matched literally
    StringBuilder re = new StringBuilder();
    for (String checkForPattern : checkForPatterns){
        if (re.length() > 0)
            re.append("|");
        String quotedSite = Pattern.quote(checkForPattern);
        re.append(quotedSite);
    }

    // Replaces only the matched fragment, not the rest of the link around it
    Pattern p = Pattern.compile(re.toString(),Pattern.CASE_INSENSITIVE);
    text = p.matcher(text).replaceAll(OVERWRITE_WITH);

    return text;
}

Is there a better way to do this to maximize the number of links that are removed? My regex skills are iffy at best.


Comments (2)

情场扛把子 2025-01-03 21:15:45

Try using this regex:

public static final Pattern URI_REGEX = Pattern.compile( "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?" );

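I seem to remember importing/creating that one in our codebase from the URI RFC years ago. Be aware that it is anchored with ^ and every part of it is optional, so as written it matches a whole string rather than locating URIs inside longer text; you would need to adapt it before using it for replacement.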
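For reference, this expression is the one listed in Appendix B of RFC 3986 for breaking a URI reference into its components. Below is a minimal sketch of what its capture groups hold; the class name UriParts and the method split are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class UriParts {
    // Expression from RFC 3986, Appendix B.
    static final Pattern URI_REGEX = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /**
     * Returns {scheme, authority, path, query, fragment}.
     * Components that are absent from the input come back as null.
     */
    static String[] split(String uri) {
        Matcher m = URI_REGEX.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("unparseable: " + uri);
        }
        return new String[] {
            m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)
        };
    }
}
```

Given "http://www.ics.uci.edu/pub/ietf/uri/#Related" (the example used in the RFC), the scheme is "http", the authority is "www.ics.uci.edu", the path is "/pub/ietf/uri/" and the fragment is "Related". Given ordinary prose such as "see www.example.com now", the whole string simply lands in the path group, which is why the pattern cannot be used unchanged to find links inside text.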

不气馁 2025-01-03 21:15:45

You could create a class following the example given by Lars Vogel in his Java Regex Tutorial ("6.4. Building a link checker") and then enhance it with a method to replace any of the links found with your OVERWRITE_WITH String.

You would have to tweak the example that Lars provides for your particular needs but then you would have a link processing class that you could use in other parts of your application as needed.
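As a rough idea of what such a link-processing class might look like, here is a minimal sketch. It is not Lars Vogel's code: the class name LinkScrubber, the method findLinks and the pattern itself are assumptions made for illustration. The heuristics cover only the cases from the question's list (an explicit http/https scheme, a www. prefix, bare .com/.net/.org domains, and spelled-out "dot com" forms), and a match swallows any punctuation that directly follows the link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical link-processing class, for illustration only.
final class LinkScrubber {
    static final String OVERWRITE_WITH =
        "[LINK REMOVED FROM EVIDENCE AT REQUEST OF TRIAL JUDGE]";

    // Heuristic pattern (an assumption, not a complete URL grammar).
    private static final Pattern LINK = Pattern.compile(
        // explicit scheme or "www." prefix, up to the next whitespace
        "\\b(?:https?://|www\\.)\\S+"
        // bare domain ending in .com/.net/.org, with an optional path
        + "|\\b[a-z0-9-]+(?:\\.[a-z0-9-]+)*\\.(?:com|net|org)\\b\\S*"
        // spelled-out form such as "example dot com"
        + "|\\b[a-z0-9-]+\\s+dot\\s+(?:com|net|org)\\b",
        Pattern.CASE_INSENSITIVE);

    /** Returns every substring that looks like a link, in order of appearance. */
    static List<String> findLinks(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** Replaces each whole link-like token with OVERWRITE_WITH. */
    static String overwriteLinks(String text) {
        // quoteReplacement keeps the replacement literal even if it is
        // later changed to contain '$' or '\'.
        return LINK.matcher(text)
                   .replaceAll(Matcher.quoteReplacement(OVERWRITE_WITH));
    }
}
```

Unlike a list of bare fragments such as ".com", this replaces the whole token, so "go to example.org/page" becomes "go to " followed by the replacement text, and a word like "welcome.commander" is left alone because ".com" is not followed by a word boundary.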
