Extracting text with Java

Posted on 2024-09-11 07:36:52

If I have the string below, how can I extract the EDITORS PREFACE text with java? Thanks.

<div class='chapter'><a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>EDITORS PREFACE</a></div>

梦冥 2024-09-18 07:36:52

As you wrote in a comment on your question, you want the content of the href attribute. Here is a regex for that:

<a[^>]*? href=\"(?<url>[^\"]+)\"[^>]*?>

This regex works with the Microsoft .NET Framework. It captures the content of href in a group named url.

I just noticed that this question is tagged Java. Java has no named groups as of JDK 6, so here is a version with a numbered group instead:

<a[^>]*? href="([^"]+)"[^>]*?>

This regex captures the content of href in group 1.
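
For readers on a newer JDK: named groups were added to java.util.regex in Java 7, so the .NET-style pattern can be used almost unchanged there. A minimal sketch, assuming Java 7 or later; the class name NamedGroupExample and the helper extractHref are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupExample {
    // Same pattern as the .NET version, with the group named "url".
    // Requires Java 7 or later.
    private static final Pattern HREF =
            Pattern.compile("<a[^>]*? href=\"(?<url>[^\"]+)\"[^>]*?>");

    // Hypothetical helper: returns the href value of the first <a> tag,
    // or null if no double-quoted href is found.
    public static String extractHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group("url") : null;
    }

    public static void main(String[] args) {
        String html = "<a class=\"x\" href=\"page.php?page=1\">Link</a>";
        System.out.println(extractHref(html)); // page.php?page=1
    }
}
```

On JDK 6 and earlier, m.group("url") is not available and the numbered-group form is the one to use.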

Test it here: http://www.regexplanet.com/simple/index.html

Run this program:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned for the pattern.
        String line = "<a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>EDITORS PREFACE</a>";
        // The sample string puts href in single quotes, so the pattern matches ' rather than ".
        String pattern = "<a[^>]*? href='([^']+)'[^>]*?>";

        // Compile the pattern and create a matcher for the input.
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);

        if (m.find()) {
            // Group 0 is the whole match:
            // Found value: <a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>
            System.out.println("Found value: " + m.group(0));

            // Group 1 is the content of href:
            // Found value: page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE
            System.out.println("Found value: " + m.group(1));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
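
The question itself asks for the EDITORS PREFACE text, which also appears as the link text between the opening and closing a tags. The same approach can be extended to capture it. This is only a sketch, not part of the original answer: the class name LinkTextExample and the helper extractLinkText are hypothetical, and the pattern assumes a single, simple a tag with a quoted href and no nested markup in the link text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkTextExample {
    // Group 1: the quote character (' or "), reused via the backreference \1.
    // Group 2: the href value. Group 3: the text between <a ...> and </a>.
    private static final Pattern LINK =
            Pattern.compile("<a[^>]*? href=(['\"])(.*?)\\1[^>]*?>(.*?)</a>");

    // Hypothetical helper: returns the link text of the first <a> tag,
    // or null if no link is found.
    public static String extractLinkText(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String line = "<div class='chapter'><a href='page.php?page=1&filename=SomeFile"
                + "&chapter=EDITORS PREFACE'>EDITORS PREFACE</a></div>";
        System.out.println(extractLinkText(line)); // EDITORS PREFACE
    }
}
```

Regular expressions are brittle against real-world HTML (nested tags, unquoted attributes, line breaks inside tags), so for anything beyond a fixed, known snippet like this one an HTML parser is likely the safer choice.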
