Matching two or three words after different Arabic regex patterns in Java

Posted on 2024-11-13 23:45:10 | Length: 1105 | Views: 3 | Comments: 0

Greetings all,

I am a beginner with regex. What I want to do is extract the 2 or 3 Arabic words that follow a certain pattern.

For example, if I have the Arabic string

inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية "

I need to extract the names after

الدكتور

and

والدكتورة

so the output should be:

احمد زويل
سميرة موسي

What I have done so far is the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
// The lookbehind anchors the match right after الدكتور; .* then runs to the end of the line
Pattern pattern = Pattern.compile("(?<=الدكتور).*");
Matcher matcher = pattern.matcher(inputtext);
boolean found = false;
while (matcher.find()) {
    // Get the matching string
    String match = matcher.group();
    System.out.println("the match is: " + match);
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}

But it returns:

احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية

I don't know how to add another pattern, or how to stop after 2 words.

Would you please help me with any ideas?

Answer from 亢潮, 2024-11-20 23:45:10

To match only the two words that follow, try this one:

(?<=الدكتور)\s[^\s]+\s[^\s]+

.* matches everything up to the end of the line, so it is not what you want.

\s is a whitespace character.

[^\s] is a negated character class that matches anything except whitespace.

So my solution matches a whitespace character, then at least one non-whitespace character (the first word), then another whitespace character and once more at least one non-whitespace character (the second word). Note that the match therefore starts with the whitespace that follows the title.

To match your second pattern, I would just use a second regex (exchange the part inside the lookbehind) and match it in a second step. The regular expressions are easier to read that way.
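The two-step approach could look roughly like the sketch below. The class name TwoPassNameExtractor and the helper namesAfter are made up for illustration and are not part of the original answer. Two details are assumptions on my side: the backslashes are doubled because the pattern lives in a Java string literal, and trim() is used to drop the leading whitespace that the first \s matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPassNameExtractor {
    // Whitespace, word, whitespace, word; backslashes doubled for the Java string literal.
    private static final String TWO_WORDS = "\\s[^\\s]+\\s[^\\s]+";

    // Hypothetical helper: returns the two words that directly follow each occurrence of title.
    static List<String> namesAfter(String title, String text) {
        // The lookbehind requires the title before the match without making it part of the match.
        Pattern pattern = Pattern.compile("(?<=" + Pattern.quote(title) + ")" + TWO_WORDS);
        Matcher matcher = pattern.matcher(text);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            // trim() removes the leading whitespace matched by the first \s
            names.add(matcher.group().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        List<String> names = new ArrayList<>();
        // First pass for one title, second pass for the other.
        names.addAll(namesAfter("الدكتور", inputtext));
        names.addAll(namesAfter("والدكتورة", inputtext));
        names.forEach(System.out::println);
    }
}
```

The first pass does not also pick up the second name, because inside والدكتورة the text الدكتور is followed by ة rather than by whitespace.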

Or you can combine both into a single pattern:

(?<=الدكتور)\s[^\s]+\s[^\s]+|(?<=والدكتورة)\s[^\s]+\s[^\s]+
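As a rough sketch of how the combined pattern might be used from Java: the variant below factors the two lookbehinds into one alternation and puts the words into a capturing group, so the leading whitespace is not part of the result. The class name CombinedNameExtractor, the helper namesAfterTitles and its wordCount parameter are hypothetical; the parameter is only there because the question asks for "2 or 3" words. \S is used as shorthand for [^\s].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedNameExtractor {
    // Hypothetical helper: returns the wordCount words that follow either title.
    static List<String> namesAfterTitles(String text, int wordCount) {
        // Either lookbehind may succeed; group 1 holds the words without the leading whitespace.
        String regex = "(?:(?<=الدكتور)|(?<=والدكتورة))\\s+(\\S+(?:\\s+\\S+){" + (wordCount - 1) + "})";
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        System.out.println(namesAfterTitles(inputtext, 2));
        System.out.println(namesAfterTitles(inputtext, 3));
    }
}
```

A fixed word count is a blunt tool: with wordCount set to 3 on this input, the first match runs on into والدكتورة, because the regex has no way of knowing where a name ends. Deciding between two and three words would need extra knowledge, such as a list of words that cannot belong to a name.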
