Regular expression for PHP: search for a word and return the data after the word

Posted 2024-11-04 02:30:49

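I'm trying to write a regular expression for a job I was asked to do, but I've had no luck making it efficient enough.
The goal is to make the following work as efficiently as possible.
Goal 1: split all the text at sentence endings (a dot, three dots, an exclamation mark, ...).
Goal 2: get every number that appears after the string 'em'.
Here is an example of a possible small string and its regex. (The real one can be really big.)
Regex, old:
(?:[^.!?:]|...)(?:(?:[^.!?:]|...)*?em (\d+))*
New:
(?:[.!?]|[.][.][.])(?:(?:[^.!?]|[.][.][.])*?\bem\b (\d+))*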

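Applied to this string (which I just made up)
(I inserted a . at the beginning)

.Foi visto que a batalha em 1939 foi. Claro que a data que digo ser em 1939 é uma farsa. Em 1938 já (insert em 1910) não havia reis.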

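What I want is a regex that doesn't backtrack, because it doesn't need to backtrack at all. By doing that I think I could save processing time, for example going from 30 seconds down to 20, or even 10! Just for this one string, it takes 1 s to complete.
Added:
Thanks for the answers; I now have one that doesn't fail. But it still backtracks too much. Is there any way around that?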

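Added (in reply to a question that has since been deleted):
Unfortunately, I don't have sample data. The person who asked me to do this says he doesn't have any either, and it still needs to be done "by yesterday". If you give me something that handles this text as efficiently as possible, I'm sure I can take it and work on it from there, handling whatever is specific to this job if needed. Otherwise I'll come back here and ask again.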

何时共饮酒 2024-11-11 02:30:49

Although the question is confusing, it sounds like you have two different tasks, which are best accomplished with two different regexes. Here is a tested script that does what (I'm guessing) you want:

<?php // test.php 20110430_1100
    // Test data.
    $text = 'Foi visto que a batalha em 1939 foi. Claro'.
        ' que a data que digo ser em 1939 é uma farsa. E'.
        'm 1938 já (insert em 1910) não havia reis.';

    // Part 1: Find all numbers after "em".
    $re1 = '/\bem\b\s*(\d+)\b/i';
    $count = preg_match_all($re1, $text, $matches);
    if ($count) $numbers = $matches[1]; // Array of number strings.
    else        $numbers = array();     // Else no numbers found.

    // Part 2: Split text into sentences.
    $re2 = '/(?<=[.!?])\s+/';
    $sentences = preg_split($re2, $text, -1, PREG_SPLIT_NO_EMPTY);

    // Print out results.
    $ncnt = count($numbers); // Count of numbers found.
    printf("There were %d numbers following \"em\".\n", $ncnt);
    for ($i = 0; $i < $ncnt; ++$i) {
        printf("  Number[%d] = %s\n", $i + 1, $numbers[$i]);
    }
    $scnt = count($sentences); // Count of sentences found.
    printf("\nThere were %d sentences found.\n", $scnt);
    for ($i = 0; $i < $scnt; ++$i) {
        printf("  Sentence[%d] = \"%s\"\n", $i + 1, $sentences[$i]);
    }
?>

Here is the output from the script.

There were 4 numbers following "em".
  Number[1] = 1939
  Number[2] = 1939
  Number[3] = 1938
  Number[4] = 1910

There were 3 sentences found.
  Sentence[1] = "Foi visto que a batalha em 1939 foi."
  Sentence[2] = "Claro que a data que digo ser em 1939 é uma farsa."
  Sentence[3] = "Em 1938 já (insert em 1910) não havia reis."
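
If you need to know which numbers belong to which sentence, or want to rule out backtracking in the number pattern, here is a minimal sketch that combines the two regexes. The function name numbers_per_sentence() is hypothetical, and the possessive quantifiers (`*+`, `++`) are my own addition; I have not measured whether they make a real difference on large input.

```php
<?php
// Sketch only: numbers_per_sentence() is a hypothetical helper name.

/**
 * Split $text into sentences, then collect the numbers that follow "em"
 * inside each sentence.
 *
 * Returns a list of ['sentence' => string, 'numbers' => string[]] rows.
 */
function numbers_per_sentence(string $text): array
{
    // Split on whitespace that follows ".", "!" or "?". Because only the
    // last punctuation character is inspected, "..." is handled as well.
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = array();
    foreach ($sentences as $sentence) {
        // \s*+ and \d++ are possessive: what they consume is never given
        // back, so a failed attempt does not retry shorter runs.
        preg_match_all('/\bem\b\s*+(\d++)\b/i', $sentence, $m);
        $result[] = array('sentence' => $sentence, 'numbers' => $m[1]);
    }
    return $result;
}

// Usage with a shortened variant of the test data above.
$text = 'Foi visto que a batalha em 1939 foi... Em 1938 (insert em 1910) havia reis!';
foreach (numbers_per_sentence($text) as $i => $row) {
    printf("Sentence[%d] numbers: %s\n", $i + 1, implode(', ', $row['numbers']));
}
// Sentence[1] numbers: 1939
// Sentence[2] numbers: 1938, 1910
```

Running two simple patterns per sentence keeps each regex linear in practice, which is usually easier to reason about than one nested pattern that tries to do both jobs.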
