Regex to match anything before a certain character?

Posted 2024-07-23 12:26:37

I have to parse a bunch of stats from text, and they all are formatted as numbers.

For example, this paragraph:

A total of 81.8 percent of New York
City students in grades 3 to 8 are
meeting or exceeding grade-level math
standards, compared to 88.9 percent of
students in the rest of the State.

I want to match just the 81 and 88 numbers, not the ".8" and ".9" that follow.

How can I do this? I've heard the term back-reference or look-aheads or something. Will any of that help?

I am using C#.

Edit:
It's required that I get the "3" and the "8" in the above example. It's just a simple example, but I need pretty much all numbers.

Comments (8)

月隐月明月朦胧 2024-07-30 12:26:37
/[^.](\d+)[^.]/

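As stated below, just use MatchObj.Groups[1].Value to get the digits. Note that the trailing [^.] forces the engine to backtrack when the number is followed by a decimal point, so for " 81.8" group 1 holds "8" rather than "81"; a lookaround-based pattern avoids this.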

濫情▎り 2024-07-30 12:26:37

If you don't want to deal with groups, you can use a lookahead like you say; this pattern finds the integer part of all decimal numbers in the string:

Regex integers = new Regex(@"\d+(?=\.\d)");
MatchCollection matches = integers.Matches(str);

matches will contain 81 and 88. If you'd like to match the integer part of ANY number (decimal or not), you can instead search for digit runs that are not immediately preceded by a .:

Regex integers = new Regex(@"(?<!\.)\d+");

This time, matches would contain 81, 3, 8 and 88.
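
To make the two patterns above concrete, here is a minimal sketch that runs both against the sample paragraph. The class and method names are hypothetical. One assumption beyond the original answer: the lookbehind is widened to `(?<![.\d])`, because with `(?<!\.)` alone a multi-digit fraction such as "88.95" would also yield a stray "5".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IntegerParts
{
    // Integer parts of decimal numbers only: digits followed by ".<digit>".
    public static string[] OfDecimals(string text) =>
        Regex.Matches(text, @"\d+(?=\.\d)")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // Integer parts of all numbers. The lookbehind rejects a preceding '.'
    // and also a preceding digit, so the "5" in "88.95" is not reported
    // as a separate number.
    public static string[] OfAllNumbers(string text) =>
        Regex.Matches(text, @"(?<![.\d])\d+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Calling `IntegerParts.OfDecimals(str)` on the question's paragraph should give 81 and 88, and `IntegerParts.OfAllNumbers(str)` should give 81, 3, 8 and 88.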

极度宠爱 2024-07-30 12:26:37

Complete C# solution:

using System.Collections.Generic;
using System.Text.RegularExpressions;

/// <summary>
/// Uses the named capture group 'roundedDigit' and the word boundary '\b' for
/// ease of understanding.
/// Adds the integer part of each number to the roundedPercents list (the
/// fraction is dropped, not rounded).
/// Works for any number of values in the string, including numbers that are
/// not in decimal format.
/// Assumes at most three integer digits and at most two decimal places.
/// </summary>
/// <returns>true if at least one number was found, false otherwise</returns>
public static bool TryGetRoundedPercents(string digitSequence, out List<string> roundedPercents)
{
    roundedPercents = null;
    string pattern = @"(?<roundedDigit>\b\d{1,3})(\.\d{1,2}){0,1}\b";

    if (Regex.IsMatch(digitSequence, pattern))
    {
        roundedPercents = new List<string>();
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.ExplicitCapture);

        // Walk every match and keep only the named group (the integer part).
        for (Match m = r.Match(digitSequence); m.Success; m = m.NextMatch())
            roundedPercents.Add(m.Groups["roundedDigit"].Value);

        return true;
    }
    else
        return false;
}

For your example, this returns 81, 3, 8 and 88.
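
If the input may contain longer numbers, the bounded quantifiers matter: with `\d{1,2}` for the fraction, a value such as 81.875 would be reported as 81 and then 875. The following is a sketch of a variant without digit-count limits, not the answer's own code. The names `PercentParser`, `TryGetIntegerParts` and `integerPart` are hypothetical, and unlike the method above it hands back an empty list instead of null when nothing matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PercentParser
{
    // Named group for the integer part, optional fraction of any length.
    // The lookbehind keeps a bare ".5" from being read as the integer 5.
    private static readonly Regex NumberPattern =
        new Regex(@"(?<![.\d])(?<integerPart>\d+)(?:\.\d+)?", RegexOptions.Compiled);

    // Single pass over the input; returns true if at least one number was found.
    public static bool TryGetIntegerParts(string text, out List<string> integerParts)
    {
        integerParts = new List<string>();
        foreach (Match m in NumberPattern.Matches(text))
            integerParts.Add(m.Groups["integerPart"].Value);
        return integerParts.Count > 0;
    }
}
```

Because the optional fraction is consumed as part of each match, the digits after the decimal point are never revisited as a new number.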

怕倦 2024-07-30 12:26:37
[^.](\d+)

From your example, this will match " 81", " 3", " 8", " 88"

You'll get an extra character in front of each number, but you can trim it off in your code, or simply read capture group 1, which holds only the digits.

小巷里的女流氓 2024-07-30 12:26:37

Try:

[0-9]+(?=[.])

It uses a lookahead to match only numbers followed by a decimal point.

C# Code:

Regex regex = new Regex("[0-9]+(?=[.])");
MatchCollection matches = regex.Matches(input);
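
One thing worth checking with this lookahead is that a sentence-ending period also counts as "followed by a dot". The sketch below (hypothetical class and method names) compares the pattern above with a stricter lookahead that also requires a digit after the dot.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DecimalLookahead
{
    // Digits followed by any '.', including a full stop at the end of a sentence.
    public static string[] FollowedByDot(string input) =>
        Regex.Matches(input, "[0-9]+(?=[.])")
             .Cast<Match>().Select(m => m.Value).ToArray();

    // Digits followed by '.' and then another digit, i.e. a real decimal point.
    public static string[] FollowedByDotAndDigit(string input) =>
        Regex.Matches(input, "[0-9]+(?=[.][0-9])")
             .Cast<Match>().Select(m => m.Value).ToArray();
}
```

For an input like "in grades 3 to 8. About 88.9 percent", the first method should return 8 and 88, while the second should return only 88.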

假情假意假温柔 2024-07-30 12:26:37
/(\d+)\.\d/g

This will match any number that has a decimal following it (which I think is what you want), but will only capture the numbers before the decimal. \d will only capture numbers (same as [0-9]), so it makes this pretty simple.

Edit: If you want the three and the eight as well, you don't even need to check for the decimal.

Edit2: Sorry, fixed it so it will ignore all the decimal places.

/(\d+)(?:\.\d+)?/g
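
The `/.../g` notation is not C# syntax. A rough C# equivalent, as a sketch with hypothetical names, uses `Regex.Matches` in place of the `g` flag and reads group 1 from each match. It uses `[0-9]` rather than `\d` so that `int.Parse` only ever sees ASCII digits, and it assumes the numbers fit in an `int`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WholeNumbers
{
    public static List<int> Extract(string text)
    {
        var result = new List<int>();
        // Regex.Matches returns every match, which is what the /g flag does.
        // Group 1 is the integer part; the optional fraction is matched by a
        // non-capturing group and then ignored.
        foreach (Match m in Regex.Matches(text, @"([0-9]+)(?:\.[0-9]+)?"))
            result.Add(int.Parse(m.Groups[1].Value)); // assumes the value fits in an int
        return result;
    }
}
```

On the paragraph from the question, `WholeNumbers.Extract` should return 81, 3, 8 and 88 as integers.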

笙痞 2024-07-30 12:26:37

Try using
/(\d+)((\.\d+)?)/

This basically means: match a sequence of digits, then an optional decimal point followed by another sequence of digits. Then use MatchObj.Groups[1].Value for the first captured value and ignore the second one.

白色秋天 2024-07-30 12:26:37

This is not in the language you asked about, but it may help you think about the problem.

$ echo "A total of 81.8 percent of New York City students in grades 3 to 8 are meeting or exceeding grade-level math standards, compared to 88.9 percent of students in the rest of the State." \
| fmt -w 1 | sed -n -e '/^[0-9]/p' | sed -e 's,[^0-9].*,,' | fmt -w 72
81 3 8 88

The first fmt command asks the following commands to consider each word separately. The "sed -n" command outputs only those words which start with at least one number. The second sed command removes the first non-digit character in the word, and everything after. The second fmt command combines everything back into one line.

$ echo "This tests notation like 6.022e+23 and 10e100 and 1e+100." \
| fmt -w 1 | sed -n -e '/^[0-9]/p' | sed -e 's,[^0-9].*,,' | fmt -w 72
6 10 1
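
The same word-by-word idea can be sketched in C# without any regular expression. This is only an illustration of the pipeline above, with hypothetical names: split on whitespace, keep the words that start with a digit, and cut each one at its first non-digit character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSplitter
{
    public static List<string> LeadingIntegers(string text)
    {
        return text
            // Like "fmt -w 1": consider each whitespace-separated word on its own.
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            // Like "sed -n -e '/^[0-9]/p'": keep words that start with a digit.
            .Where(word => IsAsciiDigit(word[0]))
            // Like "sed -e 's,[^0-9].*,,'": drop the first non-digit and the rest.
            .Select(word => new string(word.TakeWhile(c => IsAsciiDigit(c)).ToArray()))
            .ToList();
    }

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

As with the shell version, a number glued to a leading character (for example "$5") is skipped, because the word does not start with a digit.
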