Regular expression to match anything before a certain character?
I have to parse a bunch of stats from text, and they all are formatted as numbers.
For example, this paragraph:
A total of 81.8 percent of New York
City students in grades 3 to 8 are
meeting or exceeding grade-level math
standards, compared to 88.9 percent of
students in the rest of the State.
I want to match just the 81 and 88 numbers, not the ".8" and ".9" that follow.
How can I do this? I've heard of terms like back-references and lookaheads. Will any of that help?
I am using C#.
Edit: I also need to get the "3" and the "8" in the above example. It's just a simple example, but I need pretty much all numbers.
Comments (8)
As stated below, just use MatchObj.Groups[1].Value to get the digits.
If you don't want to deal with groups, you can use a lookahead like you say: a pattern that matches a run of digits only when it is followed by a decimal point and another digit finds the integer part of every decimal number in the string. For your example, the matches will contain 81 and 88.
If you'd like to match the integer part of ANY number (decimal or not), you can instead search for integers that don't come right after a ".". This time, the matches would contain 81, 3, 8 and 88.
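The patterns themselves are not shown in this answer, so here is a minimal sketch of what the two lookaround approaches might look like. The patterns and the variable names (text, decimalsOnly, allIntegerParts) are my own assumptions, not the answerer's code, and the sketch assumes C# 9 or later for top-level statements. The lookbehind also excludes a preceding digit, which the description above does not mention, so that a longer fraction such as "1.25" does not yield a stray "5".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "A total of 81.8 percent of New York City students in grades 3 to 8 are "
            + "meeting or exceeding grade-level math standards, compared to 88.9 percent "
            + "of students in the rest of the State.";

// Lookahead: digits that are immediately followed by ".<digit>".
// The lookahead checks for the fraction without making it part of the match.
string[] decimalsOnly = Regex.Matches(text, @"\d+(?=\.\d)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Negative lookbehind: digits that are not preceded by "." or by another digit.
// Excluding a preceding digit stops the engine from matching the tail of a
// longer fraction (the "5" in "1.25").
string[] allIntegerParts = Regex.Matches(text, @"(?<![.\d])\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", decimalsOnly));     // 81, 88
Console.WriteLine(string.Join(", ", allIntegerParts));  // 81, 3, 8, 88
```

Both patterns return the integer part directly as the match value, so no group indexing is needed.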
Complete C# solution:
For your example, it returns 81, 3, 8 and 88.
From your example, this will match " 81", " 3", " 8", " 88"
You'll get an extra character before you get your number, but you can just trim that out in your code.
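The pattern for this answer is missing here, so the following is only a guess at something that behaves the way it describes: consume one character that is neither a "." nor a digit, then the digits, and drop the extra character afterwards. The pattern and the names raw and trimmed are hypothetical. Note that a number at the very start of the string would be missed, because there is no preceding character to consume.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "A total of 81.8 percent of New York City students in grades 3 to 8 are "
            + "meeting or exceeding grade-level math standards, compared to 88.9 percent "
            + "of students in the rest of the State.";

// One character that is not "." and not a digit, followed by a run of digits.
// The leading character becomes part of the match, e.g. " 81".
string[] raw = Regex.Matches(text, @"[^.\d]\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Drop the single extra leading character from each match.
string[] trimmed = raw.Select(s => s.Substring(1)).ToArray();

Console.WriteLine(string.Join("|", raw));      // " 81| 3| 8| 88" (each with a leading space)
Console.WriteLine(string.Join(", ", trimmed)); // 81, 3, 8, 88
```

Substring(1) is used instead of Trim() because the extra character is not guaranteed to be whitespace; it could be "(" or "$", for instance.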
Try:
It uses a lookahead to match only numbers followed by a decimal point.
C# Code:
This will match any number that has a decimal following it (which I think is what you want), but will only capture the digits before the decimal point. \d only matches digits (the same as [0-9] for ASCII input; in .NET it also matches other Unicode decimal digits unless RegexOptions.ECMAScript is set), so it makes this pretty simple.
Edit: If you want the three and the eight as well, you don't even need to check for the decimal.
Edit2: Sorry, fixed it so it will ignore all the decimal places.
Try using
/(\d+)((\.\d+)?)/
This basically means: match a sequence of digits, optionally followed by a decimal point and another sequence of digits. Then, use
MatchObj.Groups[1].Value
for the first group's value (the integer part), ignoring the second group.
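As a rough illustration of the capture-group approach, here is how that pattern might be used from C#. The pattern is the one quoted in the answer, written as a verbatim string because C# has no /.../ regex literal; the surrounding code and the names number and integerParts are my own sketch, assuming C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "A total of 81.8 percent of New York City students in grades 3 to 8 are "
            + "meeting or exceeding grade-level math standards, compared to 88.9 percent "
            + "of students in the rest of the State.";

// Group 1: the integer part. Group 2: the optional fraction (may be empty).
var number = new Regex(@"(\d+)((\.\d+)?)");

var integerParts = new List<string>();
foreach (Match m in number.Matches(text))
{
    // Groups[0] is the whole match (e.g. "81.8"); Groups[1] is the first
    // parenthesised group, i.e. the digits before the optional fraction.
    integerParts.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(", ", integerParts)); // 81, 3, 8, 88
```

Because the fraction is consumed as part of each match, its digits are never picked up as a separate number, which is what makes the lookbehind unnecessary here.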
This is not in the language you asked about, but it may help you think about the problem.
The first fmt command asks the following commands to consider each word separately. The "sed -n" command outputs only those words which start with at least one number. The second sed command removes the first non-digit character in the word, and everything after. The second fmt command combines everything back into one line.
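Since the question is about C#, the same word-by-word idea can be sketched without any regex. This is my own rough translation of the steps described above, not the shell pipeline itself, and the names text and integerParts are illustrative.

```csharp
using System;
using System.Linq;

string text = "A total of 81.8 percent of New York City students in grades 3 to 8 are "
            + "meeting or exceeding grade-level math standards, compared to 88.9 percent "
            + "of students in the rest of the State.";

string[] integerParts = text
    // Step 1: consider each word separately.
    .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    // Step 2: keep only the words that start with a digit.
    .Where(word => word[0] >= '0' && word[0] <= '9')
    // Step 3: cut each word at its first non-digit character.
    .Select(word => new string(word.TakeWhile(c => c >= '0' && c <= '9').ToArray()))
    .ToArray();

// Step 4: put everything back on one line.
Console.WriteLine(string.Join(" ", integerParts)); // 81 3 8 88
```

Like the shell version it mirrors, this skips numbers that are glued to leading punctuation, such as "(42)", because those words do not start with a digit.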