Regex to return unique lines on pattern match

Posted 2024-09-24 04:36:58

I am parsing a log file and trying to match error statements. The part of the line I am matching, "error CS", appears on numerous lines, some of them duplicates and some not. Is there a way to avoid returning the duplicates? I am using the Java flavor of regex.

Example: my simple regex returns:

Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context
Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context

I would like it to return:

Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context


Comments (2)

猫瑾少女 2024-10-01 04:36:58

One solution is to match using your regexp and then put each matching line into a data structure such as a set, which takes care of removing duplicates for you. At the end of parsing, just print the contents of the set.
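A minimal sketch of the set approach in Java, for illustration only. The class name `UniqueErrorLines`, the method `collect`, and the pattern `error CS\d{4}` are assumptions, not taken from the question; substitute whatever regex you already use. `List.of` needs Java 9 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class UniqueErrorLines {
    // Assumed pattern: "error CS" followed by a four-digit code.
    private static final Pattern ERROR = Pattern.compile("error CS\\d{4}");

    // Returns every distinct line that contains a match.
    static Set<String> collect(List<String> logLines) {
        Set<String> unique = new HashSet<>();
        for (String line : logLines) {
            if (ERROR.matcher(line).find()) {
                unique.add(line); // adding a line already in the set changes nothing
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context");
        collect(log).forEach(System.out::println);
    }
}
```

Note that a `HashSet` does not guarantee any particular iteration order, so the lines may not print in the order they appeared in the log.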

If you're concerned about order, you could add to a map of some kind, with the line as the key and the line number as the value (checking for an existing entry before inserting, so the first line number is kept). If you then sort by value, you'll get a list of the first instance of each line.
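The map idea could look roughly like the following sketch. The names `FirstOccurrences` and `inFirstSeenOrder` are hypothetical, and a plain `contains("error CS")` check stands in for your regex to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FirstOccurrences {
    // Returns each distinct error line once, ordered by where it first appeared.
    static List<String> inFirstSeenOrder(List<String> logLines) {
        Map<String, Integer> firstLineNumber = new HashMap<>();
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i);
            if (line.contains("error CS")) {
                // putIfAbsent keeps the line number of the first occurrence only
                firstLineNumber.putIfAbsent(line, i + 1);
            }
        }
        // Sort the entries by line number (the map value), then keep just the lines.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(firstLineNumber.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getKey());
        }
        return result;
    }
}
```

If you do not need the line numbers themselves, a `java.util.LinkedHashSet` is a simpler alternative (not what this answer describes): it drops duplicates and iterates in insertion order, so no sorting step is needed.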

若有似无的小暗淡 2024-10-01 04:36:58

Technically speaking, with a regular expression, this is not possible. You need something more powerful.

Regular expressions are meant for matching regular languages. The pattern you are attempting to match is not regular.

You need the expression to remember some "state" (the previously matched errors), and regular expressions are not meant to handle this type of computation. A Turing machine is capable of saving state, which is more along the lines of what you need. (Java will fit the bill nicely.)

This could be solved fairly easily by adding some extra logic to your log parser after you find all of the error lines.
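As one possible shape for that extra logic, here is a sketch using the Java 8 Stream API: the regex only selects lines, and `distinct()` holds the state needed to drop repeats. The class name `DistinctErrors`, the method `distinctErrors`, and the pattern are assumptions for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DistinctErrors {
    // Assumed pattern: "error CS" followed by one or more digits.
    private static final Pattern ERROR = Pattern.compile("error CS\\d+");

    static List<String> distinctErrors(List<String> logLines) {
        return logLines.stream()
                .filter(line -> ERROR.matcher(line).find()) // stateless: regex picks error lines
                .distinct()                                 // stateful: drops lines already seen
                .collect(Collectors.toList());
    }
}
```

Because a stream created from a `List` is ordered, `distinct()` keeps the first occurrence of each line and preserves the original order.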
