Regex pattern to get the data inside square brackets

Posted 2025-01-11 02:45:18 · 1392 characters · 0 views · 0 comments

My Regex pattern:

^(?<timestamp>[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\.[0-9]{1,6}) (?<levelname>[A-Z]) ?(ERR:)? ?(?<source>\[.*\])? (?<message>.*)

Test_String_1 = "Oct 25 14:24:29.700799 I [System] Connected"

Output as per my regex pattern:

Match groups:
timestamp   Oct 25 14:24:29.700799
levelname   I
source      [System]
message     Connected

Test_String_2 = "Oct 25 14:24:30.315344 E ERR: [[Signal]] Valid Shared Mem!"

Output as per my regex pattern:

Match groups:
timestamp   Oct 25 14:24:30.315344
levelname   E
source      [[Signal]]
message     Valid Shared Mem!

However, I am expecting the results below for Test_String_1 and Test_String_2:

Test_String_1:

Match groups:
timestamp   Oct 25 14:24:29.700799
levelname   I
source      System    
message     Connected

Test_String_2:

Match groups:
timestamp   Oct 25 14:24:30.315344
levelname   E
source      Signal
message     Valid Shared Mem!

What changes should I make in my regex pattern to get the expected result? I'm using https://rubular.com/ for regex testing.

[Edit]:
Test_String_3 = "Oct 25 14:24:29.653900 D Connection refused"

Expected output:

Match groups:
timestamp   Oct 25 14:24:29.653900
levelname   D
source  
message     Connection refused

Comments (2)

终难遇 2025-01-18 02:45:18

You can match the following regular expression.

^(?P<timestamp>[JFMASOND][a-z]{2} [0123]\d [012]\d(?::[0-5]\d){2}\.\d{6}\b) (?P<levelname>[A-Z]) +(?:[A-Z]+: +)?(?:\[+(?P<source>[A-Za-z]+)\]+)? *(?P<message>.+)


Notice that I've made the capture group source optional.

Depending on requirements, some adjustments may be needed. I assumed, for example, that the source capture group would contain a single word, and that any non-space text between levelname and source (or message) would consist of one or more capital letters followed by a colon, as in the second example ('ERR:'). I've also made assumptions about how strictly the timestamp format must be specified and which capture groups should be optional. These are of course just guesses about the specification, as it was not spelled out in the question.

The regular expression can be broken down as follows. Note that I have put individual spaces in character classes ([ ]) merely to make them visible to the reader. I've tested this with Python, where named capture groups are written (?P<name>...); it should work in Ruby as well once they are written as (?<name>...).

^                     # match beginning of string
(?P<timestamp>        # begin 'timestamp' capture group
  [JFMASOND]          # match a cap letter in the char class
  [a-z]{2}            # match two lowercase letters
  [ ]                 # match a space
  [0123]\d            # match a digit in the char class then any digit
  [ ]                 # match a space
  [012]\d             # match a digit in the char class then any digit
  (?:                 # begin a non-capture group
    :                 # match a colon
    [0-5]\d           # match a digit in the char class then any digit
  ){2}                # end non-capture group and execute it twice
  \.                  # match a period
  \d{6}               # match 6 digits
  \b                  # match a word boundary
)                     # end 'timestamp' capture group
[ ]                   # match a space
(?P<levelname>        # begin 'levelname' capture group
  [A-Z]               # match a capital letter
)                     # end 'levelname' capture group
[ ]+                  # match >= 1 spaces
(?:[A-Z]+:[ ]+)?      # optionally match >= 1 capital letters, a colon,
                      # then >= 1 spaces
(?:                   # begin non-capture group
  \[+                 # match one or more left brackets
  (?P<source>         # begin capture group 'source'
    [A-Za-z]+         # match >= 1 chars in char class
  )                   # end capture group 'source'
  \]+                 # match one or more right brackets
)?                    # end non-capture group and make optional
[ ]*                  # match >= 0 spaces
(?P<message>.+)       # match rest of line and save to capture
                      # group 'message'
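
As a rough check, the pattern can be run against the three test strings from the question. This is a minimal sketch assuming Python's re module; the names LOG_LINE and parse are made up for illustration, and re.VERBOSE is used only so the pattern can keep the layout of the breakdown above.

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE.
# Literal spaces are written as [ ] because VERBOSE ignores bare whitespace.
LOG_LINE = re.compile(r"""
    ^
    (?P<timestamp>
      [JFMASOND][a-z]{2}      # month abbreviation, e.g. Oct
      [ ][0123]\d             # space, two-digit day
      [ ][012]\d              # space, two-digit hour
      (?::[0-5]\d){2}         # :minutes:seconds
      \.\d{6}\b               # six fractional digits
    )
    [ ]
    (?P<levelname>[A-Z])
    [ ]+
    (?:[A-Z]+:[ ]+)?          # optional tag such as ERR:
    (?:\[+(?P<source>[A-Za-z]+)\]+)?   # optional bracketed source
    [ ]*
    (?P<message>.+)
""", re.VERBOSE)


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None


for s in (
    "Oct 25 14:24:29.700799 I [System] Connected",
    "Oct 25 14:24:30.315344 E ERR: [[Signal]] Valid Shared Mem!",
    "Oct 25 14:24:29.653900 D Connection refused",
):
    print(parse(s))
```

In Python, an optional group that did not take part in the match is reported as None, so source is None for Test_String_3 rather than an empty string.
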
清音悠歌 2025-01-18 02:45:18

I think this is what you want:

^(?<timestamp>[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\.[0-9]{1,6}) (?<levelname>[A-Z]) ?(ERR:)? ?(\[*(?<source>\w*)\]*) (?<message>.*)

The brackets should wrap the source group rather than be part of it.
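
A quick way to see how this pattern behaves on all three test strings from the question is the sketch below. It assumes Python's re module, so (?<name>...) is respelled as (?P<name>...); PATTERN and fields are illustrative names, not part of the answer.

```python
import re

# The pattern from this answer; only the group syntax is changed,
# since Python's re module spells named groups (?P<name>...).
PATTERN = re.compile(
    r"^(?P<timestamp>[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\.[0-9]{1,6})"
    r" (?P<levelname>[A-Z]) ?(ERR:)? ?(\[*(?P<source>\w*)\]*) (?P<message>.*)"
)


def fields(line):
    """Return (source, message) for a matching line, else None."""
    m = PATTERN.match(line)
    return (m.group("source"), m.group("message")) if m else None


print(fields("Oct 25 14:24:29.700799 I [System] Connected"))
# ('System', 'Connected')
print(fields("Oct 25 14:24:30.315344 E ERR: [[Signal]] Valid Shared Mem!"))
# ('Signal', 'Valid Shared Mem!')
print(fields("Oct 25 14:24:29.653900 D Connection refused"))
# ('Connection', 'refused')
```

Under these assumptions the first two lines give the expected source, but for Test_String_3 the first word of the message ends up in source, because the brackets are optional (\[* and \]*) and \w* is free to match a plain word. Requiring at least one bracket on each side and making the whole bracketed part optional would likely avoid that.
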
