Splitting a string without removing the separator

Posted 2025-02-12 21:00:26 | Length 287 | Views 2 | Comments 0

I have the following text,

text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"

And I want the output to be:

12345678 abcdefg
37394822 gdzdnhqihdzuiew 
09089799 
78998728 gdjewdwq

I tried re.split("\d{8}", text), but the result is incorrect.
How can I get the correct output?

Comments (4)

烦人精 2025-02-19 21:00:26

You can use a lookahead assertion.

Regex Tutorial - Lookahead and Lookbehind Zero-Length Assertions

import re
text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"
arr = re.split(r"\s+(?=\d)", text)
print(arr)
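
To make the difference from the original attempt concrete, here is a small sketch (the names naive and kept are only placeholders for this illustration). re.split drops whatever the pattern consumes, so splitting on \d{8} throws the numbers away, while (?=\d) only checks that a digit follows and consumes nothing but the whitespace.

```python
import re

text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"

# Splitting on the digits themselves consumes them, so they vanish from the result.
naive = re.split(r"\d{8}", text)
print(naive)
# ['', ' abcdefg ', ' gdzdnhqihdzuiew ', ' ', ' gdjewdwq']

# The lookahead is zero-width: only the whitespace is consumed, the digits stay.
kept = re.split(r"\s+(?=\d)", text)
print(kept)
# ['12345678 abcdefg', '37394822 gdzdnhqihdzuiew', '09089799', '78998728 gdjewdwq']
```

Note that this splits before any digit, so it assumes the text parts never start with a digit.
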
给我一枪 2025-02-19 21:00:26

IIUC, you are looking to pair each numeric part with the alphabetic part that follows it, and the numeric part will always come first on each line.

Not an elegant solution, but it addresses the question.

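text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"
splitted_txt = text.split(' ')
i = 0
while i < len(splitted_txt):
    # Pair a number with the next token only if that token exists and is not a number.
    if (splitted_txt[i].isdigit() and i + 1 < len(splitted_txt)
            and not splitted_txt[i + 1].isdigit()):
        print(splitted_txt[i], splitted_txt[i + 1])
        i += 1  # skip the token that was just printed with its number
    else:
        print(splitted_txt[i])
    i += 1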
12345678 abcdefg
37394822 gdzdnhqihdzuiew
09089799
78998728 gdjewdwq
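
For reuse, the same pairing idea could be wrapped in a function that returns the lines instead of printing them. This is only a sketch: pair_tokens is a made-up name, and it assumes tokens are separated by whitespace and that each number is followed by at most one non-numeric token.

```python
def pair_tokens(text):
    """Group each numeric token with the non-numeric token that follows it, if any."""
    tokens = text.split()
    lines = []
    i = 0
    while i < len(tokens):
        has_next = i + 1 < len(tokens)
        if tokens[i].isdigit() and has_next and not tokens[i + 1].isdigit():
            lines.append(tokens[i] + " " + tokens[i + 1])
            i += 2  # consumed the number and its companion
        else:
            lines.append(tokens[i])
            i += 1
    return lines


text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"
print("\n".join(pair_tokens(text)))
```

Returning a list keeps the grouping separate from the printing, so the result can also be written to a file or processed further.
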
凑诗 2025-02-19 21:00:26

I prefer @Itagaki's answer but it's worth noting that findall could also be used:

import re
text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"
re.findall(r"\d+(?:\s+[a-z]+)?", text)
  #=> ['12345678 abcdefg', '37394822 gdzdnhqihdzuiew', '09089799', '78998728 gdjewdwq']

The regular expression can be broken down as follows.

\d+       # match one or more digits
(?:       # begin a non-capture group
  \s+     # match one or more whitespaces
  [a-z]+  # match one or more lowercase letters
)         # end non-capture group
?         # make non-capture group optional
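
The breakdown above can be given to Python almost verbatim with the re.VERBOSE flag, which ignores whitespace inside the pattern and allows # comments. A small sketch (the name pattern is just for illustration):

```python
import re

text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"

# Same regex as in the findall call, written out with inline comments.
pattern = re.compile(r"""
    \d+         # one or more digits
    (?:         # begin a non-capture group
      \s+       # one or more whitespace characters
      [a-z]+    # one or more lowercase letters
    )?          # end the group and make it optional
""", re.VERBOSE)

print(pattern.findall(text))
# ['12345678 abcdefg', '37394822 gdzdnhqihdzuiew', '09089799', '78998728 gdjewdwq']
```
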

If it were required that there be exactly 8 digits and that the strings of lowercase letters have lengths between (say) 7 and 15 (as in the example), the regex would be modified slightly:

r"\d{8}(?:\s+[a-z]{7,15})?"
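
As a quick check of the stricter pattern, here is a sketch that runs it on the sample text and on a second, made-up input (short_word is hypothetical) whose letter run is too short to qualify:

```python
import re

strict = r"\d{8}(?:\s+[a-z]{7,15})?"

text = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"
print(re.findall(strict, text))
# ['12345678 abcdefg', '37394822 gdzdnhqihdzuiew', '09089799', '78998728 gdjewdwq']

# Hypothetical input: "abc" has fewer than 7 letters, so the optional group
# does not match and only the 8 digits are returned.
short_word = "12345678 abc"
print(re.findall(strict, short_word))
# ['12345678']
```
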
握住你手 2025-02-19 21:00:26

If you want to match 8 digits, you can use:

\b\d{8}\b.*?(?=\s*(?:\b\d{8}\b|$))

Explanation

  • \b\d{8}\b Match 8 digits surrounded by word boundaries to prevent partial matches
  • .*? Match any character except a newline, as few times as possible
  • (?= Positive lookahead
    • \s* Match optional whitespace chars
    • (?:\b\d{8}\b|$) Match either 8 digits or assert the end of the string
  • ) Close lookahead

Example

import re

pattern = r"\b\d{8}\b.*?(?=\s*(?:\b\d{8}\b|$))"
s = "12345678 abcdefg 37394822 gdzdnhqihdzuiew 09089799 78998728 gdjewdwq"

print(re.findall(pattern, s))

Output

['12345678 abcdefg', '37394822 gdzdnhqihdzuiew', '09089799', '78998728 gdjewdwq']
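
Where requiring exactly 8 digits matters is input in which the text part itself contains a shorter number. The sketch below uses a made-up string for that case and compares this pattern with a split on \s+(?=\d), which breaks before every digit:

```python
import re

# Hypothetical input: the description after the first ID contains a short number.
s = "12345678 abc 12 def 37394822 xyz"

# Splitting before any digit also breaks in front of "12".
by_lookahead_split = re.split(r"\s+(?=\d)", s)
print(by_lookahead_split)
# ['12345678 abc', '12 def', '37394822 xyz']

# Requiring a whole 8-digit number keeps "12 def" attached to its ID.
pattern = r"\b\d{8}\b.*?(?=\s*(?:\b\d{8}\b|$))"
by_eight_digits = re.findall(pattern, s)
print(by_eight_digits)
# ['12345678 abc 12 def', '37394822 xyz']
```
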