Regex: split on spaces, keeping strings inside curly braces

Posted 2024-12-05 11:01:16

I have a string that looks like this:

arg1 {0 1} arg2 {5 87} string {with space} ar3 1

It is split on spaces, but a string may contain spaces as well, which causes problems for strings with spaces. I still need to split this string, but I'd like not to split a string that is enclosed in curly braces and prefixed by the string keyword. That means the string above should be split like this:

arg1
{0
1}
arg2
{5
87}
string
{with space}
ar3
1

I can't implement this; I really need to read up on regular expressions. Could you please help me?

心清如水 2024-12-12 11:01:16

Step 1: split on spaces as usual to get an array.

Step 2: go through the array; if an element matches {[a-zA-Z]+, join the next element to it with a space and remove the next element.

Then you get what you want. The following awk command shows an example:

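echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1" | awk '{
  n = split($0, a)                    # split on whitespace; n = number of fields
  for (i = 1; i <= n; i++) {
    if (!(i in a)) continue           # field was already merged into the previous one
    # a field that opens a brace followed by a letter: join the next field to it, then drop that field
    if (a[i] ~ /^[{][a-zA-Z]/ && i < n) { a[i] = a[i] " " a[i+1]; delete a[i+1] }
    print a[i]
  } }'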

arg1
{0
1}
arg2
{5
87}
string
{with space}
ar3
1
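The awk above joins only one following field, so a group such as {with more spaces} would still be broken apart, and it keeps any brace group that starts with a letter rather than only the ones after the string keyword. A minimal sketch of a variant that follows the question's rule more closely, assuming fields are separated by plain whitespace and every protected group ends with a field that ends in }; the function name split_args is hypothetical:

```shell
# split_args (hypothetical name): split on whitespace, but keep a {...} group
# whole when the field right before it is the keyword "string".
split_args() {
  awk '{
    n = split($0, f)                  # f[1..n] = whitespace-separated fields
    for (i = 1; i <= n; i++) {
      tok = f[i]
      # previous field is "string" and this one opens a brace:
      # keep appending fields until the token ends with "}" (or input runs out)
      if (i > 1 && f[i-1] == "string" && tok ~ /^[{]/) {
        while (tok !~ /[}]$/ && i < n) tok = tok " " f[++i]
      }
      print tok
    }
  }'
}

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1" | split_args
```

On the question's input it prints the same lines as above; the difference shows up with input like string {a b c}, which stays one token, while {0 1} without the keyword is still split.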

==update==

OK, based on your comment, this works too:

Step 1: find the strings you don't want to "split" and replace them with a special string. Importantly, save the strings you found to another array. The pattern, shown as a grep example:

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"|grep -E -o '\{([a-zA-Z]+\s*)*\}'

{with space}
{abc def}
{xyz zyx}

After replacing, with xxxxxxxxx as the special string:

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"|sed -r 's#\{([a-zA-Z]+\s*)*\}#xxxxxxxxx#g'

arg1 {0 1} arg2 {5 87} string xxxxxxxxx ar3 1 xxxxxxxxx xxxxxxxxx

Step 2: do the split.

Step 3: replace each special string with the saved string that belongs at that index.
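Steps 2 and 3 are only described in words. A minimal sketch of all three steps in one awk program, assuming the \001 control character never occurs in the input, so that "\001<index>\001" can serve as a numbered placeholder and step 3 can look each saved string up by its index. The name protect_split is hypothetical, and the pattern is a POSIX-awk approximation of the grep pattern above that uses spaces instead of \s:

```shell
# protect_split (hypothetical name): the three steps above in a single awk program.
protect_split() {
  awk '{
    line = $0; n = 0
    # Step 1: save each {...} group of letters and spaces in saved[],
    # and leave the placeholder "\001<index>\001" in its place
    while (match(line, /[{]([a-zA-Z]+ *)*[}]/)) {
      saved[++n] = substr(line, RSTART, RLENGTH)
      line = substr(line, 1, RSTART - 1) "\001" n "\001" substr(line, RSTART + RLENGTH)
    }
    # Step 2: split on whitespace; placeholders contain no spaces, so they stay whole
    m = split(line, f)
    # Step 3: swap each placeholder back for the saved string with the same index
    for (i = 1; i <= m; i++) {
      if (substr(f[i], 1, 1) == "\001") f[i] = saved[substr(f[i], 2, length(f[i]) - 2)]
      print f[i]
    }
  }'
}

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}" | protect_split
```

For that input it prints arg1, {0, 1}, arg2, {5, 87}, string, {with space}, ar3, 1, {abc def} and {xyz zyx}, one per line. Like the grep pattern, it protects every letters-only brace group, not just the ones after string.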

江南烟雨〆相思醉 2024-12-12 11:01:16

I don't know QRegExp, so I don't know if it has lookaround capabilities. If it does, you could try splitting on something like this:

(?<!(^|})[^{]*\bstring\s{[^}]*)\s

That should split on any whitespace character except those inside a pair of braces immediately preceded by the word string. It will ignore the string keyword if it's already inside a set of braces.

You can also use a simplified version: (?<!\bstring\s{[^}]*)\s, although this will be affected by weird stuff like foo {string {bar qux}}.
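Whether these patterns compile depends on the engine: PCRE, which GNU grep -P uses, rejects a lookbehind containing an unbounded [^}]* like the ones above, while engines such as .NET accept unbounded lookbehind. A minimal sketch that keeps the lookaround idea but turns it around: instead of splitting on whitespace, it matches the tokens, using a fixed-length lookbehind so that a {...} group directly after "string " comes out as one match. It assumes GNU grep built with PCRE support and exactly one space between string and the opening brace; tokens is a hypothetical name:

```shell
# tokens (hypothetical name): print one token per line. A {...} group is one
# token only when it directly follows "string "; otherwise a token is any run
# of non-whitespace characters.
tokens() {
  # (?<=\bstring ) is a fixed-length lookbehind, which PCRE accepts
  grep -oP '(?<=\bstring )\{[^}]*\}|\S+'
}

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1" | tokens
```

The alternation order matters: the brace alternative is tried first at each position, and \S+ picks up everything else, including the string keyword itself.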
