Getting the index of a substring on Solaris

Posted on 2024-07-30 01:24:02

How can I find the index of a substring that matches a regular expression on Solaris 10?

Comments (4)

暗藏城府 2024-08-06 01:24:03

You tagged the question as bash, so I'm going to assume you're asking how to do this in a bash script. Unfortunately, the built-in regular expression matching doesn't save string indices. However, if you're asking this in order to extract the matching substring, you're in luck:

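# Leave $regex unquoted: in bash 3.2 and later a quoted right-hand side is matched literally.
if [[ "$var" =~ $regex ]]; then
     i=0                          # start at the whole match
     n=${#BASH_REMATCH[@]}        # whole match plus capture groups
     while [[ $i -lt $n ]]
     do
         echo "capture[$i]: ${BASH_REMATCH[$i]}"
         let i++
     done
fi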

This snippet will output in turn all of the submatches. The first one (index 0) will be the entire match.
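Although bash does not record where the match started, the index can be derived from the matched text. The following is a minimal sketch, not part of the original answer: `regex_index` is a hypothetical helper name, and it assumes that the matched text does not also occur earlier in the string, which can happen when the regex uses anchors such as `$`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 0-based index of the first regex match,
# or print nothing when the regex does not match.
regex_index() {
    local regex=$1 string=$2 prefix
    if [[ $string =~ $regex ]]; then
        # Remove the matched text and everything after it; what remains
        # is the part of the string that precedes the match.
        prefix=${string%%"${BASH_REMATCH[0]}"*}
        echo "${#prefix}"
    fi
}

regex_index 'a[0-9]+' 'This is a a123 test'   # prints 10
```

The length of the prefix is the number of characters before the match, so the result counts from 0.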

You might prefer the awk option, though. awk has a function, match, which gives you the index you want; it is documented in the awk man page. It also stores the length of the match in RLENGTH, if you need that. To implement this in a bash script, you could do something like:

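# regex_to_find must hold an awk regex literal including the slashes, e.g. /a[0-9]+/
match_index=$(echo "$var_to_search" | \
awk '{
    where = match($0, '"$regex_to_find"')   # 1-based index, 0 if no match
    if (where)
        print where
    else
        print -1
}')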

There are a lot of ways to pass variables into awk. This combination of piping one value in and embedding the other directly in the awk one-liner is fairly common. You can also set awk variables with the -v option (see man awk).

Obviously you can modify this to get the length, the match string, whatever it is you need. You can capture multiple things into an array variable if necessary:

match_data=($( ... awk '{ ... print where,RLENGTH,match_string ... }'))
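As an illustration of the -v approach and of capturing several values at once, here is a minimal sketch. The helper name `regex_match_info` is hypothetical and not from the original answer. It assumes an awk that supports -v and match(); on Solaris 10 that likely means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk. Note that awk processes backslash escapes in -v values, so a regex containing backslashes may need them doubled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "index length text" for the first regex match
# on each input line, or -1 when nothing matches. The index is 1-based,
# which is what awk's match() returns.
regex_match_info() {
    local regex=$1 string=$2
    printf '%s\n' "$string" | awk -v re="$regex" '{
        where = match($0, re)          # also sets RSTART and RLENGTH
        if (where)
            print where, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            print -1
    }'
}

# Word splitting turns the three fields into array elements.
match_data=($(regex_match_info 'a[0-9]+' 'This is a a123 test'))
echo "index=${match_data[0]} length=${match_data[1]} text=${match_data[2]}"
# prints: index=11 length=4 text=a123
```

Because the array is built by word splitting, a matched string that itself contains whitespace would be spread over several elements.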
烧了回忆取暖 2024-08-06 01:24:03

My go-to options are bash, awk, and perl. I'm not sure what you're trying to do, but any of the three would likely work well. For example, this extracts the matching substring:

f=somestring
string=$(expr match "$f" '.*\(expression\).*')
echo "$string"
说不完的你爱 2024-08-06 01:24:03

If you use bash 4.x, you can source oobash, a string library written in bash in an object-oriented style:

http://sourceforge.net/projects/oobash/

String is the constructor function:

String a abcda

a.indexOf a

0

a.lastIndexOf a

4

a.indexOf da

3

There are many more "methods" for working with strings in your scripts:

-base64Decode      -base64Encode  -capitalize        -center            
-charAt            -concat        -contains          -count             
-endsWith          -equals        -equalsIgnoreCase  -reverse           
-hashCode          -indexOf       -isAlnum           -isAlpha           
-isAscii           -isDigit       -isEmpty           -isHexDigit        
-isLowerCase       -isSpace       -isPrintable       -isUpperCase       
-isVisible         -lastIndexOf   -length            -matches           
-replaceAll        -replaceFirst  -startsWith        -substring         
-swapCase          -toLowerCase   -toString          -toUpperCase       
-trim              -zfill
吐个泡泡 2024-08-06 01:24:02

Assuming that what you want is to find the location of the first match of a wildcard in a string using bash, the following bash function returns that position (counting from 0), or nothing if the wildcard doesn't match:

function match_index()
{
  local pattern=$1
  local string=$2  
  local result=${string/${pattern}*/}

  [ ${#result} = ${#string} ] || echo ${#result}
}

For example:

$ echo $(match_index "a[0-9][0-9]" "This is a a123 test")
10

If you want to allow full-blown regular expressions instead of just wildcards, replace the "local result=" line with

local result=$(echo "$string" | sed 's/'"$pattern"'.*$//')

but then you're exposed to the usual shell quoting issues.
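Put together, the sed variant might look like the following minimal sketch. The function name `match_index_re` is hypothetical. It assumes the pattern is a basic regular expression that contains no `/` (the sed delimiter) and that the string is a single line.

```shell
#!/usr/bin/env bash
# Hypothetical variant of match_index that takes a sed basic regular
# expression instead of a shell wildcard.
match_index_re() {
    local pattern=$1 string=$2 result
    # Delete the first match and everything after it, keeping only the
    # text that precedes the match.
    result=$(printf '%s\n' "$string" | sed "s/$pattern.*\$//")
    # Equal lengths mean nothing was deleted, that is, no match.
    [ "${#result}" -eq "${#string}" ] || echo "${#result}"
}

match_index_re 'a[0-9][0-9]*' 'This is a a123 test'   # prints 10
```

A pattern that contains `/` would have to be escaped first or used with a different sed delimiter, which is one instance of the quoting issues mentioned above.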
