Handling metacharacters in a search string

Posted on 2024-08-19 07:23:10

I have user input that will be used in a search string and may contain metacharacters,

for example C# or C++.

The grep command in my function was:

grep -E "$1|$2" test.txt

With the values substituted directly:

grep -E "C\+\+|testWord" test.txt
grep -E "C\#|testWord" test.txt

The first matched the expected lines, but the second did not.
Strangely, the # was completely ignored.
When the values are passed in as $1 and $2 instead, both commands match anything with c followed by testWord, rather than c++ and c# respectively.

I've tried handling it using sed:

$temp = `echo $1 | sed 's/[\#\!\&\;\`\"\'\|\*\?\~\<\>\^\(\)\[\]\{\}\$\+\\]/\\&/g'`

but it doesn't work correctly.
Is there another way to handle user input that contains metacharacters?

Thanks in advance.

Comments (3)

夜雨飘雪 2024-08-26 07:23:10

If you are passing the input as arguments to the script:

#!/bin/bash

input1="$1"
input2="$2"
# Quote the variables inside the pattern so they are matched literally,
# not interpreted as glob patterns.
while IFS= read -r line
do
    case "$line" in
        *"$input1"*|*"$input2"* ) echo "found: $line";;
    esac
done < "BooksDB.txt"

Output:

$ cat BooksDB.txt
this is  a line
this line has C++ and C#
this line has only C++ and that's it
this line has only C# and that's it
this is end line Caa

$ ./shell.sh C++ C#
found: this line has C++ and C#
found: this line has only C++ and that's it
found: this line has only C# and that's it

If you are getting the input from read:

read -r -p "Enter input1: " input1
read -r -p "Enter input2: " input2
while IFS= read -r line
do
    case "$line" in
        *"$input1"*|*"$input2"* ) echo "found: $line";;
    esac
done < "BooksDB.txt"
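Whether the variables are quoted inside the case pattern matters once the input contains glob characters such as * or ?. The following is a minimal sketch of that difference, not part of the original answer; the function name contains_literal is hypothetical.

```shell
#!/bin/bash
# contains_literal LINE TERM
# Hypothetical helper: succeeds if LINE contains TERM as a literal substring.
contains_literal() {
    case "$1" in
        # The quoted "$2" is matched character for character;
        # only the surrounding, unquoted * act as wildcards.
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

contains_literal 'price is 5*3' '5*3' && echo "literal match"
# With an unquoted $2 the next call would succeed, because 5*3 would be
# treated as a glob; with quoting it fails as intended.
contains_literal 'price is 5 and 3' '5*3' || echo "no match"
```

For search terms like C++ and C#, which contain no glob characters, both forms behave the same; the quoting only becomes important for arbitrary user input.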

帅的被狗咬 2024-08-26 07:23:10

This works for me:

$ testfun1(){ echo "foo $1" | grep "$1"; }
$ testfun1 C#
foo C#
$ testfun2(){ read a; echo "bar $a" | grep "$a"; }
$ testfun2
C#
bar C#

Edit:

You might try this form without -E. In basic regular expressions + is an ordinary character, and GNU grep accepts \| as alternation:

$ testfun3(){ grep "$1\|$2" test.txt; }
$ testfun3 C++ awk
something about C++
blah awk blah
$ testfun3 C# sed
blah sed blah
the text containing C#
$ testfun3 C# C++
something about C++
the text containing C#
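If the search terms should always be taken literally, another option is to skip regular expressions altogether. This is a minimal sketch, assuming grep's standard -F (fixed strings) and -e (pattern) options; the function name search_literal and the throwaway demo file are hypothetical.

```shell
#!/bin/bash
# search_literal FILE TERM1 TERM2
# Hypothetical helper: prints the lines of FILE that contain TERM1 or TERM2,
# treating both terms as fixed strings rather than regular expressions.
search_literal() {
    local file=$1
    # -F: fixed-string matching; each -e adds one alternative.
    # "--" ends option parsing, so a file name starting with "-" is safe.
    grep -F -e "$2" -e "$3" -- "$file"
}

# Demonstration on a temporary file
demo=$(mktemp)
printf '%s\n' 'something about C++' 'the text containing C#' 'only C here' > "$demo"
search_literal "$demo" 'C++' 'C#'   # prints the first two lines only
rm -f "$demo"
```

Because nothing in the terms is interpreted as a pattern, no escaping step is needed with this approach.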

何以笙箫默 2024-08-26 07:23:10

Just quote all the grep metacharacters in $1 and $2 before adding them to your grep expression.

Something like this:

# Backslash-escape every extended-regex metacharacter in the input
quoted1=$(printf '%s\n' "$1" | sed -e 's/[][\.|$(){}?+*^]/\\&/g')
quoted2=$(printf '%s\n' "$2" | sed -e 's/[][\.|$(){}?+*^]/\\&/g')
grep -E "$quoted1|$quoted2" test.txt

ought to work. Adjust the metacharacter list to suit. Handling | needs care: with -E a bare | means alternation and \| matches a literal pipe, so a | inside the input is escaped while the | that joins the two patterns is left unescaped.
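The quoting step can also be wrapped in a function so that both arguments go through the same escaping. This is a minimal sketch, assuming extended regular expressions (grep -E); the names escape_ere and search_escaped are hypothetical.

```shell
#!/bin/bash
# escape_ere STRING
# Hypothetical helper: prints STRING with every ERE metacharacter
# preceded by a backslash, so grep -E matches it literally.
escape_ere() {
    # The bracket expression lists ] [ \ . | $ ( ) { } ? + * ^ ;
    # "\\&" in the replacement emits a backslash plus the matched character.
    printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

# search_escaped FILE TERM1 TERM2
# Hypothetical helper: lines of FILE containing either term literally.
search_escaped() {
    local file=$1 first second
    first=$(escape_ere "$2")
    second=$(escape_ere "$3")
    # The bare | joins the two escaped terms as an alternation.
    grep -E -e "$first|$second" -- "$file"
}

escape_ere 'C++'   # prints C\+\+
```

Passing the pattern through -e keeps a term that begins with a dash from being read as an option.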
