Return strings that match but are not an exact match

Posted on 2025-01-25 01:26:15

Is there any way to find a word that contains a given string but is not an exact match? For example:

# cat t.txt
first line
ind is a shortform of india

I am trying to return the word "india" because it contains the string "ind", but I do not want the exact match ("ind" on its own). I have tried this:

# grep -o 'ind' t.txt
ind
ind

疏忽 2025-02-01 01:26:15

Would you please try the following:

grep -Eo '[A-Za-z]+ind|ind[A-Za-z]+' t.txt

Output:

india

The regex [A-Za-z]+ind|ind[A-Za-z]+ matches ind together with the letters preceding or following it.
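
To see how this alternation behaves on different kinds of words, it can be run on a few test words, one per line. The word fooindbar is an assumed extra test case (it is not in the question's t.txt) with ind in the middle of a word. There the leftmost match is fooind, because the first alternative has to end right after ind, so the trailing letters are not printed:

```shell
# Run the regex above on three test words, one per line:
#   ind       -> no letter before or after ind, so nothing is printed
#   india     -> matched by ind[A-Za-z]+
#   fooindbar -> matched by [A-Za-z]+ind, which must end at "ind"
printf '%s\n' ind india fooindbar | grep -Eo '[A-Za-z]+ind|ind[A-Za-z]+'
# prints:
# india
# fooind
```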

风吹短裙飘 2025-02-01 01:26:15
$ grep -Eo '[[:alpha:]]+ind[[:alpha:]]*|[[:alpha:]]*ind[[:alpha:]]+' file
india
fooindbar

The above was run on this input file (note the added test case of ind appearing in the middle of a word rather than only at its start or end):

$ cat file
first line
ind is a shortform of india
this fooindbar is the mid-word text
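
To reuse the same pattern for a substring other than ind, one option is to wrap it in a small shell function. This is only a sketch: the name words_containing is hypothetical, and it assumes the search string contains no ERE metacharacters (such as . * [ ( |), because the argument is pasted into the regex as-is:

```shell
# Hypothetical helper: print every run of letters that contains $1 and has
# at least one extra letter before or after it (the pattern above, with
# "ind" replaced by the first argument).
# Assumption: $1 contains no ERE metacharacters; it is used as a regex as-is.
# Remaining arguments are files; with none, grep reads standard input.
words_containing() {
    needle=$1
    shift
    grep -Eo "[[:alpha:]]+${needle}[[:alpha:]]*|[[:alpha:]]*${needle}[[:alpha:]]+" "$@"
}
```

For example, words_containing ind file should print india and fooindbar for the input file above, and printf 'sun sunday unsung\n' | words_containing sun should print sunday and unsung.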

You can do the same with GNU awk (needed for the multi-char RS, RT, and the \s shorthand for [[:space:]]) if you prefer:

$ awk -v RS='\\s+' '/[[:alpha:]]+ind[[:alpha:]]*|[[:alpha:]]*ind[[:alpha:]]+/' file
india
fooindbar

or:

$ awk -v RS='[[:alpha:]]+ind[[:alpha:]]*|[[:alpha:]]*ind[[:alpha:]]+' 'RT{print RT}' file
india
fooindbar
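
The two GNU awk commands above rely on gawk's regex RS and RT. If only a POSIX awk is available, one portable alternative (a sketch, not part of the original answer) is to leave RS alone and loop over the whitespace-separated fields of each line instead. The function name words_with_ind is hypothetical, and unlike the grep version it prints the whole field, so attached punctuation (e.g. a trailing comma after india) is kept:

```shell
# Hypothetical helper; reads the files given as arguments, or stdin if none.
words_with_ind() {
    awk '{
        # Without a regex RS, walk the default whitespace-separated
        # fields of each line instead of making every word a record.
        for (i = 1; i <= NF; i++)
            # Keep fields with a letter right before or right after "ind".
            if ($i ~ /[A-Za-z]ind|ind[A-Za-z]/)
                print $i
    }' "$@"
}
```

With the input file above, words_with_ind file should print india and fooindbar.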

丘比特射中我 2025-02-01 01:26:15

I would use GNU AWK for this task in the following way. Let the content of file.txt be

first line
ind is a shortform of india

then

awk 'BEGIN{RS="[[:space:]]+"}match($0,/ind/)&&length>RLENGTH{print}' file.txt

output

india

Explanation: I inform GNU AWK that the record separator (RS) is one or more whitespace characters, so every word is treated as a record. Then for every record (that is, every word) I use the match function, which returns the starting position of the match (0 if there is none) and sets the RSTART and RLENGTH values. If a match is found, I check whether the length of the current record (that is, the word) is greater than the length of the match, and if so I print the word. Note that every matching word is output on its own line, so for example if the input file content were

india ind india ind india

then output would be

india
india
india

(tested in gawk 4.2.1)
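
To see the values the condition match($0,/ind/)&&length>RLENGTH works with, the sketch below prints, for a few arbitrary test words, the return value of match() alongside RSTART, RLENGTH and the word's length. It only uses the standard match() function, so it should behave the same in any POSIX awk:

```shell
# For each test word print: word, match() result, RSTART, RLENGTH, length.
printf '%s\n' ind india fooindbar line |
    awk '{ pos = match($0, /ind/); print $0, pos, RSTART, RLENGTH, length($0) }'
# prints:
# ind 1 1 3 3
# india 1 1 3 5
# fooindbar 4 4 3 9
# line 0 0 -1 4
```

Only india and fooindbar have a non-zero match() result and a length greater than RLENGTH (3), so those are the words the command above would print.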
