Let's say I have a file called courses.txt, with contents like those below.
The file has sections (the course provider and the email I used), each followed by various courses. Example: edX ([email protected]) and then the various course names, each preceded by a serial number.
udemy ([email protected])
"=========================="-
1) foo bar
2) java programming language
3) redis stephen grider
4) javascript
5) react with typescript
6) kotlin
7) Etherium and Solidity : the Complete Developer's Guide
8) reactive programming with spring
coursera ([email protected])
"==========================-"
1) python
2) typescript
3) java concurrency
4) C#
edX ([email protected])
"==========================-"
1) excel
2) scala
3) risk management
4) stock
5) oracle
6) mysql
7) java
==========================-
Question: I want to grep for a course, say "java". I want a match that shows me the matching line(s) (example: "java") and the corresponding section name (say, "edX ([email protected])").
If I search for "java", what regex will give me the following matches? (I use grep/perl on Windows.)
udemy ([email protected])
2) java programming language
coursera ([email protected])
3) java concurrency
edX ([email protected])
7) java
I tried lookbehind/lookahead but couldn't figure out how to print the course provider name with email and the course name.
Thoughts?
If you process the file in paragraphs (chunks of text separated by blank lines), then in each paragraph it is fairly straightforward to match the needed pattern: the header (followed by a line of `=`'s) and a line with `java` in it.

(Tested on Linux; read on for Windows. Broken into lines for easier reading. See below for an explanation of the pattern.)
At the end of the line with `=`s I use `.+?` instead of the specific characters that follow the `=`s in your input, because your sample input isn't consistent: it has both `-"` and `"-`, in different paragraphs. Adjust as suitable.

Since this is on Windows, where you may have to use `"` delimiters for the one-liner (I don't know what shell you use), you may need to replace the literal `"` inside the pattern with `\x22` (hex for `"`), or your other favorite sequence.

Hopefully good for Windows (can't test on Windows right now)
The `-00` switch makes it read in paragraphs. With the `/x` modifier, spaces inside the pattern are ignored, so we can use them to space things out for readability. With the `/s` modifier the `.` matches a newline as well. This is important for the middle `.+?` to match multiple lines, up to the one with `java` (surrounded by spaces).†

If you don't mind having a script instead of a one-liner, that is what I recommend.
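A minimal sketch of such a script, not the answer's original code. It assumes the sections are separated by blank lines, uses a placeholder email and shortened sample data, and adds `\b` word boundaries so that `javascript` is not reported for `java`. The sub name `find_in_paragraphs` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data. The blank lines between sections are an assumption.
my $sample = <<'END';
udemy (user@example.com)
"=========================="-
1) foo bar
2) java programming language
3) redis stephen grider
4) javascript

coursera (user@example.com)
"==========================-"
1) python
2) typescript
3) java concurrency
4) C#

edX (user@example.com)
"==========================-"
1) excel
2) scala
7) java
END

# Read $fh one paragraph at a time and return [header, course line] pairs.
sub find_in_paragraphs {
    my ($fh, $term) = @_;
    local $/ = "\n\n";    # a "line" is now a paragraph; restored when the sub returns
    my @hits;
    while (my $para = <$fh>) {
        next unless $para =~ /
            \A \s* ( [^\n]+ ) \n      # header line: provider and email
            [^\n]* =+ [^\n]* \n       # the line of equals signs, whatever surrounds them
            .*?                       # earlier course lines; the s modifier lets this cross newlines
            ^ ( \d+ \) [^\n]* \b \Q$term\E \b [^\n]* ) $    # first course line with the term
        /xsm;
        push @hits, [ $1, $2 ];
    }
    return @hits;
}

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_->[0]\n$_->[1]\n" for find_in_paragraphs($fh, 'java');
close $fh;
```

The sketch reads from an in-memory filehandle so that it runs on its own; for real files, a handle opened on `courses.txt` (or `<>`) would take its place. Only the first matching course line in each section is reported.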
In such a script, the `<>` operator reads the files given on the command line, line by line, but the notion of a "line" is set beforehand to a paragraph with `local $/ = "\n\n"`. The `local` is there in case this is part of a larger program where you don't want to change the `$/` variable for the whole program!

† Or, instead of using `/s`, which makes `.` match newlines, use a pattern for multiple lines. The same works with `"..."` delimiters, if you need those on Windows. (Again, I can't test on Windows right now.)

Note that now we don't have to make all those `.+` non-greedy with the added `?` (`.+?`), as in the patterns with `/s` above: now `.+` stops at a newline, just as needed here.

Or, use the `/s` modifier dynamically, via extended patterns. In such a pattern, `(?s)` "turns on" the `/s` modifier, which would be in effect until the end of the enclosing group (the rest of the pattern in this case), but `(?-s)` turns it off.
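The two footnote variants could be sketched as follows. These are illustrative patterns under the same assumptions as before (blank-line-separated sections, placeholder email, `\b` around `java`), not the answer's original ones; the names `$by_lines`, `$inline_s`, and `first_hit` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data, split into paragraphs on blank lines.
my $sample = <<'END';
udemy (user@example.com)
"=========================="-
1) foo bar
2) java programming language
4) javascript

edX (user@example.com)
"==========================-"
1) excel
7) java
END
my @paragraphs = split /\n{2,}/, $sample;

# Variant 1: no s modifier, so every dot stops at a newline and the
# lines to skip are matched one at a time.
my $by_lines = qr/
    \A ( .+ ) \n                  # header line
    .* =+ .* \n                   # line of equals signs
    (?: .* \n )*?                 # skip as few course lines as possible
    ( \d+ \) .* \b java \b .* )   # the course line we want
/x;

# Variant 2: switch the s behaviour on only for the part that must cross lines.
my $inline_s = qr/
    \A ( .+ ) \n                  # header line
    .* =+ .* \n                   # line of equals signs
    (?s) .*? (?-s)                # may span several lines, then dot stops at newlines again
    ^ ( \d+ \) .* \b java \b .* ) # the course line we want
/xm;

# Return "header\ncourse line" for a paragraph, or undef when nothing matches.
sub first_hit {
    my ($para, $re) = @_;
    return $para =~ $re ? "$1\n$2" : undef;
}

for my $para (@paragraphs) {
    print first_hit($para, $by_lines), "\n";
    print first_hit($para, $inline_s), "\n";
}
```

Both variants should report the same header and course line for each paragraph; which one reads better is largely a matter of taste.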
I won't give you a complete solution, but you can start with a command along the lines of `grep -iE "java|@" courses.txt`.

Some explanation:

- `-i` makes it case insensitive
- `-E` uses extended regular expressions
- `|` is an example of those extended regular expressions and means "OR": show the lines which contain 'java' OR '@' (the latter being all the email addresses)

As a result, you get a file with all the email addresses and all the 'java' courses, together with a catch: if a line with an email address is followed by another line with an email address, then there's no 'java' course for that address. Hence, you can now use Perl to remove the email addresses where the next line is also an email address.
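One way the two steps might be combined in Perl, as a sketch only. The sample data is shortened, the email is a placeholder, and the `exampleprovider` section is made up to show a provider with no match. The `\b` word boundaries are an addition (a plain `java` would also pick up `javascript`), and the sub name `matches_with_providers` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample; "exampleprovider" is invented to show a section without a match.
my $sample = <<'END';
udemy (user@example.com)
"=========================="-
1) foo bar
2) java programming language
4) javascript
exampleprovider (user@example.com)
"==========================-"
1) kotlin
edX (user@example.com)
"==========================-"
7) java
END

sub matches_with_providers {
    my ($term, @lines) = @_;

    # Step 1: keep provider lines (they contain an at sign) and lines with the
    # term, case-insensitively, much like a grep with an alternation would.
    my @kept = grep { /@/ || /\b\Q$term\E\b/i } @lines;

    # Step 2: drop a provider line when no course line follows it.
    my @out;
    for my $i (0 .. $#kept) {
        my $is_provider   = $kept[$i] =~ /@/;
        my $next_is_match = $i < $#kept && $kept[ $i + 1 ] !~ /@/;
        next if $is_provider && !$next_is_match;
        push @out, $kept[$i];
    }
    return @out;
}

print "$_\n" for matches_with_providers('java', split /\n/, $sample);
```

Unlike the paragraph approach, this does not rely on blank lines between sections, but it does assume that `@` appears only on the provider lines.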
Looking at the input data, we can conclude that each section starts with a line which includes an email address.
The data lines of a section start with a serial number.
Based on this information we can build a hash `%sections` with the line that includes the email as the key, and all lines starting with a serial number stored in an array under that key.

Once the hash is built, the code goes through all sections and looks for lines which include the search term; if the term is found, it outputs the section with the matching lines.

Note: to work on a real file, replace `<DATA>` with `<>`, then run as `./script.pl filename.dat`
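A minimal sketch of the hash-based approach, not the answer's original script. It reads shortened sample data from a heredoc rather than `<DATA>`, uses a placeholder email, keeps a separate `@order` array because Perl hashes do not preserve insertion order, and adds `\b` around the search term. The sub names `build_sections` and `search_sections` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data with a placeholder email.
my $sample = <<'END';
udemy (user@example.com)
"=========================="-
1) foo bar
2) java programming language
4) javascript
coursera (user@example.com)
"==========================-"
1) python
3) java concurrency
edX (user@example.com)
"==========================-"
1) excel
7) java
END

# Build the hash: header line (contains an email) => array of course lines.
# The order array remembers the file order of the headers.
sub build_sections {
    my @lines = @_;
    my (%sections, @order, $current);
    for my $line (@lines) {
        if ($line =~ /@/) {                                  # section header
            $current = $line;
            push @order, $current unless exists $sections{$current};
            $sections{$current} ||= [];
        }
        elsif (defined $current && $line =~ /^\d+\)/) {      # course line
            push @{ $sections{$current} }, $line;
        }
    }
    return (\%sections, \@order);
}

# Return each header that has matching courses, followed by those courses.
sub search_sections {
    my ($sections, $order, $term) = @_;
    my @out;
    for my $header (@$order) {
        my @found = grep { /\b\Q$term\E\b/ } @{ $sections->{$header} };
        push @out, $header, @found if @found;
    }
    return @out;
}

my ($sections, $order) = build_sections(split /\n/, $sample);
print "$_\n" for search_sections($sections, $order, 'java');
```

Because every course line is stored, this version reports all matching lines in a section rather than only the first one.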