I have a CSV file that holds athletes' records: their personal info and medals.
Using a single egrep (extended regular expression), I need to select the rows that meet the following conditions (I have almost everything):
- The ID has to have 9 digits, and the third digit has to be either 0 or 3.
- The birth year has to be earlier than 2000, and the month must be October (10).
- The athlete's height has to be greater than or equal to 1.7 (I'm struggling here). The second decimal cannot be 0.
- The athlete has to have won at least one medal (gold or silver, no matter how many), but no bronze medals.
So far I have everything, but the height part needs a last-minute change to always be correct (because I don't know how to say that it can be 1 meter with a first decimal between 7 and 9 while also accepting 2 meters with a first decimal between 0 and 9).
As for the medals, I don't know how to tell the regex that if gold is greater than 0, silver can be 0, and the other way around...
\d\d[0|3]\d\d\d\d\d\d,.*[1]\d\d\d[-][1][0][-]\d\d,[1|2].[7-9][^0],\d\d,.*[0-9],[1-9],[0].*
This returns:
353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
But it should return this (for demo purposes I have moved the 1 from the silver to the gold position):
353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,
The file is stored here:
https://github.com/jpiedehierroa/files/blob/main/athletesv2.txt
You can use this site to debug the regex against the file more quickly:
https://regex101.com/
Many thanks,
Comments (2)
Sample input:
NOTE: the first 6 lines are from OP's expected output; the last 6 lines are modified copies of the same lines and should not show up in the output.
One egrep/regex idea:
NOTES:
- egrep doesn't appear to support \d, hence the use of [0-9]
- 2\.[0-9] is enough for heights of 2 meters (ie, no need for [23]\.[0-9])
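As a minimal sketch of how the notes above and the question's four conditions might combine into one pattern, the Python snippet below builds a candidate regex and checks it against the rows from the question. The pattern is not taken from either answer. It assumes the column order id, name, nationality, sex, date of birth, height, weight, sport, gold, silver, bronze (inferred from the sample rows), that no field contains a comma, and that heights are always written with two decimals. The names PATTERN and select_rows are hypothetical. Only ERE-compatible syntax is used ([0-9] rather than \d), so the same string should also work with egrep.

```python
import re

# Hypothetical pattern; column order is an assumption inferred from the sample rows.
PATTERN = (
    r"^[0-9]{2}[03][0-9]{6},"                    # id: 9 digits, third digit is 0 or 3
    r"[^,]*,[^,]*,[^,]*,"                        # name, nationality, sex
    r"1[0-9]{3}-10-[0-9]{2},"                    # born before 2000, in October
    r"(1\.[7-9][1-9]|2\.[0-9][1-9]),"            # height >= 1.7, second decimal not 0
    r"[0-9]*,[^,]*,"                             # weight, sport
    r"([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*),"  # gold > 0 or silver > 0
    r"0(,.*)?$"                                  # bronze is 0, optional trailing field
)


def select_rows(lines):
    """Return the lines that satisfy every condition encoded in PATTERN."""
    rx = re.compile(PATTERN)
    return [line for line in lines if rx.search(line)]


if __name__ == "__main__":
    rows = [
        "353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,",
        "110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,",
        # Made-up variant with a bronze medal; it should be filtered out.
        "110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,1,",
    ]
    for row in select_rows(rows):
        print(row)
```

The two alternations carry the logic the question was stuck on: the height group accepts either 1.7x through 1.9x or any 2.xx, and the medal group accepts either a non-zero gold count with any silver count or any gold count with a non-zero silver count. With egrep the same pattern string would be passed in single quotes, for example against athletesv2.txt.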
I think this regex does what you're asking: