当前位置：文江博客话题详情

基于或逻辑CSV文件的bash-grep结果

发布于 2025-01-21 17:14:42 字数 1644 浏览 5 评论 0 原文

我有这个CSV文件，基本上是运动员的记录及其个人信息/奖牌。

我只需要使用一个EGREP（扩展的正则表达式）以下（我几乎有所有内容）：

必须有9位，第三位必须为0或3。
ID 一个月仅10月（10）。
运动员的高度必须等于或大于1,7（我在这里挣扎）。第二个小数不能为0。
它必须至少赢得一枚奖牌（无论是金或银，无论多少，但至少一个），但不能被铜牌。

到目前为止，我还有一切，除了高度的事情需要最后一分钟的变化才能始终是正确的（因为我不知道该怎么说可以是1米，但在7-9之间，但与此同时，接受2米，在0 -9）。奖牌，我不知道该如何告诉系统，如果黄金大于0，则可以是0，而相反，

\d\d[0|3]\d\d\d\d\d\d,.*[1]\d\d\d[-][1][0][-]\d\d,[1|2].[7-9][^0],\d\d,.*[0-9],[1-9],[0].*

这使我返回：

353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,

但是它应该返回此（我基本上已经从演示的银至金位置）：

353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,

文件存储在此处：

您可以使用此站点更快地调试代码和文件：

https://regex101.com/

非常感谢，

原文

I have this CSV file which basically is the records from athletes, and their personal info/medals.

I need to get with only one egrep (extended regular expression) the following (I have almost everything):

ID has to have 9 digits and the third has to be either 0 or 3.
The birthday year has to be lower than 2000 and the month only october (10).
The height of the athlete has to be equal or greater than 1,7 (I'm struggling here). The second decimal cannot be 0.
It has to have won at least a medal (either gold or silver, no matter how many, but at least one), but cannot be bronze.

So far I have everything but the height thing needs some last minute change to be always true (because I don't know how to say that can be 1 meter and between 7-9 but at the same time, accept 2 meters and between 0-9).
The medals, I don't know how to tell the system that if gold is greater than 0 silver can be 0 and the other way around...

\d\d[0|3]\d\d\d\d\d\d,.*[1]\d\d\d[-][1][0][-]\d\d,[1|2].[7-9][^0],\d\d,.*[0-9],[1-9],[0].*

Which returns me this:

353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,

But it should return this (I have basically alterned the 1 from silver to gold position for demo):

353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,

The file is stored here:

https://github.com/jpiedehierroa/files/blob/main/athletesv2.txt

You can use this site to debug quicker the code and the file:

https://regex101.com/

Many thanks,

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

扬花落满肩 2025-01-28 17:14:42

样本输入：

$ cat medals.dat
353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,

999946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
999956660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
999972998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-08-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.65,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,0,0,3,

注意： 1st 6线来自OP的预期输出；最后6行是相同行的修改副本；最后6行不应在输出中显示

一个 egrep/regex ixue：

$ egrep '^[0-9]{2}[03][0-9]{6},([^,]*,){3}1...-10[^,]*,(1\.[7-9]|2\.[0-9])[0-9]*,([^,]*,){2}([^0]|[^,]*,[^0])' medals.dat
353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,

注意：

我的版本 egrep 似乎不支持 \ d 因此，使用 [0-9]
最高的人（到目前为止）是27.2万人，因此我们应该对 2 \。 0-9] （即，不需要 [23] \。[0-9] ）
假设感兴趣的字段都没有领先的白色空间

Sample input:

$ cat medals.dat
353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,

999946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
999956660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
999972998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-08-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.65,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,0,0,3,

NOTE: 1st 6 lines are from OP's expected output; last 6 lines are modified copies of the same lines; the last 6 lines should not show up in the output

One egrep/regex idea:

$ egrep '^[0-9]{2}[03][0-9]{6},([^,]*,){3}1...-10[^,]*,(1\.[7-9]|2\.[0-9])[0-9]*,([^,]*,){2}([^0]|[^,]*,[^0])' medals.dat
353946547,Arthur van Doren,BEL,male,1994-10-01,1.78,74,hockey,0,1,0,
820456660,Giulia Emmolo,ITA,female,1991-10-16,1.71,67,aquatics,0,1,0,
230772998,Kelly Brazier,NZL,female,1989-10-28,1.71,70,rugby sevens,0,1,0,
713017392,Pavlo Tymoshchenko,UKR,male,1986-10-13,1.92,78,modern pentathlon,0,1,0,
110156979,Lauritz Schoof,GER,male,1990-10-07,1.95,98,rowing,1,0,0,
730877927,Matthew Centrowitz,USA,male,1989-10-18,1.76,65,athletics,1,0,0,

NOTES:

my version of egrep doesn't appear to support \d hence the use of [0-9]
tallest man to ever live (so far) was 2.72m so we should be good with 2\.[0-9] (ie, no need for [23]\.[0-9])
assumes none of the fields of interest have leading white space

回复收藏 0 原文

小猫一只 2025-01-28 17:14:42

我 think 这个正则是您要问的：

\d\d[0|3]\d\d\d\d\d\d,.*[1]\d\d\d[-][1][0][-]\d\d,(1\.[7-9]|2\.[0-9])[^0],\d\d,.*(1,1|0,1|1,0),[0-9],$

I think this regex does what you're asking:

\d\d[0|3]\d\d\d\d\d\d,.*[1]\d\d\d[-][1][0][-]\d\d,(1\.[7-9]|2\.[0-9])[^0],\d\d,.*(1,1|0,1|1,0),[0-9],$

回复收藏 0 原文

~没有更多了~

关于作者

比忠

暂无简介

文章

28 人气

关注发私信

友情链接

文江博客

基于或逻辑CSV文件的bash-grep结果

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

基于或逻辑CSV文件的bash-grep结果

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。