Get repeated formatted segments from a string
I'm developing a forum-based, rugby-style score game and I'm looking for help developing a regex parser to parse the sets of games.
Each post could use any of the formats below (the differences are that some people may use a comma to separate games and some may hyphenate the score, or any combination of the two):
TEAMA 25-31 TEAMB
TEAMC 28-35 TEAMD
TEAME 38-10 TEAMF
TEAMG 21-15 TEAMH
.
TEAMA 25 31 TEAMB
TEAMC 28 35 TEAMD
TEAME 38 10 TEAMF
TEAMG 21 15 TEAMH
.
TEAMA 25-31 TEAMB,
TEAMC 28-35 TEAMD,
TEAME 38-10 TEAMF,
TEAMG 21-15 TEAMH
.
TEAMA 25 31 TEAMB,
TEAMC 28 35 TEAMD,
TEAME 38 10 TEAMF,
TEAMG 21 15 TEAMH
Basically, each team name is always expected to be 5 characters long, with the score sitting between the two teams. The number of games in a post is not fixed: one post could hold one game or 20. There could also be extra text before or after the games, but I still need to be able to pluck the games out. Each game just needs to be split out, i.e. [TEAMA] [SCORE] [SCORE] [TEAMB] would be considered one game.
I started with explode() but didn't have much luck, and unfortunately I don't have much regex experience, so I'm looking for a flexible way to accommodate the formats above.
Comments (4)
It's easier to match each result than to split the string, e.g.:
This gives you, for each result, something like:
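As a minimal sketch of the match-instead-of-split idea, the snippet below uses preg_match_all() with one capture group per field. The sample post, the pattern, and the variable names are assumptions made for illustration and are not the answer's original code.

```php
<?php
// Hypothetical sample post mixing both score styles and a comma separator.
$post = "TEAMA 25-31 TEAMB, TEAMC 28 35 TEAMD";

// Groups: (team A) (score A) separator (score B) (team B).
// [\s-]+ accepts whitespace and/or a hyphen between the two scores.
$pattern = '/(\w{5})\s+(\d+)[\s-]+(\d+)\s+(\w{5})/';

// PREG_SET_ORDER yields one array per game instead of one array per group.
preg_match_all($pattern, $post, $games, PREG_SET_ORDER);

foreach ($games as [$whole, $teamA, $scoreA, $scoreB, $teamB]) {
    echo "$teamA $scoreA : $scoreB $teamB\n";
}
```

Because the pattern only describes a single game, commas and any surrounding text are simply skipped between matches.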
You could try a regular expression like this (it assumes team names are alphanumeric):
http://rubular.com/r/v4HGNzo3UY
An alternative:
Output:
To tightly validate the 5-word-character strings, use word boundaries (\b) on the outside edges of the game segment. To match the unknown non-numeric delimiter between the two scores, use \D+ to match one or more non-digits. There is no need for any capture groups; just match each game as a full-string match and access those elements in the reference array.
Code: (Demo)
Output:
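The description above can be sketched roughly as follows. The sample text and variable names are hypothetical, and the pattern is a reading of the prose rather than the answer's verbatim code.

```php
<?php
// Hypothetical post text with extra words around the games.
$post = "Scores so far: TEAMA 25-31 TEAMB, TEAMC 28 35 TEAMD thanks all";

// \b ... \b : the segment must start and end on word boundaries
// \w{5}     : a team name of exactly five word characters
// \D+       : any run of non-digits (hyphen, spaces, ...) between the scores
preg_match_all('/\b\w{5} \d+\D+\d+ \w{5}\b/', $post, $matches);

// No capture groups are used, so every game is a full-string match in $matches[0].
$games = $matches[0];
print_r($games);
```

Each element of $games is one complete game string, which can then be parsed further if the individual fields are needed.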
If you wanted to parse the first game's data into an array, you could use sscanf() to generate an array of strings and integers. (Demo)
Output:
Or declare individual variables: (Demo)
Output:
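A minimal sketch of both sscanf() forms is shown below, assuming a single game string has already been isolated. The format string and the variable names are illustrative assumptions, not the answer's original code.

```php
<?php
// Hypothetical single game, e.g. the first full-string match from a regex search.
$game = 'TEAMA 25-31 TEAMB';

// %s       : a run of non-whitespace characters (team name)
// %d       : an integer (score)
// %*[^0-9] : read and discard non-digits (the separator between the scores)
$format = '%s %d%*[^0-9]%d %s';

// With no extra arguments, sscanf() returns the parsed values as an array.
$parsed = sscanf($game, $format);
var_export($parsed);

// With extra arguments, sscanf() assigns the values by reference instead.
sscanf($game, $format, $teamA, $scoreA, $scoreB, $teamB);
```

Unlike the regex matches, the %d conversions give the scores as integers rather than strings.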