Regex - extract volume and chapter numbers from book titles

Posted 2024-10-24 03:07:03


Hey,
I'm trying to import some legacy data into a brand-new system. It's almost done, but there's one huge problem! Suppose the data looks like this:

Blabla Vol.1 chapter 2
ABCD in the era of XYZ volume 2 First Chapter  
A really useless book Eighth vol  
Blala Sixth Vol Chapter 5  
Lablah V6C7 2002  
FooBar Vol6 C3 by Dr. Foo Bar
Regex: A tool in Hell V1 Eleventh Chapter

Confused!! I tried to write a regex to extract the volume and chapter numbers, but you know, it's REGEX! Can anyone please guide me through this?


演出会有结束 2024-10-31 03:07:03


Here is a regular expression that matches your examples:

/^.+?(?|(?:\bVol.?|\bvolume[ ]+|V)(\d+)|[ ]+([a-z]+)[ ]+vol\b)(?:.??(?|(?:C|chapter[ ]+)(\d+)|[ ]+([a-z]+)[ ]+Chapter\b))?.*$/im

You can edit the regex live and/or add tests here.

In that link:

element [0] of the array holds the full matches

element [1] holds the volumes

element [2] holds the chapters

I assumed that volumes always come before chapters, as in your examples.
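For anyone doing the import in Python rather than with a PCRE-based engine such as PHP's preg_* functions: Python's built-in re module does not support the branch-reset group (?|...) used above, so the pattern cannot be pasted in as-is. Below is a rough sketch of the same alternatives using named groups instead, assuming Python 3; TITLE_RE and extract are made-up names for illustration. It also ends with (?:.??(...))?.*$ rather than .?(?:(...).?)?$, so that titles with text after the chapter, such as "Lablah V6C7 2002" or "FooBar Vol6 C3 by Dr. Foo Bar", still match.

```python
import re

# Python's re has no branch-reset group (?|...), so each alternative gets
# its own named group and extract() takes whichever one participated.
TITLE_RE = re.compile(
    r"^.+?"
    # volume: "Vol.1", "Vol6", "volume 2", "V6" ... or "<word> vol"
    r"(?:(?:\bVol.?|\bvolume[ ]+|V)(?P<vol_num>\d+)|[ ]+(?P<vol_word>[a-z]+)[ ]+vol\b)"
    # optional chapter: "C7", "chapter 2" ... or "<word> Chapter";
    # the lazy .?? tries "no separator" before "one separator character"
    r"(?:.??(?:(?:C|chapter[ ]+)(?P<ch_num>\d+)|[ ]+(?P<ch_word>[a-z]+)[ ]+Chapter\b))?"
    # anything after the chapter (e.g. " 2002", " by Dr. Foo Bar") is ignored
    r".*$",
    re.IGNORECASE,
)


def extract(title):
    """Return (volume, chapter) as captured strings; None where nothing matched."""
    m = TITLE_RE.match(title)
    if m is None:
        return None, None
    volume = m.group("vol_num") or m.group("vol_word")
    chapter = m.group("ch_num") or m.group("ch_word")
    return volume, chapter
```

With this sketch, extract("Lablah V6C7 2002") gives ('6', '7') and extract("A really useless book Eighth vol") gives ('Eighth', None); spelled-out words like "Eighth" still have to be turned into numbers afterwards.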


网名女生简单气质 2024-10-31 03:07:03

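In my opinion, it's better to split this into separate steps. In the first pass you could convert titles using the pattern "/Vol.[0-9]+\s+chapter\s[0-9]+$/i". In a second pass you could convert titles matching the pattern "/[a-z]+(th|nd|st)\svol/i". And so on.

Trying to write one regex that catches all of these cases usually doesn't end well and is almost always buggy. Here is an interesting article I found the other day that goes into the dangers of overly complex regular expressions.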
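The pass-by-pass idea could look something like this in Python. It is only a sketch: the four patterns are adapted from the two in the answer and from the sample titles in the question (with "rd" added to the ordinal suffixes), and PASSES and parse_title are hypothetical names. Each pass is small enough to read on its own, and any title that no pass recognizes comes back as None so it can be reviewed by hand instead of being imported wrongly.

```python
import re

# One small pattern per title layout, tried in order; the first pass that
# matches wins. Named groups "vol" and "ch" hold the captured text.
PASSES = [
    # "Vol.1 chapter 2"
    re.compile(r"\bvol(?:ume)?\.?\s*(?P<vol>\d+)\s+chapter\s+(?P<ch>\d+)", re.I),
    # "V6C7", "Vol6 C3"
    re.compile(r"\bv(?:ol)?(?P<vol>\d+)\s*c(?P<ch>\d+)\b", re.I),
    # "volume 2 First Chapter", "V1 Eleventh Chapter"
    re.compile(
        r"\bv(?:ol(?:ume)?)?\.?\s*(?P<vol>\d+)\s+(?P<ch>[a-z]+(?:st|nd|rd|th))\s+chapter\b",
        re.I,
    ),
    # "Eighth vol", "Sixth Vol Chapter 5"
    re.compile(r"\b(?P<vol>[a-z]+(?:st|nd|rd|th))\s+vol\b(?:\s+chapter\s+(?P<ch>\d+))?", re.I),
]


def parse_title(title):
    """Return {'vol': ..., 'ch': ...} from the first matching pass, or None."""
    for pattern in PASSES:
        m = pattern.search(title)
        if m:
            return m.groupdict()
    return None  # no pass matched: set the title aside for manual review
```

For example, parse_title("Blala Sixth Vol Chapter 5") returns {'vol': 'Sixth', 'ch': '5'}. A new layout found in the legacy data can be handled by appending another pass.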


飘落散花 2024-10-31 03:07:03

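Since these titles aren't "regular" at all, a single regular expression will be hard to get right. If there is a limited set of "ways" the chapters and volumes are written, you can use several regexes to try to extract that information.

Alternatively, if you can define some rules, such as "the chapter number is always written as [chapter #]", that would help too!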
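Whichever set of regexes is used, several of the sample titles spell the number out as an ordinal word (First, Sixth, Eighth, Eleventh), so the captured text still has to be mapped to a number before it goes into the new system. A lookup table is one simple way to do that. This sketch assumes the legacy data only uses ordinals up to "twelfth"; ORDINALS and to_number are hypothetical names.

```python
# Hypothetical lookup table covering the spelled-out ordinals in the sample
# titles and their neighbours; extend it if the legacy data uses others.
ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4,
    "fifth": 5, "sixth": 6, "seventh": 7, "eighth": 8,
    "ninth": 9, "tenth": 10, "eleventh": 11, "twelfth": 12,
}


def to_number(token):
    """Convert a captured volume/chapter token ('7', 'Eighth', ...) to an int.

    Returns None for a missing token or a word that is not in ORDINALS.
    """
    if token is None:
        return None
    if token.isdecimal():
        return int(token)
    return ORDINALS.get(token.lower())
```

Returning None for unknown words, rather than guessing, keeps odd titles visible so they can be fixed by hand.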


じ违心 2024-10-31 03:07:03

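If the output always has the same things on the same lines, the first thing I would do is explode("\n", $data) and work with the correct line. If it's consistent, you could then match for

'/ (.*) Vol Chapter ([0-9]*)/'

or something along those lines.

By the way, this page always helps me with regex testing: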
http://www.quanetic.com/Regex
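In Python, the same line-by-line approach might look like the sketch below, where str.split("\n") plays the role of PHP's explode("\n", $data). The pattern is the one from this answer, unchanged; LINE_RE and match_lines are hypothetical names. Note that the pattern is case-sensitive and only fits titles written exactly as "... Vol Chapter N", so of the sample titles in the question it catches just "Blala Sixth Vol Chapter 5".

```python
import re

# The answer's pattern, unchanged: group 1 is the text between a space and
# " Vol Chapter ", group 2 is the chapter number.
LINE_RE = re.compile(r" (.*) Vol Chapter ([0-9]*)")


def match_lines(data):
    """Split the dump into lines and return (group 1, group 2) for every matching line."""
    results = []
    # str.split on a newline is the Python counterpart of PHP's explode with "\n"
    for line in data.split("\n"):
        m = LINE_RE.search(line)
        if m:
            results.append((m.group(1), m.group(2)))
    return results
```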
