Extract movie name and year from a string where the year is optional

Posted on 2024-10-25 08:53:50

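I'm missing something really obvious here, but I'm new to regex so be kind ;-)

I have a number of films in an arbitrary format that may or may not have the year attached.

My Movie Name 2010
Some.Other.Super.Cool.Movie
The~Third|Movie.2010

Now, using (.+)\W(\d{4}) I can extract the two movies with dates into two groups, one containing the name and the other the year, but the middle one gets ignored. I'm just a little unsure how to actually make the year segment optional.

Ideally, ;-), I could use a single expression to return the names with \W converted into spaces, but that's a different conversation.

Thanks in advance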



倾城泪 2024-11-01 08:53:50

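Putting a ? after a group makes it optional, so in your case put it after (\d{4}):

(.+)\W(\d{4})?

If the year then ends up inside the first group, that is because (.+) is greedy and \W also matches the newline character (I think it does, at least). Strip trailing whitespace from your string, and if that doesn't work, make (.+) lazy with a ? of its own: (.+?). Also consider that \W may be the wrong delimiter for this problem.

Adding $ to the end may also help, since that requires the digits to end the string if they can. Try lazy matching together with $:

(.+?)\W(\d{4})?$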
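One caveat with the last pattern above: it still requires a \W character right before the end (or right before the year), so a title such as Some.Other.Super.Cool.Movie, which has no year and no trailing separator, does not match at all. A minimal sketch of one possible variation, in Python, groups the separator together with the year so that both become optional. The names `PATTERN` and `parse_title` are made up for this illustration, and replacing every run of \W with a space is an assumption about what the asker wants.

```python
import re

# Assumed variant of the pattern above: the separator and the year sit in one
# optional non-capturing group, so a title without a year still matches.
PATTERN = re.compile(r"^(.+?)(?:\W(\d{4}))?$")


def parse_title(raw):
    """Return (name, year); year is None when the title carries no year."""
    match = PATTERN.match(raw.strip())  # strip trailing whitespace/newlines first
    if match is None:
        return None
    name, year = match.groups()
    # Turn each run of non-word characters in the name into a single space.
    name = re.sub(r"\W+", " ", name).strip()
    return name, year


for title in ("My Movie Name 2010",
              "Some.Other.Super.Cool.Movie",
              "The~Third|Movie.2010"):
    print(parse_title(title))
```

The lazy (.+?) grows one character at a time until the rest of the pattern can reach the end of the string, so the year is captured whenever four digits preceded by a separator close the title. A film whose name itself ends in a four-digit number would have that number treated as the year, so this is only a starting point.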
