
Adding line breaks after times in parentheses

Posted 2025-01-17 08:32:28


I'm trying to clean up some data from web scraping.

This is an example of the information I'm working with:

Best Time
Adam Jones (w/ help) (6:34)Best Time
Kenny Gobbin (a) (2:38)Personal Best
Matt Herrera (12:44)No-record
Nick Elizabeth (19:04)

And this is an example of what I'm trying to achieve:

Best Time
Adam Jones (w/ help) (6:34)

Best Time
Kenny Gobbin (2:38)

Personal Best
Matt Herrera (12:44)

No-record
Nick Elizabeth (19:04)

I want to add two newlines after the closing parenthesis of each time, but since the times are all different, I don't know how to search for them and replace them. Also, numbers sometimes occur outside of the times.

The closest I've come is matching the colon-separated numbers inside parentheses, but my replacement throws the matched time away, and I don't know how to put the same text back in front of the newlines.

re.sub(r"\([0-9]+:[0-9]+\)", "\n\n", result)

Does anyone know how I can achieve this?


写下不归期 2025-01-24 08:32:29


Notice that the place where you need to insert two newlines comes between an end parenthesis and an alphabetic character. So, you can use:

re.sub(r"\)([A-Za-z])", r")\n\n\1", data)

For example:

import re
data = """Best Time
Adam Jones (w/ help) (6:34)Best Time
Kenny Gobbin (a) (2:38)Personal Best
Matt Herrera (12:44)No-record
Nick Elizabeth (19:04)"""

result = re.sub(r"\)([A-Za-z])", r")\n\n\1", data)
print(result)

outputs:

Best Time
Adam Jones (w/ help) (6:34)

Best Time
Kenny Gobbin (a) (2:38)

Personal Best
Matt Herrera (12:44)

No-record
Nick Elizabeth (19:04)

Here's an explanation for how it works:

For the expression we're trying to match, we have r"\)([A-Za-z])":

\) matches a literal end parenthesis.
[A-Za-z] matches a single alphabetic character.
Enclosing [A-Za-z] in parentheses makes it a capture group that we refer to later.

For the replacement expression, we have r")\n\n\1":

)\n\n puts back the end parenthesis (the pattern consumed it) followed by two newlines.
\1 refers to the capture group from earlier. Intuitively, we capture the alphabetic character immediately after the end parenthesis, and then add that same character back into the replacement expression.
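A variant of the same idea, not part of the original answer: lookbehind and lookahead assertions match the empty position between ")" and a letter without consuming either character, so nothing has to be captured and put back. A minimal sketch using the sample data from the question:

```python
import re

data = """Best Time
Adam Jones (w/ help) (6:34)Best Time
Kenny Gobbin (a) (2:38)Personal Best
Matt Herrera (12:44)No-record
Nick Elizabeth (19:04)"""

# (?<=\))      -> the position is preceded by a closing parenthesis (lookbehind)
# (?=[A-Za-z]) -> the position is followed by a letter (lookahead)
# Neither assertion consumes characters, so the match is empty and
# "\n\n" is simply inserted at that position.
result = re.sub(r"(?<=\))(?=[A-Za-z])", "\n\n", data)
print(result)
```

It produces the same output as the capture-group version; which one reads better is mostly a matter of taste.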


稍尽春風 2025-01-24 08:32:28


You can do it your way with a minimal change. You only need to know about grouping and add \g<0> right before \n\n. You can read about it in the official documentation, in the section about search and replace.

re.sub(r"\([0-9]+:[0-9]+\)", r"\g<0>\n\n", result)  # raw string: "\g" is not a valid escape in a normal string literal

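Here \g<0> refers to group 0, which is always the entire match, so the matched time is inserted again in front of the two newlines. Capturing groups, i.e. unescaped pairs of (), are numbered from 1, counting their opening parentheses from left to right; the escaped \( and \) in this pattern are literal characters, not groups.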
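A small sketch of the group numbering, plus the \g<0> replacement run on the question's sample data. The pattern with capturing groups is a hypothetical one chosen for illustration. Note that this approach also appends two newlines after the final time, so the sketch strips them at the end:

```python
import re

# Hypothetical pattern for illustration: a word followed by a time.
# Group 0 is the whole match; groups 1-3 are the unescaped (...) pairs.
m = re.search(r"(\w+) \((\d+):(\d+)\)", "Matt Herrera (12:44)")
print(m.group(0))                          # whole match
print(m.group(1), m.group(2), m.group(3))  # surname, minutes, seconds

data = """Best Time
Adam Jones (w/ help) (6:34)Best Time
Kenny Gobbin (a) (2:38)Personal Best
Matt Herrera (12:44)No-record
Nick Elizabeth (19:04)"""

# \g<0> re-inserts the matched time; only "(digits:digits)" matches,
# so "(w/ help)" and "(a)" are left alone.
result = re.sub(r"\([0-9]+:[0-9]+\)", r"\g<0>\n\n", data)

# The last time also gets "\n\n" appended; strip it if unwanted.
cleaned = result.rstrip("\n")
print(cleaned)
```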



About the author: 笔落惊风雨
