R regex lookbehind

Posted on 2024-12-26 10:24:35

I have a vector filled with strings of the following format: <year1><year2><id1><id2>

The first entries of the vector look like this:

199719982001
199719982002
199719982003
199719982003

For the first entry we have: year1 = 1997, year2 = 1998, id1 = 2, id2 = 001.

I want to write a regular expression that pulls out year1, id1, and the digits of id2 that are not zero. So for the first entry the regex should output: 199721.

I have tried doing this with the stringr package, and created the following regex:

"^\\d{4}|\\d{1}(?<=\\d{3}$)"

to pull out year1 and id1. However, when using the lookbehind I get an "invalid regular expression" error. This is a bit puzzling to me: can R not handle lookaheads and lookbehinds?

我喜欢麦丽素 2025-01-02 10:24:35

Since this is a fixed format, why not use substr? year1 is extracted with substr(s,1,4), id1 with substr(s,9,9), and id2 with as.numeric(substr(s,10,13)). In the last case I use as.numeric to strip the leading zeros.
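
As a minimal sketch of the substr approach described above, applied to the sample entries from the question: the variable names (year1, id1, id2, result) are chosen here for illustration, and the stop index 12 is used because the sample strings are 12 characters long. It assumes "digits of id2 that are not zero" means dropping leading zeros.

```r
# Sample data from the question (format: <year1><year2><id1><id2>)
s <- c("199719982001", "199719982002", "199719982003", "199719982003")

year1 <- substr(s, 1, 4)                 # characters 1-4
id1   <- substr(s, 9, 9)                 # character 9
id2   <- as.numeric(substr(s, 10, 12))   # characters 10-12; as.numeric drops leading zeros

# paste0 converts the numeric id2 back to text when concatenating
result <- paste0(year1, id1, id2)
print(result)
# [1] "199721" "199722" "199723" "199723"
```

Note that an id2 of 000 would come out as 0 with this approach, which may or may not be what is wanted.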

北笙凉宸 2025-01-02 10:24:35

You will need to use gregexpr from the base package. This works:

> s <- "199719982001"
> gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)",s,perl=TRUE)
[[1]]
[1]  1 12
attr(,"match.length")
[1] 4 1
attr(,"useBytes")
[1] TRUE

Note the perl=TRUE setting. For more details, see ?regex.

Judging from the output, though, your regular expression does not catch id1: the second match is at position 12 (the last digit of the string), not at position 9.
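
One possible way to catch id1, sketched here as an assumption rather than as the answerer's method, is to swap the lookbehind for a lookahead that requires exactly three digits between the matched digit and the end of the string. regmatches then turns the positions returned by gregexpr into the matched substrings. The names pattern, parts, and combined are illustrative.

```r
s <- "199719982001"

# Lookahead instead of lookbehind: match a single digit that is followed by
# exactly three more digits and then the end of the string (this is id1).
pattern <- "^\\d{4}|\\d(?=\\d{3}$)"

m <- gregexpr(pattern, s, perl = TRUE)   # perl = TRUE enables lookarounds
parts <- regmatches(s, m)[[1]]           # extract the matched substrings
print(parts)
# [1] "1997" "2"

combined <- paste(parts, collapse = "")
print(combined)
# [1] "19972"
```

This covers only year1 and id1, as in the question's attempt; the non-zero part of id2 still has to be handled separately.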

并安 2025-01-02 10:24:35

You can use sub.

sub("^(.{4}).{4}(.{1}).*([1-9]{1,3})$","\\1\\2\\3",s)
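
A hedged variant of the same idea, assuming the goal is to drop only the leading zeros of id2: because `.*` in the pattern above is greedy, an id2 such as 012 would likely yield only its last digit, and an id2 ending in 0 would not match at all. The sketch below consumes the leading zeros explicitly with `0*` instead. The second and third test strings are made up for illustration and do not come from the question.

```r
# First entry is from the question; the other two are hypothetical edge cases
s <- c("199719982001", "199719982012", "199719982100")

# (\d{4})  year1, kept
# \d{4}    year2, skipped
# (\d)     id1, kept
# 0*       leading zeros of id2, skipped
# (\d*)    remainder of id2, kept
pattern <- "^(\\d{4})\\d{4}(\\d)0*(\\d*)$"

result <- sub(pattern, "\\1\\2\\3", s)
print(result)
# [1] "199721"   "1997212"  "19972100"
```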

About the author

煮茶煮酒煮时光
