Regex syntax to select the second occurrence of characters

Posted 2025-02-07 02:39:42 · 254 words · 1 view · 0 comments

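I have a relatively simple question, but I cannot figure out the right regex syntax. I have multiple experiment names as strings in various formats, such as SEF001DT45 or BV004MF.

What I want to do is select the second occurrence of two letters, the one that follows the numeric values (DT and MF).

I was thinking of [A-Z]{2}, but that only solves my problem halfway. How do I get the appropriate substring?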

最舍不得你 2025-02-14 02:39:42

A possible solution, based on stringr::str_extract and lookaround:

library(stringr)

strings <- c("SEF001DT45", "BV004MF")

str_extract(strings, "(?<=\\d)[:upper:]{2}")

#> [1] "DT" "MF"
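
To see why a bare [A-Z]{2} only gets halfway, the following base R sketch contrasts it with the lookbehind version. It is an illustration only: it swaps stringr for regmatches/regexpr so that it runs without extra packages, and the variable names first_pair and after_digit are made up for this example.

```r
strings <- c("SEF001DT45", "BV004MF")

# Without a lookbehind, the first two uppercase letters in each string win.
first_pair <- regmatches(strings, regexpr("[A-Z]{2}", strings))
first_pair
#> [1] "SE" "BV"

# With a lookbehind, the match has to start immediately after a digit.
after_digit <- regmatches(
  strings,
  regexpr("(?<=\\d)[A-Z]{2}", strings, perl = TRUE)
)
after_digit
#> [1] "DT" "MF"
```

The lookbehind (?<=\\d) checks the preceding character without including it in the match, which is why only the two letters are returned.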

泅渡 2025-02-14 02:39:42

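Base R:

# Input data (defined first so the call below can run):
x <- c(
  'SEF001DT45',
  'BV004MF'
)

# Using capture groups: keep the two word characters
# that follow two consecutive digits.
gsub(
  ".*\\d{2}(\\w{2}).*",
  "\\1",
  x
)
#> [1] "DT" "MF"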

臻嫒无言 2025-02-14 02:39:42

TLDR: Generally, you can get the second occurrence of a PATTERN using one of the following:

sub('.*?PATTERN.*?(PATTERN).*', '\\1', x)
stringr::str_match(x, 'PATTERN.*?(PATTERN)')[,2]
regmatches(x, regexpr('PATTERN.*?\\KPATTERN', x, perl=TRUE))

Details

You can use

x <- c('SEF001DT45','BV004MF')
sub('.*?[A-Z]{2}.*?([A-Z]{2}).*', '\\1', x)
## => [1] "DT" "MF"

See the R demo online and the regex demo. The point here is to match up to the second occurrence of the pattern, capture it, and then match the rest, and replace with the backreference to the capturing group value.

Note that sub will perform a single search and replace operation, and this is fine since the regex here requires the whole string match.

Details:

.*? - any zero or more chars as few as possible
[A-Z]{2} - two uppercase ASCII letters
.*? - any zero or more chars as few as possible
([A-Z]{2}) - Group 1 (\1 refers to this group value): two uppercase ASCII letters
.* - any zero or more chars as many as possible.
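
The pattern breakdown above can be wrapped into a small reusable function. This is a minimal sketch, not part of the original answer: second_occurrence is a hypothetical helper name, it assumes the sub-pattern contains no capture groups of its own (otherwise \\1 would point at the wrong group), and it returns NA where the pattern does not occur twice.

```r
# Hypothetical helper: extract the second occurrence of `pattern` in each string.
# Assumes `pattern` has no capture groups of its own.
second_occurrence <- function(x, pattern) {
  rx <- paste0(".*?", pattern, ".*?(", pattern, ").*")
  hit <- grepl(rx, x, perl = TRUE)       # TRUE where the pattern occurs twice
  out <- sub(rx, "\\1", x, perl = TRUE)  # sub() leaves non-matching input unchanged
  out[!hit] <- NA_character_             # so mark those elements explicitly
  out
}

x <- c("SEF001DT45", "BV004MF", "AB12", "ABCD01EF")
res <- second_occurrence(x, "[A-Z]{2}")
res
#> [1] "DT" "MF" NA   "CD"
```

The last element shows a caveat: "second occurrence of two letters" does not mean "after the digits". For a name with four leading letters, the second pair is found inside the prefix.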

You can achieve this with a simpler regex using stringr::str_match:

x <- c('SEF001DT45','BV004MF')
library(stringr)
results <- stringr::str_match(x, '[A-Z]{2}.*?([A-Z]{2})')
results[,2] ## Get Group 1 values

See this R demo.

Or, with regmatches/regexpr in base R:

x <- c('SEF001DT45','BV004MF')
results <- regmatches(x, regexpr('[A-Z]{2}.*?\\K[A-Z]{2}', x, perl=TRUE))
results

See this R demo.

Here, [A-Z]{2}.*?\\K[A-Z]{2} finds the first two uppercase ASCII letters, then matches any zero or more chars (other than line break chars since the PCRE engine is used) as few as possible, and then \K discards the matched text and the [A-Z]{2} at the end of the pattern matches the second occurrence of the two-letter chunk. regexpr only finds the first match.
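
One practical detail when using regmatches with regexpr: elements without a match are dropped, so the result can be shorter than the input. The sketch below is one way to keep the result aligned with the input, assuming NA is an acceptable placeholder; the third input string is made up for illustration.

```r
x <- c("SEF001DT45", "BV004MF", "AB12")

# regexpr() returns -1 for elements where the pattern does not match.
m <- regexpr("[A-Z]{2}.*?\\K[A-Z]{2}", x, perl = TRUE)

# Pre-fill with NA, then write the matches back at the matching positions.
res <- rep(NA_character_, length(x))
res[m > 0] <- regmatches(x, m)
res
#> [1] "DT" "MF" NA
```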

顾忌 2025-02-14 02:39:42

Maybe:

s <- c("SEF001DT45", "BV004MF")
sub("[A-Z]+\\d+([A-Z]{2}).*", "\\1", s)
#sub("[A-Z]+[0-9]+([A-Z]{2}).*", "\\1", s) #Alternative
#[1] "DT" "MF"

Here [A-Z]+ matches one or more uppercase letters, \\d+ one or more digits, [A-Z]{2} the two letters of interest, and .* the rest of the string.
The parentheses () capture the content that \\1 inserts as the replacement.
Or, to be stricter about the second occurrence of two letters:

sub(".*?[A-Z]{2}[0-9]+([A-Z]{2}).*", "\\1", s)
#[1] "DT" "MF"

If extracting the two characters after the first number is enough:

regmatches(s, regexpr("(?<=\\d)[A-Z]{2}", s, perl=TRUE))
#[1] "DT" "MF"
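
If the other pieces of the name are also of interest, the same capture-group idea extends to several groups at once. This is a sketch only, assuming every name has the shape letters, digits, two letters; the column names are invented for this example.

```r
s <- c("SEF001DT45", "BV004MF")

# regexec() returns the full match plus one entry per capture group.
m <- regmatches(s, regexec("^([A-Z]+)(\\d+)([A-Z]{2})", s))

# Stack the per-string results into a matrix (assumes every string matched).
parts <- do.call(rbind, m)
colnames(parts) <- c("match", "prefix", "digits", "code")

parts[, "code"]
#> [1] "DT" "MF"
```

Strings that do not fit the assumed shape yield an empty result from regexec and would need to be filtered out before the rbind step.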

回梦 2025-02-14 02:39:42

Another base R trick is strsplit

> sapply(strsplit(s, split = "\\d+"), `[[`, 2)
[1] "DT" "MF"

or gsub

> gsub("^.*?(?<=\\d)(\\D+).*", "\\1", s, perl = TRUE)
[1] "DT" "MF"
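
Note that the strsplit call above raises a "subscript out of bounds" error when a string contains no digits, because there is no second piece to pick. A slightly more defensive variant, sketched here with a made-up third input, returns NA instead:

```r
s <- c("SEF001DT45", "BV004MF", "XYZ")

# Split on runs of digits and take the second piece if there is one.
pieces <- strsplit(s, split = "\\d+")
res <- vapply(
  pieces,
  function(p) if (length(p) >= 2) p[2] else NA_character_,
  character(1)
)
res
#> [1] "DT" "MF" NA
```

Unlike the [A-Z]{2} patterns, this returns everything between the first and second run of digits, so a name such as "AB12CDE3" would give "CDE" rather than two letters.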
