Split a string into a list, but keep the split pattern

Posted on 2024-11-28 03:27:33

Currently I am splitting a string by a pattern, like this:

outcome_array=the_text.split(pattern_to_split_by)

The problem is that the pattern I split by is always omitted from the result.

How do I get the result to include the split pattern itself?

Comments (3)

半暖夏伤 2024-12-05 03:27:33

Thanks to Mark Wilkins for the inspiration, but here's a shorter bit of code for doing it:

irb(main):015:0> s = "split on the word on okay?"
=> "split on the word on okay?"
irb(main):016:0> b=[]; s.split(/(on)/).each_slice(2) { |s| b << s.join }; b
=> ["split on", " the word on", " okay?"]

or:

s.split(/(on)/).each_slice(2).map(&:join)

An explanation follows.


Here's how this works. First, we split on "on", but wrap it in parentheses to make it into a match group. When there's a match group in the regular expression passed to split, Ruby will include that group in the output:

s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]
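
To make the effect of the parentheses concrete, here is a minimal sketch comparing the same split with and without a capture group. The variable names are only for illustration; note that the surrounding spaces stay attached to the pieces.

```ruby
text = "split on the word on okay?"

# Without a capture group, the delimiter is dropped from the result.
without_group = text.split(/on/)
# => ["split ", " the word ", " okay?"]

# With a capture group, each delimiter shows up as its own element.
with_group = text.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]
```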

Now we want to join each instance of "on" with the preceding string. each_slice(2) helps by passing two elements at a time to its block. Let's just invoke each_slice(2) to see what results. Since each_slice returns an Enumerator when invoked without a block, we'll apply to_a to it so we can see what the Enumerator will enumerate over:

s.split(/(on)/).each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]
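
As a small aside, this sketch shows each_slice(2) on plain arrays, independent of the splitting problem. The point to notice is that an odd-length array leaves a final slice with a single element, which is why the trailing piece of the string ends up alone.

```ruby
# Even number of elements: every slice is a full pair.
pairs_even = [1, 2, 3, 4].each_slice(2).to_a
# => [[1, 2], [3, 4]]

# Odd number of elements: the last slice holds only the leftover element.
pairs_odd = [1, 2, 3, 4, 5].each_slice(2).to_a
# => [[1, 2], [3, 4], [5]]
```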

We're getting close. Now all we have to do is join the words together. And that gets us to the full solution above. I'll unwrap it into individual lines to make it easier to follow:

b = []
# pair is [text, delimiter], or just [text] for the final piece
s.split(/(on)/).each_slice(2) do |pair|
  b << pair.join
end
b
# => ["split on", " the word on", " okay?"]

But there's a nifty way to eliminate the temporary b and shorten the code considerably:

s.split(/(on)/).each_slice(2).map do |a|
  a.join
end

map passes each element of its input to the block; the result of the block becomes the element at that position in the output array. In MRI >= 1.8.7, you can shorten it even more, to the equivalent:

s.split(/(on)/).each_slice(2).map(&:join)
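
If the delimiter is not known in advance, the same idea can be wrapped in a small method. This is only a sketch: split_keeping_delimiter is a hypothetical name, not something from the thread, and it assumes the delimiter is a literal string, which is why it goes through Regexp.escape.

```ruby
# Hypothetical helper (the name is made up for illustration): split text on a
# literal delimiter and keep each delimiter attached to the piece before it.
def split_keeping_delimiter(text, delimiter)
  # Regexp.escape makes characters such as "." match literally.
  pattern = /(#{Regexp.escape(delimiter)})/
  text.split(pattern).each_slice(2).map(&:join)
end

result = split_keeping_delimiter("split on the word on okay?", "on")
# => ["split on", " the word on", " okay?"]

dotted = split_keeping_delimiter("a.b.c", ".")
# => ["a.", "b.", "c"]
```
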
我只土不豪 2024-12-05 03:27:33

You could use a regular expression assertion to locate the split point without consuming any of the input. The following uses a positive lookbehind assertion to split just after 'on':

s = "split on the word on okay?"
s.split(/(?<=on)/)
=> ["split on", " the word on", " okay?"]

Or use a positive lookahead to split just before 'on':

s = "split on the word on okay?"
s.split(/(?=on)/)
=> ["split ", "on the word ", "on okay?"]

With something like this, you might want to make sure 'on' is not part of a larger word (like 'assertion'), and also remove whitespace at the split:

"don't split on assertion".split(/(?<=\bon\b)\s*/)
=> ["don't split on", "assertion"]
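
The two lookaround variants can be sketched as helpers for an arbitrary literal delimiter. The names split_after and split_before are hypothetical, and the sketch assumes a plain string delimiter (escaped with Regexp.escape), so the lookbehind stays fixed-length.

```ruby
# Hypothetical helpers (names are illustrative, not from the thread).

# Split just after each occurrence of delimiter (positive lookbehind).
def split_after(text, delimiter)
  text.split(/(?<=#{Regexp.escape(delimiter)})/)
end

# Split just before each occurrence of delimiter (positive lookahead).
def split_before(text, delimiter)
  text.split(/(?=#{Regexp.escape(delimiter)})/)
end

s = "split on the word on okay?"

after_on = split_after(s, "on")
# => ["split on", " the word on", " okay?"]

before_on = split_before(s, "on")
# => ["split ", "on the word ", "on okay?"]
```
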
想你只要分分秒秒 2024-12-05 03:27:33

If you use a pattern with groups, it will return the pattern in the results as well:

irb(main):007:0> "split it here and here okay".split(/ (here) /)
=> ["split it", "here", "and", "here", "okay"]

Edit: The additional information indicates that the goal is to keep the item that was split on attached to one of the neighboring pieces. I would think there is a simple way to do that, but I don't know it and haven't had time today to play with it. So in the absence of a clever solution, the following is one way to brute-force it. Use the split method as described above to include the split items in the array. Then iterate through the array and combine every second entry (which by definition is the split value) with the previous entry.

s = "split on the word on and include on with previous"
a = s.split(/(on)/)

# Walk the array and append each delimiter (odd index) to the
# preceding piece, collecting the results in a second array
b = []
a.each_with_index do |piece, i|
  if i.even?
    b << piece
  else
    b[-1] += piece
  end
end

print b

Results in this:

["split on", " the word on", " and include on", " with previous"]
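
For comparison, here is a sketch of a different technique that skips the intermediate array entirely: String#scan with a lazy match up to and including the delimiter, plus a fallback for whatever is left at the end. This is not the approach from the answer above, and chunks_ending_with is a hypothetical name; the delimiter is assumed to be a literal string.

```ruby
# Hypothetical helper: collect each chunk of text together with the
# delimiter that ends it, plus any trailing text after the last delimiter.
def chunks_ending_with(text, delimiter)
  escaped = Regexp.escape(delimiter)
  # First alternative: shortest run of characters ending in the delimiter.
  # Second alternative: the remaining tail when no delimiter follows.
  # The /m flag lets "." match newlines as well.
  text.scan(/.*?#{escaped}|.+/m)
end

s = "split on the word on and include on with previous"
chunks = chunks_ending_with(s, "on")
# => ["split on", " the word on", " and include on", " with previous"]
```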