Split a string into a list, but keep the split pattern
Currently I am splitting a string by a pattern, like this:
outcome_array=the_text.split(pattern_to_split_by)
The problem is that the pattern I split by always gets omitted from the result.
How do I get the result to include the split pattern itself?
Comments (3)
Thanks to Mark Wilkins for the inspiration, but here's a shorter bit of code for doing it:
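A minimal sketch of what such a short version could look like, reconstructed from the explanation in this answer. The method name `split_after_on` and the sample text are illustrative assumptions, not taken from the original post:

```ruby
# Hypothetical helper: the method name and the sample text are illustrative only.
def split_after_on(text)
  b = []
  # split(/(on)/) keeps each "on"; each_slice(2) pairs it with the piece before it.
  text.split(/(on)/).each_slice(2) { |pair| b << pair.join }
  b
end

p split_after_on("split on the word on okay?")
# => ["split on", " the word on", " okay?"]
```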
or:
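One possible form of the alternative, sketched here as a general helper. The name `split_keeping` and its `delimiter` parameter are assumptions made for illustration; `Regexp.escape` is used so that a delimiter such as "." is treated literally:

```ruby
# Hypothetical helper: splits text and keeps each delimiter attached
# to the piece that precedes it.
def split_keeping(text, delimiter)
  pattern = /(#{Regexp.escape(delimiter)})/  # capture group keeps the delimiter
  text.split(pattern).each_slice(2).map(&:join)
end

p split_keeping("split on the word on okay?", "on")
# => ["split on", " the word on", " okay?"]
p split_keeping("a.b.c", ".")
# => ["a.", "b.", "c"]
```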
See below the fold for an explanation.
Here's how this works. First, we split on "on", but wrap it in parentheses to make it into a match group. When there's a match group in the regular expression passed to split, Ruby will include that group in the output.

Now we want to join each instance of "on" with the preceding string.
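To make the split step concrete, here is a small comparison of splitting with and without the match group. The sample text is an assumption chosen for illustration. Note that every second element of the grouped result is the delimiter that still has to be glued onto the piece before it:

```ruby
s = "split on the word on okay?"  # sample text, assumed for illustration

# Without a group, the delimiter is dropped.
p s.split(/on/)    # => ["split ", " the word ", " okay?"]

# With a group, each match of the group appears as its own element.
p s.split(/(on)/)  # => ["split ", "on", " the word ", "on", " okay?"]
```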
each_slice(2) helps by passing two elements at a time to its block. Let's just invoke each_slice(2) to see what results. Since each_slice, when invoked without a block, will return an enumerator, we'll apply to_a to the Enumerator so we can see what the Enumerator will enumerate over.

We're getting close. Now all we have to do is join the words together. And that gets us to the full solution above. I'll unwrap it into individual lines to make it easier to follow:
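A sketch of those steps written out one per line, assuming the same illustrative sample text. The variable names `parts` and `pairs` are introduced here for clarity; `b` is the temporary array the answer refers to:

```ruby
s = "split on the word on okay?"  # sample text, assumed for illustration

parts = s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]

# each_slice(2) without a block returns an Enumerator; to_a shows its contents.
pairs = parts.each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]

# The full solution, unwrapped: join each pair and collect the results in b.
b = []
s.split(/(on)/).each_slice(2) do |pair|
  b << pair.join
end
p b  # => ["split on", " the word on", " okay?"]
```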
But there's a nifty way to eliminate the temporary b and shorten the code considerably: map. It passes each element of its input array to the block; the result of the block becomes the new element at that position in the output array. In MRI >= 1.8.7, you can shorten it even more, to the equivalent:
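Both forms might look like the following sketch. The sample text and variable names are assumptions; the second form relies on Symbol#to_proc, which is what `&:join` expands to:

```ruby
s = "split on the word on okay?"  # sample text, assumed for illustration

# map with an explicit block: no temporary array needed.
with_block = s.split(/(on)/).each_slice(2).map { |pair| pair.join }

# The same thing using Symbol#to_proc.
with_symbol = s.split(/(on)/).each_slice(2).map(&:join)

p with_block   # => ["split on", " the word on", " okay?"]
p with_symbol  # => ["split on", " the word on", " okay?"]
```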
You could use a regular expression assertion to locate the split point without consuming any of the input. Below uses a positive look-behind assertion to split just after 'on':
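A minimal sketch of the look-behind variant, assuming an illustrative sample string. Look-behind needs the regex engine that ships with Ruby 1.9 and later:

```ruby
s = "split on the word on okay?"  # sample text, assumed for illustration

# (?<=on) matches the empty position right after each "on",
# so nothing is consumed and "on" stays at the end of each piece.
after_on = s.split(/(?<=on)/)
p after_on  # => ["split on", " the word on", " okay?"]
```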
Or a positive look-ahead to split just before 'on':
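The look-ahead variant, sketched with the same assumed sample string:

```ruby
s = "split on the word on okay?"  # sample text, assumed for illustration

# (?=on) matches the empty position right before each "on",
# so "on" stays at the start of the following piece.
before_on = s.split(/(?=on)/)
p before_on  # => ["split ", "on the word ", "on okay?"]
```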
With something like this, you might want to make sure 'on' was not part of a larger word (like 'assertion'), and also remove whitespace at the split:
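One way this could be written, as a sketch: `\b` word boundaries keep the "on" inside "assertion" from matching, and a leading `\s*` swallows the whitespace at the split point. The sample sentence is an assumption chosen to exercise both cases:

```ruby
s = "an assertion on this and on that"  # sample text, assumed for illustration

# \s*        consumes any whitespace just before the split point
# (?=\bon\b) requires a standalone word "on" to follow, without consuming it
pieces = s.split(/\s*(?=\bon\b)/)
p pieces  # => ["an assertion", "on this and", "on that"]
```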
If you use a pattern with groups, it will return the pattern in the results as well:
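A short illustration with a different, made-up input, showing that whatever the group matched (not just a fixed word) ends up in the result:

```ruby
# Without a group the digits are dropped.
p "a1b22c".split(/\d+/)    # => ["a", "b", "c"]

# With a group, each matched run of digits is returned as its own element.
p "a1b22c".split(/(\d+)/)  # => ["a", "1", "b", "22", "c"]
```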
Edit: The additional information indicated that the goal is to keep the item on which the string was split together with one of the halves of the split. I would think there is a simple way to do that, but I don't know it and haven't had time today to play with it. So, in the absence of a clever solution, the following is one way to brute-force it. Use the split method as described above to include the split items in the array. Then iterate through the array and combine every second entry (which by definition is the split value) with the previous entry.

Results in this:
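A sketch of that brute-force loop together with what it returns. The method name `merge_delimiters` and the sample text are hypothetical, and the pattern is assumed to contain exactly one capture group so that every odd-indexed entry is a delimiter:

```ruby
# Hypothetical helper: pattern must contain exactly one capture group.
def merge_delimiters(text, pattern)
  parts = text.split(pattern)  # delimiters land at the odd indexes
  merged = []
  parts.each_with_index do |part, i|
    if i.odd?
      merged[-1] += part  # delimiter: glue it onto the previous entry
    else
      merged << part      # ordinary piece: start a new entry
    end
  end
  merged
end

p merge_delimiters("split on the word on okay?", /(on)/)
# => ["split on", " the word on", " okay?"]
```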