Split on first occurrence
What would be the best way to split a string on the first occurrence of a delimiter?
For example:
"123mango abcd mango kiwi peach"
splitting on the first mango to get:
" abcd mango kiwi peach"
To split on the last occurrence instead, see Partition string in Python and get value of last segment after colon.
From the docs:
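The quoted documentation did not survive extraction. Here is a small sketch of the documented behaviour of str.split(sep, maxsplit), using the string from the question. When maxsplit is given, at most that many splits are done, so the list has at most maxsplit + 1 elements.

```python
s = "123mango abcd mango kiwi peach"

# maxsplit=1: split only at the first "mango", so at most 2 pieces come back
parts = s.split("mango", maxsplit=1)     # ['123', ' abcd mango kiwi peach']
after_first_mango = parts[1]             # ' abcd mango kiwi peach'

# Without a limit, every occurrence becomes a split point
all_parts = s.split("mango")             # ['123', ' abcd ', ' kiwi peach']
```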
For me, the better approach is this:
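The answer's own code is not shown, so this is a minimal sketch of the approach it appears to describe: split at most once, then index the result with -1. The string and the delimiter "mango" come from the question.

```python
s = "123mango abcd mango kiwi peach"

# Split at most once, then take the last element
after = s.split("mango", 1)[-1]                        # ' abcd mango kiwi peach'

# If the delimiter is absent, split() returns a one-element list,
# so [-1] yields the whole string instead of raising IndexError
missing = "no delimiter here".split("mango", 1)[-1]    # 'no delimiter here'

# Indexing with [1] fails in that same case
error_message = None
try:
    "no delimiter here".split("mango", 1)[1]
except IndexError as exc:
    error_message = str(exc)                           # 'list index out of range'
```

Note the trade-off. When the delimiter is missing, [-1] silently returns the whole string and gives no signal that the delimiter was absent.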
...because if the delimiter does not occur in the string, indexing the result with [1] gives "IndexError: list index out of range". Indexing with -1 therefore does no harm, because the number of splits is already limited to one.
You can also use str.partition.
The advantage of using str.partition is that it always returns a tuple in the form (before, separator, after). This makes unpacking the output really flexible, as there are always going to be 3 elements in the resulting tuple.
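The answer's code is not shown. This sketch shows the unpacking it describes, using the question's string. The same three-name unpacking works whether or not the delimiter is present.

```python
s = "123mango abcd mango kiwi peach"

# partition() always returns a 3-tuple: (before, separator, after)
before, sep, after = s.partition("mango")
# before == '123', sep == 'mango', after == ' abcd mango kiwi peach'

# When the delimiter is absent, the unpacking still succeeds;
# the separator and the "after" part are then empty strings
before2, sep2, after2 = "kiwi peach".partition("mango")
# before2 == 'kiwi peach', sep2 == '', after2 == ''
```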
Summary
The simplest and best-performing approach is to use the .partition method of the string.
Commonly, people may want to get the part either before or after the delimiter that was found, and may want to find either the first or last occurrence of the delimiter in the string. For most techniques, all of these possibilities are roughly as simple, and it is straightforward to convert from one to another.
For the below examples, we will assume:
Using .split
The second parameter to .split limits the number of times the string will be split. This gives the parts both before and after the delimiter; then we can select what we want.
If the delimiter does not appear, no splitting is done:
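The original example code did not survive extraction. Here is a sketch of this .split technique. It assumes the question's string and a hypothetical variable named delimiter.

```python
s = "123mango abcd mango kiwi peach"
delimiter = "mango"  # hypothetical name for the delimiter

# Limit to one split; the two pieces are the text before and after the delimiter
before, after = s.split(delimiter, 1)
# before == '123', after == ' abcd mango kiwi peach'

# With the delimiter absent, nothing is split: a one-element list comes back
unsplit = "kiwi peach".split(delimiter, 1)
# unsplit == ['kiwi peach']

# So unpacking into two names raises ValueError when the delimiter is missing
try:
    head, tail = "kiwi peach".split(delimiter, 1)
    unpack_failed = False
except ValueError:
    unpack_failed = True
```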
Using .partition
The result is a tuple instead, and the delimiter itself is preserved when found.
When the delimiter is not found, the result will be a tuple of the same length, with two empty strings in the result:
Thus, to check whether the delimiter was present, check the value of the second element.
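This sketch puts the presence check into a small helper. The name after_first is hypothetical and not part of any library. Testing the second element, rather than the third, matters because a delimiter at the very end of the string also leaves the third element empty.

```python
def after_first(text, sep):
    """Hypothetical helper: text after the first `sep`, or None if `sep` is absent."""
    head, found_sep, tail = text.partition(sep)
    # The middle element is empty exactly when a non-empty `sep` was not found
    return tail if found_sep else None

s = "123mango abcd mango kiwi peach"
found = after_first(s, "mango")               # ' abcd mango kiwi peach'
absent = after_first("kiwi peach", "mango")   # None
trailing = after_first("kiwimango", "mango")  # '' (found, but nothing follows)
```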
Using regular expressions
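The original examples for this section are not shown. Here is a sketch of the re.split calls it describes, assuming the question's string and a hypothetical delimiter variable. The vowel character class in the last call is an illustrative choice of a non-literal pattern. On this string it produces an empty string between the adjacent matches "e" and "a" in "peach".

```python
import re

s = "123mango abcd mango kiwi peach"
delimiter = "mango"  # hypothetical name for the delimiter

# re.escape makes the delimiter match literally, even if it contains
# regex metacharacters such as '.' or '+'
pattern = re.escape(delimiter)

# maxsplit=1 limits the number of splits, like str.split(delimiter, 1)
parts = re.split(pattern, s, maxsplit=1)
# parts == ['123', ' abcd mango kiwi peach']

# No split happens when the pattern does not match
unsplit = re.split(pattern, "kiwi peach", maxsplit=1)
# unsplit == ['kiwi peach']

# A genuine pattern: split on every vowel (illustrative choice).
# Adjacent matches leave an empty string between them.
vowel_parts = re.split("[aeiou]", s)
# vowel_parts == ['123m', 'ng', ' ', 'bcd m', 'ng', ' k', 'w', ' p', '', 'ch']
```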
The .split method of regular expressions has the same argument as the built-in string .split method, to limit the number of splits. Again, no splitting is done when the delimiter does not appear:
In these examples, re.escape has no effect, but in the general case it's necessary in order to specify a delimiter as literal text. On the other hand, using the re module opens up the full power of regular expressions:
(Note the empty string: that was found between the e and the a of peach.)
Using indexing and slicing
Use the .index method of the string to find out where the delimiter is, then slice with that:
This directly gives the prefix. However, if the delimiter is not found, an exception will be raised instead:
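Here is a sketch of the indexing-and-slicing approach, assuming the question's string and a hypothetical delimiter variable. The exception that str.index raises for a missing substring is ValueError.

```python
s = "123mango abcd mango kiwi peach"
delimiter = "mango"  # hypothetical name for the delimiter

# .index returns the position where the first match starts
start = s.index(delimiter)              # 3
prefix = s[:start]                      # '123'
# To get the part after the delimiter, skip past its length
suffix = s[start + len(delimiter):]     # ' abcd mango kiwi peach'

# If the delimiter is missing, .index raises ValueError
missing = False
try:
    "kiwi peach".index(delimiter)
except ValueError:
    missing = True
```

If an exception is unwanted, str.find returns -1 instead of raising. That -1 then has to be checked explicitly before slicing.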
Everything after the last occurrence, instead
Though it wasn't asked, I include related techniques here for reference.
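The last-occurrence counterparts are str.rsplit, str.rpartition and str.rindex. The original reference code is not shown. Here is a sketch of each, assuming the question's string and a hypothetical delimiter variable.

```python
s = "123mango abcd mango kiwi peach"
delimiter = "mango"  # hypothetical name for the delimiter

# rsplit with maxsplit=1 splits only at the last occurrence
last_split = s.rsplit(delimiter, 1)        # ['123mango abcd ', ' kiwi peach']

# rpartition returns (before_last, separator, after_last)
last_part = s.rpartition(delimiter)        # ('123mango abcd ', 'mango', ' kiwi peach')

# rindex gives where the last match starts, so add the delimiter length
start = s.rindex(delimiter)                # 14
after_last = s[start + len(delimiter):]    # ' kiwi peach'
```

When the delimiter is absent, rpartition puts the whole string in the last element, ('', '', s). This mirrors partition, which puts it in the first.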
The .split and .partition techniques have direct counterparts for getting the last part of the string (i.e., everything after the last occurrence of the delimiter). For reference:
Similarly, there is a .rindex to match .index, but it still gives the index where the last match begins, not where it ends. Thus:
For the regular expression approach, we can fall back on the technique of reversing the input, looking for the first appearance of the reversed delimiter, reversing the individual results, and reversing the result list:
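Here is a sketch of the reversal technique for a literal delimiter. The helper name split_last_regex is hypothetical. The delimiter is reversed before it is escaped: reversing an already-escaped string would move each backslash to the wrong side of its character. An arbitrary regex would need its pattern reversed by hand, which this sketch does not attempt.

```python
import re

def split_last_regex(text, delim):
    """Hypothetical helper: split `text` once at the last literal `delim` by reversing."""
    # Reverse the delimiter first, then escape it so it matches literally
    reversed_pattern = re.escape(delim[::-1])
    # The first match in the reversed text is the last match in the original
    pieces = re.split(reversed_pattern, text[::-1], maxsplit=1)
    # Un-reverse each piece, and restore the original order of the pieces
    return [piece[::-1] for piece in reversed(pieces)]

s = "123mango abcd mango kiwi peach"
result = split_last_regex(s, "mango")   # ['123mango abcd ', ' kiwi peach']
```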
Of course, this is almost certainly more effort than it's worth.
Another way is to use negative lookahead from the delimiter to the end of the string:
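Here is a sketch of the negative-lookahead pattern, assuming the question's string and a hypothetical delimiter variable. The delimiter matches only if no further copy of it follows anywhere later in the string. The lookahead rescans the rest of the input at every candidate position.

```python
import re

s = "123mango abcd mango kiwi peach"
delimiter = "mango"  # hypothetical name for the delimiter

escaped = re.escape(delimiter)
# Match the delimiter only if no other copy of it appears later in the string.
# re.DOTALL lets '.' cross newlines, so "later" really means anywhere after.
last_delim = re.compile(f"{escaped}(?!.*{escaped})", re.DOTALL)

parts = last_delim.split(s)   # ['123mango abcd ', ' kiwi peach']
```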
Because of the lookahead, this is a worst-case O(n^2) algorithm.
Performance testing
Though more flexible, the regular expression approach is definitely slower. Limiting the number of splits improves performance with both the string method and regular expressions (timings without the limit are not shown, because they are slower and also give a different result), but .partition is still a clear winner.
For this test data, the .index approach was slower even though it only has to create one substring and doesn't have to iterate over text beyond the match (for the purpose of creating the other substrings). Pre-computing the length of the delimiter helps, but this is still slower than the .split and .partition approaches.
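The original timing code and results did not survive extraction, so no numbers are reproduced here. Below is a minimal timeit harness for comparing the approaches on your own data. The input string, the delimiter variable and the default repetition count are assumptions. Timings will vary with Python version, machine and input.

```python
import re
import timeit

s = "123mango abcd mango kiwi peach"
delimiter = "mango"  # hypothetical name for the delimiter
pattern = re.compile(re.escape(delimiter))
delim_len = len(delimiter)  # pre-computed length for the .index variant

# Each candidate returns the text after the first delimiter
candidates = {
    "split": lambda: s.split(delimiter, 1)[1],
    "partition": lambda: s.partition(delimiter)[2],
    "regex split": lambda: pattern.split(s, maxsplit=1)[1],
    "index": lambda: s[s.index(delimiter) + len(delimiter):],
    "index (pre-computed length)": lambda: s[s.index(delimiter) + delim_len:],
}

def run_benchmark(number=100_000):
    """Return {name: total seconds} for `number` calls of each candidate."""
    return {name: timeit.timeit(func, number=number) for name, func in candidates.items()}

if __name__ == "__main__":
    # Print the fastest approach first
    for name, seconds in sorted(run_benchmark().items(), key=lambda item: item[1]):
        print(f"{name:>30}: {seconds:.4f} s")
```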