Assume I want to split a String like `abc'xyz?'zzz'` at every `'` into a `Vec<String>`, but not if the `'` is preceded by a `?`.
I want to achieve this without regex lookarounds, since I can't trust the input.
I can assume that the input is valid UTF-8.
What would be the fastest (and probably most memory-efficient) way to achieve this in Rust?
I thought about iterating over the String and saving the current substring into a variable whenever the next char is `'` but the current char is not `?`, checked by char comparison. I would then push that variable's value into a `Vec<String>` by moving it.
Is this a good idea, or are there more efficient (in time and memory) ways to achieve this?
The most idiomatic way to achieve this would be to implement it as an `Iterator` that takes a `&str` and produces `&str` items. The example implementation (Playground) assumes that a trailing `'` on the input string should not produce an empty element after it, and that an empty string should not produce any element. Note that no copies are made, since we are only dealing with string slices. If you want to produce a `Vec<String>`, you can do so by mapping the iterator over `str::to_owned` (`.map(str::to_owned).collect::<Vec<_>>()`).
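The code behind the Playground link is not reproduced here, so the following is only a minimal sketch of what such an iterator could look like. The name `SplitUnescaped` and its `new` constructor are hypothetical, and the sketch assumes that only the single character directly before a `'` matters (a doubled `??` is not treated specially) and that the `?` stays in the output.

```rust
/// Hypothetical iterator over the pieces of a string separated by a `'`
/// that is not directly preceded by `?`.
struct SplitUnescaped<'a> {
    rest: &'a str,
}

impl<'a> SplitUnescaped<'a> {
    fn new(input: &'a str) -> Self {
        SplitUnescaped { rest: input }
    }
}

impl<'a> Iterator for SplitUnescaped<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // Nothing left: an empty input or a trailing `'` yields no extra item.
        if self.rest.is_empty() {
            return None;
        }
        let rest = self.rest;
        let bytes = rest.as_bytes();
        // `'` and `?` are ASCII, so a byte scan is safe on UTF-8 input and
        // every index found here lies on a char boundary.
        let split_at = (0..bytes.len())
            .find(|&i| bytes[i] == b'\'' && (i == 0 || bytes[i - 1] != b'?'));
        match split_at {
            Some(i) => {
                // Yield the part before the delimiter and skip the delimiter.
                self.rest = &rest[i + 1..];
                Some(&rest[..i])
            }
            None => {
                // No delimiter left: yield the remainder as the last piece.
                self.rest = "";
                Some(rest)
            }
        }
    }
}
```

With these assumptions, `SplitUnescaped::new("abc'xyz?'zzz'")` yields `abc` and then `xyz?'zzz`, both as subslices of the input.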
Collecting the iterator like that achieves your stated goal. But if you don't actually need owned strings or a vector, you can use the iterator directly, which requires no additional heap allocations, since it hands out subslices of the original slice.
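To illustrate the difference between the two ways of consuming the iterator, here is a rough sketch. To keep the block self-contained it uses a compact closure-based stand-in, `split_unescaped`, built on `std::iter::from_fn`; that helper and the two functions around it are hypothetical names, not the code from the original answer.

```rust
use std::iter;

/// Hypothetical stand-in for the iterator: yields the pieces of `input`
/// between each `'` that is not directly preceded by `?`.
fn split_unescaped<'a>(input: &'a str) -> impl Iterator<Item = &'a str> {
    let mut rest = input;
    iter::from_fn(move || {
        if rest.is_empty() {
            return None;
        }
        let current = rest;
        let bytes = current.as_bytes();
        // Index of the next unescaped `'`, or the end of the string.
        let end = (0..bytes.len())
            .find(|&i| bytes[i] == b'\'' && (i == 0 || bytes[i - 1] != b'?'))
            .unwrap_or(bytes.len());
        // Skip the delimiter if there was one; otherwise nothing is left.
        rest = current.get(end + 1..).unwrap_or("");
        Some(&current[..end])
    })
}

/// Owned result: one `String` allocation per piece, plus the `Vec`.
fn owned_pieces(input: &str) -> Vec<String> {
    split_unescaped(input).map(str::to_owned).collect()
}

/// Borrowed result: consumes the iterator directly, no heap allocation.
fn longest_piece(input: &str) -> Option<&str> {
    split_unescaped(input).max_by_key(|piece| piece.len())
}
```

`owned_pieces` is the variant to reach for when the pieces must outlive the input; `longest_piece` shows that simple queries can run over the borrowed slices alone.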
I don't think you need to over-complicate this; a simple for loop will do.
This also makes it easy to adjust exactly how you want the splitting to work, e.g. whether to include or exclude the delimiter and what to do with empty matches. (Playground)
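The linked code is not included here, so this is only a sketch of how such a loop might look. The function name `split_simple` is hypothetical, and the choices it makes (drop the delimiter, keep the `?`, keep empty pieces between two delimiters, drop an empty piece at the very end) are assumptions that can be changed in place.

```rust
/// Hypothetical helper: splits `input` at every `'` that is not directly
/// preceded by `?`, copying the characters into owned strings.
fn split_simple(input: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut prev: Option<char> = None;

    for c in input.chars() {
        if c == '\'' && prev != Some('?') {
            // Unescaped delimiter: move the finished piece into the vector
            // and leave an empty String behind for the next piece.
            parts.push(std::mem::take(&mut current));
        } else {
            // Ordinary character, or a `'` escaped by `?`: keep it.
            current.push(c);
        }
        prev = Some(c);
    }

    // Keep whatever follows the last delimiter, unless it is empty.
    if !current.is_empty() {
        parts.push(current);
    }
    parts
}
```

For the input from the question, `split_simple("abc'xyz?'zzz'")` returns `["abc", "xyz?'zzz"]`. To include the delimiter in each piece, for example, it would be enough to push `c` onto `current` before the `parts.push` call.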
If you wanted to produce a `Vec<&str>`, you would need to do more work to maintain references into the existing string, but since we are returning a `Vec<String>`, we can simply copy the characters one by one.
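For comparison, the extra work for a `Vec<&str>` mostly amounts to tracking byte offsets instead of copying characters. The sketch below assumes the same splitting rules as before (delimiter dropped, `?` kept, no empty piece at the end); `split_borrowed` is a hypothetical name.

```rust
/// Hypothetical helper: like a copying loop, but returns subslices of
/// `input` instead of owned strings.
fn split_borrowed(input: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut start = 0; // byte offset where the current piece begins
    let mut prev: Option<char> = None;

    // `char_indices` yields byte offsets, which is what slicing needs.
    for (i, c) in input.char_indices() {
        if c == '\'' && prev != Some('?') {
            parts.push(&input[start..i]);
            // The next piece starts right after the delimiter.
            start = i + c.len_utf8();
        }
        prev = Some(c);
    }

    // Keep whatever follows the last delimiter, unless it is empty.
    if start < input.len() {
        parts.push(&input[start..]);
    }
    parts
}
```

The only allocation left is the vector itself; the trade-off is that the result cannot outlive `input`.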