C++:如何构建两个空格分隔字符串的交集字符串?
我有两个空格分隔的字符串...(X 并不意味着相同的符号)
st1 = "abc def kok...."
st2 = "kok bbr def ffe ...."
我想构造一个交集字符串,如下所示: common = "kok def"
在 C++ 中执行此操作的有效方法是什么?
谢谢
I have two space separated strings... (the X doesn't mean the same symbol)
st1 = "abc def kok...."
st2 = "kok bbr def ffe ...."
i would like to construct an intersection string as follows:common = "kok def"
what is the efficient way to do so in c++?
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
使用
std::set_intersection
示例程序:
I'我假设您已经对您的字符串进行标记 (这个解决方案似乎很容易实现) 。
就令牌(或子字符串)的数量而言,复杂度应为O(n log n)。
Use
std::set_intersection
Sample program:
I'm assuming you've tokenized your strings already (this solution seems easy to implement).
Complexity should be O(n log n) in the number of tokens (or sub-strings).
st1
拆分为子字符串,并将它们全部放入std::set
st2
拆分为子字符串,并检查它们是否存在于在步骤 1 中创建的集合。这将提供
O(n log n)
执行时间。您必须将两个字符串恰好循环一次。对于每个元素,从集合中插入和检索的时间通常为O(log n)
,即O(n log n)
。如果您可以使用具有
O(1)
插入和检索复杂度的基于哈希的集合(或其他一些无序集合),您将把复杂度降低到O(n)
。st1
in substrings and put them all into astd::set
st2
in substrings and check for each of them if they exist in the set created in step 1.This will give
O(n log n)
execution time. You have to loop through both strings exactly once. Insertion and retrieval from the set is usuallyO(log n)
for each element, which givesO(n log n)
.If you can use a hash based set (or some other unordered set) with
O(1)
insert and retrieval complexity you will cut the complexity down toO(n)
.为了扩展您已经得到的答案,基本上有两个您没有指定的因素需要考虑。首先,如果输入中存在重复元素,您是否希望将这些元素考虑用于输出。例如,给定如下输入:
由于“kok”在两个输入中出现两次,“kok”应该在输出中出现一次还是两次?
第二个是你的使用模式。您是否有一种读取所有输入,然后生成单个输出的模式,或者是一种更具迭代性的模式,您可以读取一些输入,生成一个输出,读取添加到前一个输入的更多输入,生成另一个输出,等等在?
如果您要读取所有输入,然后生成一个输出,您可能需要使用
std::vector
后跟std::sort
。如果您只希望每个输入在输出中只出现一次,无论它在两个输入中出现的频率如何,那么您将遵循std::unique
,最后执行您的设置交集
。如果您想支持迭代更新,那么您可能需要使用
std::set
或std::multiset
(std::set
用于每个输出都是唯一的,std::multiset
(如果重复输入应该给出重复的结果)。编辑:基于输入中缺乏重复,一个非常快速简单的实现将类似于:
To expand a bit on the answers you've already gotten, there are basically two factors to consider that you haven't specified. First, if there are duplicate elements in the input, do you want those considered for the output. For example, given input like:
Since "kok" appears twice in both inputs, should "kok" appear once in the output or twice?
The second is your usage pattern. Do you have a pattern of reading all the input, then generating a single output, or is a more iterative, where you might read some input, generate an output, read some more input that's added to the previous, generate another output, and so on?
If you're going read all the input, then generate one output, you probably want to use
std::vector
followed bystd::sort
. If you only want each input to appear only once in the output, regardless of how often it appears in both inputs, then you'd follow that bystd::unique
, and finally do yourset_intersection
.If you want to support iterative updates, then you probably want to use
std::set
orstd::multiset
(std::set
for each output to be unique,std::multiset
if repeated inputs should give repeated results).Edit: based on the lack of duplication in the input, a really quick simple implementation would be something like: