Find overlapping letters in words
I have a character vector with only three words, like this:
first_string <- c("self", "funny", "nymph")
As you can see, the words of this vector can all be joined into one word because their letters overlap, i.e. we get selfunnymph. Let's call this a word train.
Besides, I have another vector with many words. Let the second vector be:
second_string <- c("house", "garden", "duck", "evil", "fluff")
I want to know which words of the second vector can be added to the word train. In this case these are house and fluff (house can be added at the end of selfunnymph, and fluff can be put between self and funny). So the expected output here would be:
expected <- data.frame(word= c("house", "fluff"), word_train= c("selfunnymphouse", "selfluffunnymph"))
The overlap can be of any length, i.e. self and funny overlap in only one character, but funny and nymph overlap in two characters.
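To make the overlap rule concrete, here is a minimal sketch in base R. The helper names overlap_len, build_train and insert_word are hypothetical (they do not come from the question), and the sketch assumes that the longest possible overlap is always used and that the original word order is kept.

```r
# Hypothetical helper: length of the longest suffix of `a` that is also
# a prefix of `b` (0 if the two words do not overlap).
overlap_len <- function(a, b) {
  for (k in rev(seq_len(min(nchar(a), nchar(b))))) {
    if (substring(a, nchar(a) - k + 1) == substring(b, 1, k)) return(k)
  }
  0L
}

# Hypothetical helper: glue an ordered vector of words into one train.
# Returns NA if two neighbouring words do not overlap.
build_train <- function(words) {
  train <- words[1]
  for (i in seq_along(words)[-1]) {
    k <- overlap_len(words[i - 1], words[i])
    if (k == 0) return(NA_character_)
    train <- paste0(train, substring(words[i], k + 1))
  }
  train
}

# Hypothetical helper: try a new word at every position of the train,
# keeping the original order of the train words (an assumption).
insert_word <- function(train_words, new_word) {
  candidates <- vapply(0:length(train_words), function(pos) {
    build_train(append(train_words, new_word, after = pos))
  }, character(1))
  candidates[!is.na(candidates)]
}

first_string <- c("self", "funny", "nymph")
second_string <- c("house", "garden", "duck", "evil", "fluff")

trains <- lapply(second_string, function(w) insert_word(first_string, w))
result <- data.frame(
  word = rep(second_string, lengths(trains)),
  word_train = unlist(trains),
  stringsAsFactors = FALSE
)
print(result)
```

Under this rule house also fits at the front of the train (houselfunnymph, because house and self overlap in "se"), which the expected output above does not list.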
EDIT
The new word can change the word order of the first word train. For example, if the second vector contains the word hugs, we can make the word train nymphugselfunny, which puts nymph before self and funny.
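If reordering is allowed, one way to look for trains is a depth-first search that keeps extending a partial train with any unused word that overlaps its last word. The following is only a sketch under that assumption; overlap_len and all_trains are hypothetical names, and the longest overlap is used at each join.

```r
# Hypothetical helper: length of the longest suffix of `a` that is also
# a prefix of `b` (0 if the two words do not overlap).
overlap_len <- function(a, b) {
  for (k in rev(seq_len(min(nchar(a), nchar(b))))) {
    if (substring(a, nchar(a) - k + 1) == substring(b, 1, k)) return(k)
  }
  0L
}

# Hypothetical helper: return every train that uses all words exactly once,
# in any order.
all_trains <- function(words) {
  found <- character(0)
  extend <- function(last, train, unused) {
    if (length(unused) == 0) {
      # Every word has been placed, so this is a complete train.
      found <<- c(found, train)
      return(invisible(NULL))
    }
    for (i in seq_along(unused)) {
      k <- overlap_len(last, unused[i])
      if (k > 0) {
        extend(unused[i], paste0(train, substring(unused[i], k + 1)), unused[-i])
      }
    }
  }
  # Each word is tried as the first car of the train.
  for (i in seq_along(words)) extend(words[i], words[i], words[-i])
  found
}

hugs_trains <- all_trains(c("self", "funny", "nymph", "hugs"))
print(hugs_trains)
```

Because the search only follows words that actually overlap, it avoids checking every ordering, which matters once the word list grows.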
Comments (3)
I'm wondering why you asked this, but it was a fun exercise regardless. Here's my implementation:
Edit: When I looked at your own implementation, I thought you were interested in the longest possible trains. Now that you have explained the purpose, I have adapted the algorithm to take the original cars and see which of the new cars can be added individually to the original set. With the previous code, a long list of potential new names would have created some huge trains that would be quite impractical for naming a family.
It turned out to be much harder than I thought, but this is what I ended up doing:
Running the code on the data from my question gave me longer word trains than I expected when writing the question, the longest being gardenymphouselfluffunny and selfluffunnymphousevil (both contain 6 words). The output data is:
The code is quite long, though.
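The answer's own code is not reproduced here. As an illustration only, the sketch below searches for the longest trains, assuming that the words of first_string and second_string are pooled and that each word is used at most once. The names overlap_len and longest_trains are hypothetical.

```r
# Hypothetical helper: length of the longest suffix of `a` that is also
# a prefix of `b` (0 if the two words do not overlap).
overlap_len <- function(a, b) {
  for (k in rev(seq_len(min(nchar(a), nchar(b))))) {
    if (substring(a, nchar(a) - k + 1) == substring(b, 1, k)) return(k)
  }
  0L
}

# Hypothetical helper: depth-first search that remembers the trains
# containing the largest number of words.
longest_trains <- function(words) {
  best <- list(n = 0L, trains = character(0))
  extend <- function(last, train, n_used, unused) {
    extended <- FALSE
    for (i in seq_along(unused)) {
      k <- overlap_len(last, unused[i])
      if (k > 0) {
        extended <- TRUE
        extend(unused[i], paste0(train, substring(unused[i], k + 1)),
               n_used + 1L, unused[-i])
      }
    }
    # A train that cannot be extended any further is a candidate result.
    if (!extended) {
      if (n_used > best$n) {
        best <<- list(n = n_used, trains = train)
      } else if (n_used == best$n) {
        best <<- list(n = n_used, trains = c(best$trains, train))
      }
    }
  }
  for (i in seq_along(words)) extend(words[i], words[i], 1L, words[-i])
  best
}

first_string <- c("self", "funny", "nymph")
second_string <- c("house", "garden", "duck", "evil", "fluff")

longest <- longest_trains(c(first_string, second_string))
print(longest)
```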
This might be a bulky/inefficient approach, since it uses pracma::perms to generate all permutations and then checks the validity of each train, but I hope it can give you some clues. You will obtain a list where house gives 4 possible trains and fluff gives 1 train, while the other words in second_string cannot contribute to building any trains based on first_string.
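The permutation idea described above can be sketched as follows. This is not the answer's original code: to stay free of package dependencies, a small base-R helper (the hypothetical permutations) stands in for pracma::perms, and overlap_len, build_train and trains_per_word are likewise hypothetical names.

```r
# Hypothetical helper: length of the longest suffix of `a` that is also
# a prefix of `b` (0 if the two words do not overlap).
overlap_len <- function(a, b) {
  for (k in rev(seq_len(min(nchar(a), nchar(b))))) {
    if (substring(a, nchar(a) - k + 1) == substring(b, 1, k)) return(k)
  }
  0L
}

# Hypothetical helper: glue an ordered vector of words into one train,
# or return NA if two neighbouring words do not overlap.
build_train <- function(words) {
  train <- words[1]
  for (i in seq_along(words)[-1]) {
    k <- overlap_len(words[i - 1], words[i])
    if (k == 0) return(NA_character_)
    train <- paste0(train, substring(words[i], k + 1))
  }
  train
}

# Base-R stand-in for pracma::perms (an assumption made for this sketch):
# returns all orderings of `x` as a list of vectors.
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in permutations(x[-i])) out[[length(out) + 1]] <- c(x[i], p)
  }
  out
}

# For each new word, check every ordering of the train words plus that word
# and keep the orderings that form a valid train.
trains_per_word <- function(train_words, new_words) {
  res <- lapply(new_words, function(w) {
    trains <- vapply(permutations(c(train_words, w)), build_train, character(1))
    trains[!is.na(trains)]
  })
  setNames(res, new_words)
}

first_string <- c("self", "funny", "nymph")
second_string <- c("house", "garden", "duck", "evil", "fluff")

res <- trains_per_word(first_string, second_string)
print(res)
```

The number of orderings grows factorially with the number of words, so this brute-force check is only practical for short trains.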