Reading strings from one file and adding them to another, with a suffix to make them unique
I am processing documents in Ruby.
I have a document from which I extract specific strings using a regexp, and I then add them to another file. When added to the destination file they must be unique, so if a string already exists in the destination file I add a simple suffix, e.g. <word>_1. Eventually I want to reference the strings by name, so random number generation or a string derived from the date is no good.
At present I store each added word in an array, and every time I add a word I check whether the string already exists in the array. That is fine if there is only one duplicate, but there might be two or more, so I need to check for the initial string and then loop, incrementing the suffix until the result doesn't exist. (I have simplified my code, so there may be bugs.)
    def add_word(word)
      if @added_words.include?(word)
        suffix = 1
        suffixed_word = word
        # Keep incrementing the suffix until the candidate is unused.
        while @added_words.include?(suffixed_word)
          suffixed_word = "#{word}_#{suffix}"
          suffix += 1
        end
        word = suffixed_word
      end
      @added_words << word
    end
It looks messy. Is there a better algorithm or a more idiomatic Ruby way of doing this?
Comments (4)
Make @added_words a Set (don't forget to require 'set'). This makes for faster lookup, as sets are implemented with hashes, while still using include? to check for set membership. It's also easy to extract the highest used suffix, and from that the next value you can insert (imagine you already had 12 foos, so the next should be foo_13).
Sorry if the examples are a bit confused, I had anesthesia earlier today. They should be enough to give you an idea of how sets could help you, though (most of it would work with an array too, but sets have faster lookup).
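A minimal sketch of the Set idea, under stated assumptions: the helper names highest_suffix and next_name are hypothetical, and the sample data assumes the set already holds foo plus foo_1 through foo_12.

```ruby
require 'set'

# Hypothetical sample data: "foo" plus foo_1..foo_12 have already been added.
added_words = Set.new(["foo"] + (1..12).map { |i| "foo_#{i}" })

# Highest numeric suffix already used for `base` (0 if no suffixed form exists).
def highest_suffix(words, base)
  pattern = /\A#{Regexp.escape(base)}_(\d+)\z/
  words.map { |w| w[pattern, 1]&.to_i }.compact.max || 0
end

# Next free name: the bare word if unused, otherwise base_<highest + 1>.
def next_name(words, base)
  return base unless words.include?(base)
  "#{base}_#{highest_suffix(words, base) + 1}"
end

puts next_name(added_words, "foo")  # foo_13
puts next_name(added_words, "bar")  # bar
```

Note that highest_suffix scans the whole set on every call, so only the include? check benefits from the hash-based lookup here.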
Change @added_words to a Hash with a default of zero. Then you can do:
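One way this could look, as a sketch only: the WordList class and its words reader are hypothetical wrappers added for illustration, and only @added_words and add_word come from the question.

```ruby
# Hypothetical wrapper class so the example is self-contained.
class WordList
  attr_reader :words

  def initialize
    @added_words = Hash.new(0) # base word => number of times seen so far
    @words = []                # the unique names, in insertion order
  end

  def add_word(word)
    count = @added_words[word]
    @added_words[word] += 1
    # First occurrence keeps the bare word; later ones get _1, _2, ...
    name = count.zero? ? word : "#{word}_#{count}"
    @words << name
    name
  end
end

list = WordList.new
names = %w[foo bar foo foo].map { |w| list.add_word(w) }
p names # ["foo", "bar", "foo_1", "foo_2"]
```

Each insertion is a single hash lookup with no loop. The trade-off is that this sketch assumes the source never contains a literal word such as foo_1; if it can, a generated name may collide with it, and a membership check is still needed.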
In that case, I'd probably use a set or hash:
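A minimal sketch of the Set variant, assuming a hypothetical helper named unique_name that receives the set of names already taken:

```ruby
require 'set'

# Returns a name not yet in `taken` and records it there.
def unique_name(taken, word)
  candidate = word
  suffix = 0
  while taken.include?(candidate)
    suffix += 1
    candidate = "#{word}_#{suffix}"
  end
  taken << candidate
  candidate
end

taken = Set.new
result = %w[intro intro intro].map { |w| unique_name(taken, w) }
p result # ["intro", "intro_1", "intro_2"]
```

The loop is the same as in the question; the only change is that each include? check is a hash lookup rather than an array scan.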
If you've got some attributes that you're grouping via these sections, then a hash might be better:
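A sketch of the Hash variant, assuming each unique name maps to a hash of attributes; add_section and the :page attribute are hypothetical names used only for illustration.

```ruby
# Stores `attributes` under a unique key derived from `word`
# and returns the key that was used.
def add_section(sections, word, attributes = {})
  name = word
  suffix = 0
  while sections.key?(name)
    suffix += 1
    name = "#{word}_#{suffix}"
  end
  sections[name] = attributes
  name
end

sections = {}
add_section(sections, "intro", { page: 1 })
second = add_section(sections, "intro", { page: 4 })
p second                  # "intro_1"
p sections[second][:page] # 4
```

The keys double as the uniqueness check, so no separate list of added words is needed.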
Doing it the "wrong way", but in slightly nicer code:
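A sketch of what a tidier version of the array-based approach might look like; the WordList wrapper is hypothetical, and @added_words stays a plain Array as in the question.

```ruby
# Hypothetical wrapper class so the example is self-contained.
class WordList
  attr_reader :added_words

  def initialize
    @added_words = []
  end

  def add_word(word)
    if @added_words.include?(word)
      base = word
      # Lazily generate base_1, base_2, ... and take the first unused one.
      word = (1..Float::INFINITY).lazy
               .map { |n| "#{base}_#{n}" }
               .find { |candidate| !@added_words.include?(candidate) }
    end
    @added_words << word
    word
  end
end

list = WordList.new
names = %w[foo foo foo foo_1].map { |w| list.add_word(w) }
p names # ["foo", "foo_1", "foo_2", "foo_1_1"]
```

This still scans the array for every candidate, so it is no faster than the original; it only replaces the manual counter with a lazy sequence of candidate names.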