Reading strings from one file and adding them to another, with a suffix to make them unique

Posted on 2024-11-08 01:47:29

I am processing documents in Ruby.

I have a document from which I extract specific strings using a regexp, and I then add them to another file. When added to the destination file they must be unique, so if a string already exists in the destination file I add a simple suffix, e.g. <word>_1. Eventually I want to reference the strings by name, so a random number or a string derived from the date is no good.

At present I store each added word in an array, and every time I add a word I check whether the string already exists in the array. That is fine if there is only one duplicate, but there might be two or more, so I need to check for the initial string and then loop, incrementing the suffix until the suffixed string doesn't exist. (I have simplified my code, so there may be bugs.)

def add_word(word)
  if @added_words.include?(word)
    suffix = 1
    suffixed_word = word
    # keep incrementing the suffix until the candidate is unused
    while @added_words.include?(suffixed_word)
      suffixed_word = word + "_" + suffix.to_s
      suffix += 1
    end
    word = suffixed_word
  end
  @added_words << word
end

It looks messy. Is there a better algorithm or a more idiomatic Ruby way of doing this?

冷…雨湿花 2024-11-15 01:47:29

Make @added_words a Set (don't forget to require 'set'). This makes for faster lookup as sets are implemented with hashes, while still using include? to check for set membership. It's also easy to extract the highest used suffix:

>> require 'set'
>> s = Set.new
#=> #<Set: {}>
>> s << 'foo'
#=> #<Set: {"foo"}>
>> s << 'foo_1'
#=> #<Set: {"foo", "foo_1"}>
>> word = 'foo'
#=> "foo"
>> # compare the suffixes as integers, so that e.g. foo_9 does not rank above foo_12
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }
#=> "foo_1"
>> s << 'foo_12'
#=> #<Set: {"foo", "foo_1", "foo_12"}>
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }
#=> "foo_12"

Now to get the next value you can insert, you could just do the following (imagine you already had 12 foos, so the next should be foo_13):

>> s << s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }.next
#=> #<Set: {"foo", "foo_1", "foo_12", "foo_13"}>

Sorry if the examples are a bit confusing; I had anesthesia earlier today. They should be enough to give you an idea of how sets could help you, though (most of this would work with an array too, but sets have faster lookup).
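
As a hedged variation on the idea above, the next suffix can be computed numerically instead of with String#next. As far as I know, String#next carries into the letters when the digits roll over (so "foo_9".next would not give "foo_10"), and it turns a bare "foo" into "fop". The helper name next_free_name is made up for this sketch, and it scans the whole set on every call, so treat it as an illustration rather than a tuned solution:

```ruby
require 'set'

# Hypothetical helper: returns `word` if it is unused, otherwise `word_N`,
# where N is one more than the highest numeric suffix already in `names`.
def next_free_name(names, word)
  return word unless names.include?(word)

  # Anchored and escaped, so only exact "word_<digits>" entries are counted.
  pattern = /\A#{Regexp.escape(word)}_(\d+)\z/
  # Non-matching names (including the bare word) contribute 0.
  highest = names.map { |name| name[pattern, 1].to_i }.max
  "#{word}_#{highest + 1}"
end

s = Set.new(%w[foo foo_1 foo_9])
s << next_free_name(s, 'foo')   # adds "foo_10"
s << next_free_name(s, 'bar')   # adds "bar", since it is not in the set yet
```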

最后的乘客 2024-11-15 01:47:29

Change @added_words to a Hash with a default of zero. Then you can do:

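# counts how often each word has been seen; unknown words start at 0
@added_words = Hash.new(0)

# increments the counter for word and returns the new count
def add_word(word)
  @added_words[word] += 1
end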

# put it to work:

list = %w(test foo bar test bar bar)
names = list.map do |w|
  "#{w}_#{add_word(w)}"
end
p @added_words
#=> {"test"=>2, "foo"=>1, "bar"=>3}
p names
#=>["test_1", "foo_1", "bar_1", "test_2", "bar_2", "bar_3"]
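
The counter above suffixes every word, including the first occurrence, and it could still produce a clash if the input itself contains a literal such as test_1. A minimal sketch of one way to combine the counter with a set of taken names, so the first occurrence stays unsuffixed and clashes are skipped; the class name UniqueNamer is hypothetical and not part of the answer:

```ruby
require 'set'

# Hypothetical class: remembers every name handed out plus a per-word counter.
class UniqueNamer
  def initialize
    @taken = Set.new        # every name returned so far
    @counts = Hash.new(0)   # last suffix tried for each base word
  end

  # Returns `word` the first time, then word_1, word_2, ...,
  # skipping any candidate that has already been handed out.
  def add_word(word)
    candidate = word
    # Set#add? returns nil when the element is already present.
    until @taken.add?(candidate)
      candidate = "#{word}_#{@counts[word] += 1}"
    end
    candidate
  end
end

namer = UniqueNamer.new
names = %w[test foo test test_1 test].map { |w| namer.add_word(w) }
p names
#=> ["test", "foo", "test_1", "test_1_1", "test_2"]
```

Because the counter remembers the last suffix tried, repeated words do not rescan earlier suffixes.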

复古式 2024-11-15 01:47:29

In that case, I'd probably use a set or hash:

# at the top of the file:
require 'set'
require 'forwardable'

# in your class:
extend Forwardable # I'm just including this to keep your previous API

# wherever you set up your instance variable (it's probably [] at the moment):
def initialize
  @added_words = Set.new
end

# then, instead of `def add_word(word); @added_words.add(word); end`:
def_delegator :@added_words, :add, :add_word
# or just change whatever loop you have to call @added_words.add('word') rather than add_word('word').
# @added_words.add('word') does nothing if 'word' already exists in the set.

If you've got some attributes that you're grouping via these sections, then a hash might be better:

# wherever you set up your instance variable (it's probably [] at the moment):
def initialize
  @added_words = {}
end

def add_word(word, attrs = {})
  @added_words[word] ||= []
  @added_words[word].push(attrs)
end
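
To show the Set and Forwardable pieces in one place, here is a minimal sketch of a complete class. The class name WordList and the extra delegated methods are assumptions made for the example. Note that this variant silently ignores duplicates instead of suffixing them, so it only fits if dropping repeated strings is acceptable:

```ruby
require 'set'
require 'forwardable'

# Hypothetical container class, only to show the delegation in context.
class WordList
  extend Forwardable

  # add_word(word) forwards to @added_words.add(word)
  def_delegator :@added_words, :add, :add_word
  # include? and size are forwarded under their own names
  def_delegators :@added_words, :include?, :size

  def initialize
    @added_words = Set.new
  end
end

list = WordList.new
list.add_word('foo')
list.add_word('foo')   # ignored, the set already contains "foo"
list.size              # => 1
```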

宁愿没拥抱 2024-11-15 01:47:29

Doing it the "wrong way", but in slightly nicer code:

def add_word(word)
  if @added_words.include?(word)
    # count upwards without a limit; break returns the first free candidate
    suffixed_word = 1.upto(Float::INFINITY) do |suffix|
      candidate = [word, suffix].join("_")
      break candidate unless @added_words.include?(candidate)
    end
    word = suffixed_word
  end
  @added_words << word
end
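
The same search can also be written without break, using a lazy enumerator and find. This is only a sketch under the assumption that the taken names live in a Set; the helper name unique_name is made up for the example:

```ruby
require 'set'

# Hypothetical helper: returns the first of word, word_1, word_2, ...
# that is not yet in `taken`.
def unique_name(taken, word)
  return word unless taken.include?(word)

  # lazy keeps the infinite range from being evaluated eagerly;
  # find stops at the first unused candidate.
  (1..Float::INFINITY).lazy
                      .map { |suffix| "#{word}_#{suffix}" }
                      .find { |candidate| !taken.include?(candidate) }
end

taken = Set.new
names = %w[foo foo bar foo].map do |word|
  name = unique_name(taken, word)
  taken << name
  name
end
p names
#=> ["foo", "foo_1", "bar", "foo_2"]
```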
