Already initialized constant warning

Posted 2024-12-11 17:35:29

I'm using Nokogiri code to extract text between HTML nodes, and I get these warnings when I read in a list of files. I didn't get them when using simple embedded HTML. I'd like to eliminate or suppress the warnings but don't know how. The warnings come at the end of each block:

extract.rb:18: warning: already initialized constant EXTRACT_RANGES
extract.rb:25: warning: already initialized constant DELIMITER_TAGS

Here is my code:

#!/usr/bin/env ruby -wKU
require 'rubygems'
require 'nokogiri'
require 'fileutils'

source = File.open('/documents.txt')
source.readlines.each do |line|
  line.strip!
  if File.exists? line
    file = File.open(line)

doc = Nokogiri::HTML(File.read(line))

# suggested by dan healy, stackoverflow 
# Specify the range between delimiter tags that you want to extract
# triple dot is used to exclude the end point
# 1...2 means 1 and not 2
EXTRACT_RANGES = [
  1...2
 ]

# Tags which count as delimiters, not to be extracted
DELIMITER_TAGS = [
  "h1",
  "h2",
  "h3"
]

extracted_text = []

i = 0
# Change /"html"/"body" to the correct path of the tag which contains this list
(doc/"html"/"body").children.each do |el|

  if (DELIMITER_TAGS.include? el.name)
    i += 1
  else
    extract = false
    EXTRACT_RANGES.each do |cur_range|
      if (cur_range.include? i)
        extract = true
        break
      end
    end

    if extract
      s = el.inner_text.strip
      unless s.empty?
        extracted_text << el.inner_text.strip
      end
    end
  end
end

print("\n")
puts line
print(",\n")
# Print out extracted text (each element's inner text is separated by newlines)
puts extracted_text.join("\n\n")
  end
end

云朵有点甜 2024-12-18 17:35:29

If the code were properly indented, it'd be easier to notice that the constant definition was being done within a loop.

Compare

source.readlines.each do |line|
  # code
  if true

# Wrongly indented code

# More
# Wrongly
# Indented
# Code
EXTRACT_RANGES = [
  1...2
 ]

# Several more pages of code
  end
end

with

source.readlines.each do |line|
  # code
  if true

    # Correctly indented code

    # What is a constant doing being defined
    # this far indented?
    # Oh no - it's in a loop!

    EXTRACT_RANGES = [
      1...2
    ]

    # Several more pages of code
  end
end
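
To see the warning in isolation, here is a minimal sketch. The constant name `DEMO_RANGES` and the `capture_stderr` helper are made up for illustration, and the sketch assumes warnings are enabled and written to `$stderr`, which is Ruby's default.

```ruby
require 'stringio'

# Hypothetical helper: runs the block and returns whatever was written to $stderr.
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

warnings = capture_stderr do
  3.times do
    # The first pass defines the constant; every later pass assigns it again,
    # and Ruby warns on each reassignment.
    DEMO_RANGES = [1...2]
  end
end

puts warnings
```

A single pass never reassigns the constant, which is probably why the warning did not appear when the script handled one embedded HTML document instead of a list of files.
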

不必你懂 2024-12-18 17:35:29

Didn't notice earlier. The constants are assigned inside the each block, so they are assigned again for every line of the file list, and Ruby warns on each reassignment. Just move the constants out of the each block:

require 'nokogiri'

# Defined once, before the loop
EXTRACT_RANGES = [
  1...2
]

# Tags which count as delimiters, not to be extracted
DELIMITER_TAGS = [
  "h1",
  "h2",
  "h3"
]

source = File.open('/documents.txt')
source.readlines.each do |line|
  line.strip!
  # File.exists? was removed in Ruby 3.2; File.exist? works in all versions
  if File.exist? line
    doc = Nokogiri::HTML(File.read(line))

    extracted_text = []

    i = 0
    # Change /"html"/"body" to the correct path of the tag which contains this list
    (doc/"html"/"body").children.each do |el|
      if DELIMITER_TAGS.include? el.name
        i += 1
      else
        extract = false
        EXTRACT_RANGES.each do |cur_range|
          if cur_range.include? i
            extract = true
            break
          end
        end

        if extract
          s = el.inner_text.strip
          extracted_text << s unless s.empty?
        end
      end
    end

    print("\n")
    puts line
    print(",\n")
    # Print out extracted text (each element's inner text is separated by newlines)
    puts extracted_text.join("\n\n")
  end
end
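
Once the constants live outside the loop, the inner range check can also be collapsed with `any?`. This is only a sketch of one possible tidy-up: `extract_section?` is a hypothetical helper name that does not appear in the original code, and freezing the arrays is an optional extra.

```ruby
# Defined once at the top level, so nothing is ever reassigned.
EXTRACT_RANGES = [1...2].freeze
DELIMITER_TAGS = %w[h1 h2 h3].freeze

# Hypothetical helper: true when the section counter falls inside any wanted range.
def extract_section?(index)
  EXTRACT_RANGES.any? { |range| range.include?(index) }
end

puts extract_section?(0)  # content before the first heading
puts extract_section?(1)  # content between the first and second heading
puts extract_section?(2)  # content after the second heading
```

With the range `1...2`, only the middle call reports true, so only the content between the first and second delimiter tags would be extracted.
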

终难愈 2024-12-18 17:35:29

As a programming tip:

Be careful using ... vs. .. for range definitions. The three-dot version isn't as commonly used as the two-dot version, and that extra dot can be easy to miss, making the code harder to maintain. I'd have to have a VERY good reason to use three-dots. Compare these outputs from IRB:

(1...2).to_a
=> [1]

vs.

(1..1).to_a
=> [1]

to see how misleading the first is.
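
A short sketch of the difference, using only core `Range` methods. One caveat worth keeping in mind: the two forms expand to the same list of integers, but they do not cover the same values in general.

```ruby
exclusive = (1...2)  # three dots: the end point 2 is left out
inclusive = (1..1)   # two dots: the end point 1 is included

puts exclusive.exclude_end?            # true
puts inclusive.exclude_end?            # false
puts exclusive.to_a == inclusive.to_a  # true, both expand to [1]

# For non-integer values the two ranges differ.
puts exclusive.include?(1.5)           # true
puts inclusive.include?(1.5)           # false
```

For an integer counter like the one in the question, either form selects the same section, so the choice comes down to readability.
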
