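Finding duplicates in an XML page
I'm trying to find duplicates within the XML returned by a web service call, using Ruby and Nokogiri.
The output I'm getting from the code below looks something like this: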
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["2", "2"]
What I want to know is that SKUs 1 and 2 have been duplicated, so something like "found duplicate skus [duplicated skus]".
The XML looks like this:
<Root>
<Context>
<ID>1234</ID>
<Item>
<ID>4567</ID>
</Item>
<Item>
<ID>4567</ID>
</Item>
<Item>
<ID>5678</ID>
</Item>
#Context Items that will produce duplicates.
$context = ['a','b','c']
#Class that will search through an array to find duplicates
class Array
def only_duplicates
duplicates = []
self.each {|each| duplicates << each if self.count(each) > 1}
duplicates
end
end
#loops through each item in the $context array
$context.each do |item|
puts "C_ItemID = " + item
#Creates a url string using the context item
url = "url to the call"
#Creates a xml doc
doc = Nokogiri::XML(open(url))
#Declare a blank array that the text from the node will be stored in
values = []
#loops through each item_id node to find duplicates.
doc.xpath('//item/id').each do |node|
values << node.text
@values = values.to_a
if @values.only_duplicates.count > 1
puts "found duplicate" + @values.only_duplicates.inspect
end
end
end
A faster way to find duplicates (credit: Ryan LeCompte), in a slightly modified and shorter version.
Here's a more efficient way to find duplicates in an array: avoid #count, which makes the algorithm slower because it has to traverse the whole array for each item, O(N^2).
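The code from these answers did not survive on this page, so what follows is only a minimal sketch of the idea, not necessarily the version credited above. It assumes a single counting pass with a Hash is acceptable, which makes the work O(N) instead of O(N^2). The method name `duplicate_values` and the sample `skus` array are hypothetical, chosen for illustration; in the question's script the array would be the node texts collected from `doc.xpath(...)`, and the check would run once after the loop rather than on every node.

```ruby
# Hypothetical helper (not from the original answers): returns every value
# that occurs more than once, listed a single time, in first-seen order.
def duplicate_values(values)
  counts = Hash.new(0)                          # missing keys start at 0
  values.each { |value| counts[value] += 1 }    # one pass over the array
  counts.select { |_value, count| count > 1 }.keys
end

# Assumed sample data standing in for the collected node texts.
skus = ["1", "2", "1", "3", "2", "1"]
dups = duplicate_values(skus)
puts "found duplicate skus #{dups.inspect}" unless dups.empty?
# prints: found duplicate skus ["1", "2"]
```

Because each duplicated value is reported once, the repeated "found duplicate" lines from the question collapse into a single message per document.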