Can Nokogiri extract metadata and merge it with the main record?

Posted on 2025-01-09 13:02:48

I have a 200,000-line XML file, and the data of interest looks like the following (extraneous info removed). The file also contains other Records and nodes that are not of interest.

<Record type="HKQuantityTypeIdentifier" startDate="2021-10-05 09:43:40 -0800" value="130">
  <MetadataEntry key="HKTimeZone" value="America/Los_Angeles"/>
  <!-- other MetadataEntry elements -->
</Record>

The following Nokogiri code collects only the attributes on each Record's opening tag into records

require 'nokogiri'
document = File.open(path) { |f| Nokogiri::XML(f) }
records = document.xpath("//Record[contains(@type,'HKQuantityTypeIdentifier')]").map(&:to_h)

with a typical entry looking like:

{"type"=>"HKQuantityTypeIdentifier", "startDate"=>"2014-04-02 09:48:00 -0800", "value"=>"110"}

I want to add the value of the HKTimeZone MetadataEntry to each hash (so that I can extract all of the information together later):

{"type"=>"HKQuantityTypeIdentifier", "startDate"=>"2014-04-02 09:48:00 -0800", "value"=>"110", "timeZone"=>"America/Los_Angeles"}

or whatever the HKTimeZone value is for that Record. The startDate, value and timeZone are then read record by record and added to the database along with other data.

Can Nokogiri do this, or are there other suggestions? I can't say I understand Nokogiri well; most of the code was written by someone else. Thank you.

Alternatively, can Nokogiri just add the entire Record element to records so that it can be parsed later? In other words, Nokogiri would gather each Record with type="HKQuantityTypeIdentifier", and the parsing would happen afterwards.

Comments (1)

酷遇一生 2025-01-16 13:02:48

There are probably more elegant solutions, but this should work:

records = document.xpath("//Record[contains(@type,'HKQuantityTypeIdentifier')]").map do |node|
  # Start from the Record's attributes as a hash (type, startDate, value, ...)
  hsh = node.to_h
  # Find the first child MetadataEntry with key="HKTimeZone"; at_xpath returns nil if there is none
  tz_entry = node.at_xpath("MetadataEntry[@key='HKTimeZone']")
  # Add its value attribute under the 'timeZone' key (nil when the Record has no time zone entry)
  hsh["timeZone"] = tz_entry && tz_entry["value"]
  # Return the augmented hash so map collects it
  hsh
end