Can Nokogiri extract metadata and merge it with the main record?
I have a 200,000-line XML file, and the data of interest looks like the following (extraneous info removed). There are other Records and nodes that are not of interest:
<Record type="HKQuantityTypeIdentifier" startDate="2021-10-05 09:43:40 -0800" value="130">
<MetadataEntry key="HKTimeZone" value="America/Los_Angeles"/>
other MetadataEntries
</Record>
The following Nokogiri code grabs only the attributes on the opening tag of each Record into records:
document = File.open(path) { |f| Nokogiri::XML(f) }
records = document.xpath("//Record[contains(@type,'HKQuantityTypeIdentifier')]").map(&:to_h)
A typical entry in records looks like:
{"type"=>"HKQuantityTypeIdentifier", "startDate"=>"2014-04-02 09:48:00 -0800", "value"=>"110"}
I want to add the HKTimeZone MetadataEntry to the hash (so that I can extract the information together later):
{"type"=>"HKQuantityTypeIdentifier", "startDate"=>"2014-04-02 09:48:00 -0800", "value"=>"110", "timeZone"=>"America/Los_Angeles"}
or whatever the value was for HKTimeZone. The startDate, value, and timeZone are read entry by entry and added to the database along with other data.
Can Nokogiri do this? Or are there any other suggestions? I can't say that I understand Nokogiri; most of the code was written by someone else. Thank you.
Or can Nokogiri just add the entire Record to record and then parse it later? In other words, Nokogiri would gather each Record with type="HKQuantityTypeIdentifier" and the parsing would happen afterwards.
There are probably more elegant solutions, but this should work: