Collecting attributes from XML with Nokogiri

Posted 2025-01-04 07:17:37

I have this XML:

<RECEIPT receiptDate="2012-02-10T12:46:26.661Z" submissionFile="E.coli_ENT_WS.submission.xml" success="false">

  <EXPERIMENT alias="ENT 23" status="PUBLIC"/>
  <EXPERIMENT alias="WS 23" status="PUBLIC"/>
  <RUN alias="ENT 23" status="PUBLIC"/>
  <RUN alias="WS 23" status="PUBLIC"/>
  <SAMPLE alias="ENT 23" status="PUBLIC"/>
  <SAMPLE alias="WS 23" status="PUBLIC"/>
  <STUDY alias="ENT 23" status="PUBLIC"/>
  <STUDY alias="WS 23" status="PUBLIC"/>
  <SUBMISSION alias="E.coli_ENT_WS"/>
  <MESSAGES>
    <ERROR> In run(ENT 23), the FC018_s_6_sequence_L70.txt.md5 not found </ERROR>
    <ERROR> In run(ENT 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(ENT 23)</ERROR>
    <ERROR>The Illumina_native_fastq file format required gzip compression for submission.</ERROR>
    <ERROR> FILE attribute quality_scoring_system is required</ERROR>
    <ERROR>Same file FC018_s_6_sequence_L70.txt found in Run(WS 23) has been used with other Run</ERROR>
    <ERROR> In run(WS 23), the FC018_s_6_sequence_L70.txt.md5 not found </ERROR>
    <ERROR> In run(WS 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(WS 23)</ERROR>
    <ERROR>The Illumina_native_fastq file format required gzip compression for submission.</ERROR>
    <ERROR> FILE attribute quality_scoring_system is required</ERROR>
    <INFO> VALIDATE action for the following XML: E.coli_ENT_WS.study.xml E.coli_ENT_WS.sample.xml E.coli_ENT_WS.experiment.xml E.coli_ENT_WS.run.xml      </INFO>
    <INFO>Inform_on_error is not filled in; auto populated from Submission account. </INFO>
    <INFO>Number of files in drop box = 2 &amp; Number of files in Submission = 1</INFO>
    <INFO>Deprecated element ignored: CENTER_NAME</INFO>
    <INFO>Deprecated element PROJECT_ID converted to RELATED_STUDY</INFO>
    <INFO>Deprecated element ignored: CENTER_NAME</INFO>
    <INFO>Deprecated element PROJECT_ID converted to RELATED_STUDY</INFO>
    <INFO> SPOT_DESCRIPTOR is missing</INFO><INFO> SPOT_DESCRIPTOR is missing</INFO>
    <INFO>Experiment (ENT 23) SPOTDESCRIPTOR is optional is null</INFO>
    <INFO>Experiment (WS 23) SPOTDESCRIPTOR is optional is null</INFO>
    <INFO> In run(ENT 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files</INFO>
    <INFO> In run(WS 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files</INFO>
  </MESSAGES>
  <ACTIONS>VALIDATE</ACTIONS>
  <ACTIONS>VALIDATE</ACTIONS>
  <ACTIONS>VALIDATE</ACTIONS>
  <ACTIONS>VALIDATE</ACTIONS>
  <ACTIONS>HOLD</ACTIONS>
</RECEIPT>

I am able to retrieve the elements themselves, mainly EXPERIMENT, ERROR, INFO, ACTIONS and MESSAGES.

What I would like to retrieve are the attributes of elements such as EXPERIMENT and RECEIPT.

I am using Nokogiri for my parsing.

My code looks like this:

@req_test = %x[curl -F "SUBMISSION=@xml/#{@experiment.alias}.submission.xml" -F "STUDY=@xml/#{@experiment.alias}.study.xml" -F "SAMPLE=@xml/#{@experiment.alias}.sample.xml" -F "RUN=@xml/#{@experiment.alias}.run.xml" -F "EXPERIMENT=@xml/#{@experiment.alias}.experiment.xml" https://www-test.ebi.ac.uk/ena/submit/drop-box/submit/]
@doc = Nokogiri::XML(@req_test)

# Collect all the ERROR elements
@expt = @doc.xpath("//ERROR")

# Collect all the INFO elements
@info = @doc.xpath("//INFO")

That is my controller. My view just displays the results:

<h3>This is the ERRORS Collected</h3>
<ul>
<% @expt.each do |expt| %>
  <li><%= expt %></li>
<% end %>
</ul>

<br />

<h3>This is the INFO Collected</h3>

<ul>
<% @info.each do |info| %>
  <li><%= info %></li>
<% end %>
</ul>

The application renders something like this:

This is the ERRORS Collected

    In run(ENT 23), the FC018_s_6_sequence_L70.txt.md5 not found

    In run(ENT 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(ENT 23)

    The Illumina_native_fastq file format required gzip compression for submission.

    FILE attribute quality_scoring_system is required

    Same file FC018_s_6_sequence_L70.txt found in Run(WS 23) has been used with other Run

    In run(WS 23), the FC018_s_6_sequence_L70.txt.md5 not found

    In run(WS 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(WS 23)

    The Illumina_native_fastq file format required gzip compression for submission.

    FILE attribute quality_scoring_system is required


This is the INFO Collected

    VALIDATE action for the following XML: E.coli_ENT_WS.study.xml E.coli_ENT_WS.sample.xml E.coli_ENT_WS.experiment.xml E.coli_ENT_WS.run.xml

    Inform_on_error is not filled in; auto populated from Submission account.

    Number of files in drop box = 2 & Number of files in Submission = 1

    Deprecated element ignored: CENTER_NAME

    Deprecated element PROJECT_ID converted to RELATED_STUDY

    Deprecated element ignored: CENTER_NAME

    Deprecated element PROJECT_ID converted to RELATED_STUDY

    SPOT_DESCRIPTOR is missing

    SPOT_DESCRIPTOR is missing

    Experiment (ENT 23) SPOTDESCRIPTOR is optional is null

    Experiment (WS 23) SPOTDESCRIPTOR is optional is null

    In run(ENT 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files

    In run(WS 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files

Could someone suggest a method or option for retrieving these attributes?


Answers (1)

爱本泡沫多脆弱 2025-01-11 07:17:37

It's not clear to me what you are trying to do, or what your problem is. Below are a variety of answers that might help.

For any element you can use Nokogiri::XML::Node#attributes to get a Hash mapping each attribute's name to a Nokogiri::XML::Attr (which has a .value you can read):

require 'nokogiri'
require 'erb'

template = <<ENDERB
<% unless @expts.empty? %>
<h3>Experiments</h3>
<ul><% for expt in @expts %>
  <li><%= expt %><ul>
    <% expt.attributes.each do |name,attr| %>
      <li><%=name%> = <%=attr.value%></li>
    <% end %>
  </ul></li>
<% end %></ul>
<% end %>
ENDERB

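# DATA reads the text after __END__ at the bottom of this script
# (here, a trimmed copy of the RECEIPT document from the question)
doc = Nokogiri.XML(DATA)
@expts = doc.xpath("//EXPERIMENT")
puts ERB.new(template).result(binding).gsub(/^[ \t]*\n/,'')
#=> <h3>Experiments</h3>
#=> <ul>
#=>   <li><EXPERIMENT alias="ENT 23" status="PUBLIC"/><ul>
#=>       <li>alias = ENT 23</li>
#=>       <li>status = PUBLIC</li>
#=>   </ul></li>
#=>   <li><EXPERIMENT alias="WS 23" status="PUBLIC"/><ul>
#=>       <li>alias = WS 23</li>
#=>       <li>status = PUBLIC</li>
#=>   </ul></li>
#=> </ul>

__END__
<RECEIPT receiptDate="2012-02-10T12:46:26.661Z" submissionFile="E.coli_ENT_WS.submission.xml" success="false">
  <EXPERIMENT alias="ENT 23" status="PUBLIC"/>
  <EXPERIMENT alias="WS 23" status="PUBLIC"/>
</RECEIPT>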

Instead of attributes (a Hash) you can also use .attribute_nodes, which gives you a straight array of Attrs (with a .name and .value each).

Alternatively, while iterating through your experiment elements you could use…

<%= expt['alias'] %>

…to extract the value of a known attribute (returning a string such as "ENT 23").

If you're trying to extract all the attributes on their own, you could also use…

@aliases = @doc.xpath('//@alias')

…if you wanted a NodeSet of just those attribute nodes, wherever they appear in the document (each is an Attr with a .name and .value).

If you only want the alias attributes of one kind of element (e.g. EXPERIMENT), you can use…

@expt_aliases = @doc.xpath('//EXPERIMENT/@alias')