XML: Backtracking to a parent element
I am looking for a solution to a problem related to XML in Python. Although spectrum is not the root element, let's suppose it is for this example.
<spectrum index="2" id="controller=0 scan=3" defaultArrayLength="485">
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
<cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
<cvParam cvRef="MS" accession="MS:1000127" name="centroid mass spectrum" value=""/>
<precursorList count="1">
<precursor spectrumRef="controller=0 scan=2">
<isolationWindow>
<cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="810.78999999999996"/>
<cvParam cvRef="MS" accession="MS:1000023" name="isolation width" value="2"/>
</isolationWindow>
<selectedIonList count="1">
<selectedIon>
<cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="810.78999999999996"/>
</selectedIon>
</selectedIonList>
<activation>
<cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" value=""/>
<cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="35"/>
</activation>
</precursor>
</precursorList>
<binaryDataArrayList count="2">
<binaryDataArray encodedLength="5176">
<cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
<cvParam cvRef="MS" accession="MS:1000576" name="no compression" value=""/>
<cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<binary>AAAAYHHsbEAAAADg3yptQAAAAECt7G1AAAAAAN8JbkAAAAAA.......hLJ==</binary>
</binaryDataArray>
<binaryDataArray encodedLength="2588">
<cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
<cvParam cvRef="MS" accession="MS:1000576" name="no compression" value=""/>
<cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value=""/>
<binary>ZFzUQWmVo0FH/o9BRfUyQg+xjUOzkZdC5k66QWk6HUSpqyZCsV1NQ......uH=</binary>
</binaryDataArray>
</binaryDataArrayList>
</spectrum>
What I am trying to achieve is to find all selectedIon elements in the tree and backtrack to their enclosing spectrum element. If a selectedIon element is found, the output should be:
SelectedIon information:
Mass: 810.78999999999996

Spectra Info:
-------------
index=2
id=controller=0 scan=3
length=485

General Info
------------
ms level=2
MSn spectrum=-
centroid mass spectrum=-
.....................
And all the cvParam names and values as above.

Binary
------
m/z array = AAAAYHHsbEAAAADg3yptQAAAAECt7G1AAAA.....==
intensity array = ZFzUQWmVo0FH/o9BRfUyQg+xjUOzkZdC5k66Q....5C77=
What I have tried so far:
import xml.etree.ElementTree as ET
tree = ET.parse('file.mzml')
NS = "{http://psi.hupo.org/ms/mzml}"
filesource = tree.findall('.//' + NS + 'selectedIon')  # gets all selectedIon elements from the tree
Now, how can I backtrack from each selectedIon to its spectrum element and that element's subelements to parse out the relevant information shown above?
Comments (2)
XPath will let you access an ancestor: "ancestor::spectrum" will return the <spectrum> element that the current element is contained within. If you use lxml, you can use the full XPath syntax to find the elements you want. (I think, not tested...)
UPDATED: code that actually works:
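The answer's own code is not preserved here, so what follows is only a hedged sketch of the same idea, not the original snippet. It swaps in a different technique: the standard-library ElementTree does not support the XPath ancestor axis, so the sketch builds a child-to-parent map and walks upward instead. With lxml, a call such as ion.xpath('ancestor::m:spectrum', namespaces={'m': 'http://psi.hupo.org/ms/mzml'}) should play the same role. The names SAMPLE, enclosing_spectrum and describe_selected_ions are hypothetical, and the sample document is a shortened stand-in for the question's spectrum.

```python
import xml.etree.ElementTree as ET

NS = "{http://psi.hupo.org/ms/mzml}"

# Hypothetical sample modelled on the question's spectrum (binary data shortened).
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <spectrum index="2" id="controller=0 scan=3" defaultArrayLength="485">
    <cvParam name="ms level" value="2"/>
    <precursorList count="1">
      <precursor>
        <selectedIonList count="1">
          <selectedIon>
            <cvParam name="m/z" value="810.78999999999996"/>
          </selectedIon>
        </selectedIonList>
      </precursor>
    </precursorList>
    <binaryDataArrayList count="1">
      <binaryDataArray>
        <cvParam name="m/z array" value=""/>
        <binary>AAAA</binary>
      </binaryDataArray>
    </binaryDataArrayList>
  </spectrum>
</mzML>"""


def enclosing_spectrum(element, parent_map):
    """Walk upward through parent_map until a spectrum element is reached."""
    node = element
    while node is not None and node.tag != NS + "spectrum":
        node = parent_map.get(node)
    return node


def describe_selected_ions(root):
    """Return one dict per selectedIon with data from its enclosing spectrum."""
    # ElementTree elements do not know their parent, so record it up front.
    parent_map = {child: parent for parent in root.iter() for child in parent}
    results = []
    for ion in root.iter(NS + "selectedIon"):
        spectrum = enclosing_spectrum(ion, parent_map)
        if spectrum is None:
            continue
        # Direct cvParam children of the ion hold its m/z value.
        mass = next((p.get("value") for p in ion.findall(NS + "cvParam")
                     if p.get("name") == "m/z"), None)
        # Direct cvParam children of the spectrum hold the general info.
        general = {p.get("name"): p.get("value")
                   for p in spectrum.findall(NS + "cvParam")}
        # Label each binary array by its cvParam whose name ends in "array".
        arrays = {}
        for bda in spectrum.iter(NS + "binaryDataArray"):
            label = next((p.get("name") for p in bda.findall(NS + "cvParam")
                          if p.get("name", "").endswith("array")), None)
            arrays[label] = bda.findtext(NS + "binary")
        results.append({"mass": mass, "spectrum": dict(spectrum.attrib),
                        "general": general, "arrays": arrays})
    return results


for info in describe_selected_ions(ET.fromstring(SAMPLE)):
    print(info["mass"], info["spectrum"], info["general"], info["arrays"])
```

For a real file, ET.parse('file.mzml').getroot() would take the place of ET.fromstring(SAMPLE). Building the parent map costs one pass over the tree, which is usually acceptable unless the file is too large to hold in memory.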
If this is still a current issue, you might try pymzML, a Python interface to mzML files.
Printing all information from all MS2 spectra is just as easy as:
(Disclosure: I'm one of the authors)