Parsing an XML file in VB.NET

Posted 2024-12-19 10:43:43, 3638 characters, 1 view, 0 comments

I have the following XML file, out.xml. In VB.NET, I would like to get the values of the first Hsp_qseq, Hsp_hseq and Hsp_midline elements under the Hsp tag.

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastn</BlastOutput_program>
<BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>
<BlastOutput_reference>Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.</BlastOutput_reference>
<BlastOutput_db>positive_Controls</BlastOutput_db>
<BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
<BlastOutput_query-def>rs8192709_C Positive Contol Common Sequence</BlastOutput_query-def>
<BlastOutput_query-len>249</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_expect>10</Parameters_expect>
<Parameters_sc-match>1</Parameters_sc-match>
<Parameters_sc-mismatch>-2</Parameters_sc-mismatch>
<Parameters_gap-open>0</Parameters_gap-open>
<Parameters_gap-extend>0</Parameters_gap-extend>
<Parameters_filter>L;m;</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>rs8192709_C Positive Contol Common Sequence</Iteration_query-def>
<Iteration_query-len>249</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gnl|BL_ORD_ID|0</Hit_id>
<Hit_def>rs8192709_C Positive Contol Common Sequence</Hit_def>
<Hit_accession>0</Hit_accession>
<Hit_len>249</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>460.936057665848</Hsp_bit-score>
<Hsp_score>249</Hsp_score>
<Hsp_evalue>9.74431021697707e-133</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>249</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>249</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>249</Hsp_identity>
<Hsp_positive>249</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>249</Hsp_align-len>
<Hsp_qseq>GGTCAGGATAAAAGGCCCAGTTGGAGGCTGCAGCAGGGTGCAGGGCAGTCAGACCAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGACTCTTGCTACTCCTGGTTCAGCGCCACCCTAACACCCATGACCGCCTCCCACCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAGAAGAGGCCTACTCAAATCCTTTCTGAGGGTAAGACACAGACGAAT</Hsp_qseq>
<Hsp_hseq>GGTCAGGATAAAAGGCCCAGTTGGAGGCTGCAGCAGGGTGCAGGGCAGTCAGACCAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGACTCTTGCTACTCCTGGTTCAGCGCCACCCTAACACCCATGACCGCCTCCCACCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAGAAGAGGCCTACTCAAATCCTTTCTGAGGGTAAGACACAGACGAAT</Hsp_hseq>
<Hsp_midline>

||||</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>58</Statistics_db-num>
<Statistics_db-len>24590</Statistics_db-len>
<Statistics_hsp-len>15</Statistics_hsp-len>
<Statistics_eff-space>5550480</Statistics_eff-space>
<Statistics_kappa>0.46</Statistics_kappa>
<Statistics_lambda>1.28</Statistics_lambda>
<Statistics_entropy>0.85</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>

I am trying the following code, but I don't know how many times I have to call the .Read function.

Private Sub ReadFromXML()

    Dim m_xmlr As XmlTextReader
    Dim xmlnode As XmlNodeList
    Form2.Visible = True
    Try

    'Load the Xml file

    m_xmlr = New XmlTextReader("C:\Program Files\NCBI\blast-2.2.25+\bin\similarity\out.xml")
    m_xmlr.WhitespaceHandling = WhitespaceHandling.None

    m_xmlr.Read()
    m_xmlr.Read()



    While Not m_xmlr.EOF
        m_xmlr.Read()
        m_xmlr.Read()
        m_xmlr.Read()
        m_xmlr.Read()
        m_xmlr.Read()
        'Dim qseq = m_xmlr.ReadElementString("Hsp_qseq")
        Dim hseq = m_xmlr.ReadElementString("Hsp_hseq")
        Dim midline = m_xmlr.ReadElementString("Hsp_midline")
        MsgBox(hseq) 

    End While


    Catch ex As Exception
        MsgBox(ex.Message)
    End Try
    m_xmlr.Close()
End Sub

Or is there a better way to do this?

Thanks

如此安好 2024-12-26 10:43:44

I'd use XPath to pull out the information you need from the file since it allows you to query for exactly the nodes you need.

XPath queries can look quite hairy, but for simple operations it's fairly easy to get started. Here's some sample code that uses XPath to pull out the values of the nodes you mentioned and prints them to the console:

Imports System.Xml.XPath
Imports System.IO

Module Module1

    Sub Main()

        Using File As New FileStream("C:\out.xml", FileMode.Open, FileAccess.Read)

            Dim Doc As New XPathDocument(File)
            Dim Nav = Doc.CreateNavigator()

            'Select and output the value of the Hsp_qseq nodes in the file.
            Dim QSeqNodes = Nav.Select("//BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit/Hit_hsps/Hsp/Hsp_qseq")

            While QSeqNodes.MoveNext()
                Console.WriteLine("Hsp_qseq: {0}", QSeqNodes.Current.Value)
            End While

            'Select and output the value of the Hsp_hseq nodes in the file.
            Dim HSeqNodes = Nav.Select("//BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit/Hit_hsps/Hsp/Hsp_hseq")

            While HSeqNodes.MoveNext()
                Console.WriteLine("Hsp_hseq: {0}", HSeqNodes.Current.Value)
            End While

            'Select and output the value of the Hsp_midline nodes in the file.
            Dim MidlineNodes = Nav.Select("//BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit/Hit_hsps/Hsp/Hsp_midline")

            While MidlineNodes.MoveNext()
                Console.WriteLine("Hsp_midline: {0}", MidlineNodes.Current.Value)
            End While

            Console.Read()

        End Using

    End Sub

End Module

The only interesting parts of the code above are the Dim Foo = Nav.Select("...") lines. The argument is the XPath query expression for the information you want. In this case it's a simple path from the root down to the node you're after, but it is possible to use much more powerful queries.
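
As an illustration of a slightly more specific query: the question asks only for the first Hsp, so a sketch like the following could select that one element and read its three children. The module name, the xmlPath value and the decision to ignore the DTD are assumptions of this sketch, not part of the code above. XmlReader.Create prohibits DTDs by default, so the DOCTYPE line in out.xml needs DtdProcessing.Ignore (or Parse) before the file will load this way.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module FirstHspSketch

    Sub Main()
        ' Hypothetical path; point this at your own out.xml.
        Dim xmlPath As String = "C:\out.xml"

        ' Assumption: the DTD named in the DOCTYPE is not needed,
        ' so it is skipped instead of being resolved.
        Dim settings As New XmlReaderSettings()
        settings.DtdProcessing = DtdProcessing.Ignore

        Using reader As XmlReader = XmlReader.Create(xmlPath, settings)
            Dim doc As New XPathDocument(reader)
            Dim nav As XPathNavigator = doc.CreateNavigator()

            ' (//Hsp)[1] selects the first Hsp element in document order.
            Dim firstHsp As XPathNavigator = nav.SelectSingleNode("(//Hsp)[1]")
            If firstHsp Is Nothing Then
                Console.WriteLine("No Hsp element found.")
                Return
            End If

            ' These child paths are evaluated relative to the first Hsp.
            For Each elementName As String In {"Hsp_qseq", "Hsp_hseq", "Hsp_midline"}
                Dim child As XPathNavigator = firstHsp.SelectSingleNode(elementName)
                If child IsNot Nothing Then
                    Console.WriteLine("{0}: {1}", elementName, child.Value)
                End If
            Next
        End Using
    End Sub

End Module
```

SelectSingleNode returns Nothing when there is no match, which is why both lookups are checked before use.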

Select returns an iterator over the matched nodes, so then it's just a case of iterating through and processing each node that's returned.
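
If you would rather stay with a forward-only reader, as in the question, one way to avoid counting Read calls is to check each node's name and let ReadElementContentAsString consume the elements you want. This is a sketch of a different technique from the XPath approach, not a drop-in fix for the original code. The module name and path are hypothetical, the DTD is ignored, and it assumes the first Hsp contains all three elements.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module ReaderSketch

    Sub Main()
        ' Hypothetical path; adjust to where out.xml actually lives.
        Dim xmlPath As String = "C:\out.xml"
        Dim wanted As New HashSet(Of String) From {"Hsp_qseq", "Hsp_hseq", "Hsp_midline"}
        Dim found As New Dictionary(Of String, String)

        ' Assumption: the DTD is not needed, so it is ignored rather than resolved.
        Dim settings As New XmlReaderSettings()
        settings.DtdProcessing = DtdProcessing.Ignore

        Using reader As XmlReader = XmlReader.Create(xmlPath, settings)
            ' Stop at end of file, or once all three values have been read.
            While Not reader.EOF AndAlso found.Count < wanted.Count
                If reader.NodeType = XmlNodeType.Element AndAlso wanted.Contains(reader.Name) Then
                    Dim elementName As String = reader.Name
                    ' Reads the text content and moves past the end tag,
                    ' so no extra Read() is needed in this branch.
                    found(elementName) = reader.ReadElementContentAsString()
                Else
                    reader.Read()
                End If
            End While
        End Using

        For Each pair As KeyValuePair(Of String, String) In found
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
        Next
    End Sub

End Module
```

The reader only calls Read() when the current node is not one of the wanted elements, so the loop does not depend on how many nodes precede the first Hsp.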

|||||||||</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gnl|BL_ORD_ID|29</Hit_id>
<Hit_def>rs8192709_R Positive Control Rare Sequence </Hit_def>
<Hit_accession>29</Hit_accession>
<Hit_len>249</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>455.396108708835</Hsp_bit-score>
<Hsp_score>246</Hsp_score>
<Hsp_evalue>4.53358655933358e-131</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>249</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>249</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>248</Hsp_identity>
<Hsp_positive>248</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>249</Hsp_align-len>
<Hsp_qseq>GGTCAGGATAAAAGGCCCAGTTGGAGGCTGCAGCAGGGTGCAGGGCAGTCAGACCAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGACTCTTGCTACTCCTGGTTCAGCGCCACCCTAACACCCATGACCGCCTCCCACCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAGAAGAGGCCTACTCAAATCCTTTCTGAGGGTAAGACACAGACGAAT</Hsp_qseq>
<Hsp_hseq>GGTCAGGATAAAAGGCCCAGTTGGAGGCTGCAGCAGGGTGCAGGGCAGTCAGACCAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGACTCTTGCTACTCCTGGTTCAGTGCCACCCTAACACCCATGACCGCCTCCCACCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAGAAGAGGCCTACTCAAATCCTTTCTGAGGGTAAGACACAGACGAAT</Hsp_hseq>
<Hsp_midline>

