Partially parsing an XML file in Java without an XML parser

Posted on 2024-11-17 15:47:08


So I found out that it is possible to use a BufferedReader/BufferedWriter to copy an XML file word for word into a new XML file. However, I was wondering whether it is possible to scrape out only a portion of the document.

For example, take this sample:

<?xml version="1.0" encoding="UTF-8"?>
<BookCatalogue xmlns="http://www.publishing.org">
  <w:pStyle w:val="TOAHeading" />
  <Book>
    <Title>Yogasana Vijnana: the Science of Yoga</Title>
    <Author>Dhirendra Brahmachari</Author>
    <Date>1966</Date>
    <ISBN>81-40-34319-4</ISBN>
    <Publisher>Dhirendra Yoga Publications</Publisher>
    <Cost currency="INR">11.50</Cost>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <v:imagedata r:id="rId7" o:title="" croptop="10523f" cropbottom="11721f" />
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
    <Cost currency="USD">2.95</Cost>
  </Book>
  <w:pStyle w:val="TOAHeading2" />
</BookCatalogue>

Sorry if this is not proper XML; I just added tidbits from the document I was looking at to a sample I found. Basically, I want to look for an instance of "heading" (in this case, line 3 -> TOAHeading), then scrape everything from that heading down until the next instance of a heading is found, and copy it to another XML file. Is that possible? Furthermore, if I wanted to write that to a temporary file and keep the file only if an instance of "image" (in this case, line 14) is found, is that possible as well? I'm trying to do this in the simplest way possible, so does anyone have any ideas or experience with this? Thanks in advance.
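A minimal sketch of the heading-to-heading part, assuming the file has one tag per line as in the sample above, so that `BufferedReader.readLine()` sees each tag separately. The marker string `<w:pStyle w:val="TOAHeading` and the names `HeadingSections` and `sectionsBetweenHeadings` are hypothetical, chosen to match both `TOAHeading` and `TOAHeading2` in the sample. Nothing here parses XML; it only compares text.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadingSections
{
    // Hypothetical marker: a prefix that matches both TOAHeading and TOAHeading2.
    static final String HEADING_MARKER = "<w:pStyle w:val=\"TOAHeading";

    /**
     * Splits the lines into sections. A section starts at a line containing
     * headingMarker and runs up to (not including) the next such line.
     * Lines before the first heading are ignored.
     */
    static List<List<String>> sectionsBetweenHeadings(List<String> lines, String headingMarker)
    {
        List<List<String>> sections = new ArrayList<>();
        List<String> current = null;
        for (String line : lines)
        {
            if (line.contains(headingMarker))
            {
                // A new heading ends the previous section and starts a new one.
                current = new ArrayList<>();
                sections.add(current);
            }
            if (current != null)
            {
                current.add(line);
            }
        }
        return sections;
    }

    public static void main(String[] args)
    {
        List<String> lines = List.of(
            "<BookCatalogue xmlns=\"http://www.publishing.org\">",
            "  <w:pStyle w:val=\"TOAHeading\" />",
            "  <Book>",
            "    <v:imagedata r:id=\"rId7\" />",
            "  </Book>",
            "  <w:pStyle w:val=\"TOAHeading2\" />",
            "</BookCatalogue>");
        for (List<String> section : sectionsBetweenHeadings(lines, HEADING_MARKER))
        {
            System.out.println(section.size() + " lines, starting with: " + section.get(0).trim());
        }
    }
}
```

Each section is a plain text slice, not necessarily a well-formed XML document, and the last section runs to the end of the input (in the sample it includes `</BookCatalogue>`). To copy a section to another file, its lines can be written out with the same `BufferedWriter` loop the question already uses for copying.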

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class IPDriver
{
    public static void main(String[] args) throws IOException
    {
        String dir = "C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Extracted Items/ProposalOne/word/";

        // try-with-resources closes both streams (unlocking the files and flushing the writer to disk), even on error.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(dir + "document.xml"), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(dir + "tempdocument.xml"), StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                writer.write(line);
                // readLine() strips the line terminator, so write one back.
                writer.newLine();
            }
        }
    }
}
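For the temporary-file part, one possible sketch, assuming a section has already been collected as a list of lines: write it to a file from `Files.createTempFile`, note whether any line contains the hypothetical image marker `<v:imagedata`, and then either move the temp file to its permanent name or delete it. `KeepIfImage`, `writeIfContainsImage`, and `section1.xml` are illustrative names, not anything from the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class KeepIfImage
{
    // Hypothetical marker for an embedded image in document.xml.
    static final String IMAGE_MARKER = "<v:imagedata";

    /**
     * Writes the section to a temporary file. If any line contains IMAGE_MARKER,
     * the temp file is moved to target and target is returned; otherwise the
     * temp file is deleted and null is returned.
     */
    static Path writeIfContainsImage(List<String> section, Path target) throws IOException
    {
        Path temp = Files.createTempFile("section", ".xml");
        boolean hasImage = false;
        try (BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8))
        {
            for (String line : section)
            {
                writer.write(line);
                writer.newLine();
                if (line.contains(IMAGE_MARKER))
                {
                    hasImage = true;
                }
            }
        }
        if (hasImage)
        {
            // Keep it: move the temp file to its permanent location.
            return Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // No image in this section: discard the temp file.
        Files.delete(temp);
        return null;
    }

    public static void main(String[] args) throws IOException
    {
        List<String> section = List.of(
            "<w:pStyle w:val=\"TOAHeading\" />",
            "<v:imagedata r:id=\"rId7\" />");
        Path kept = writeIfContainsImage(section, Path.of("section1.xml"));
        System.out.println(kept == null ? "discarded" : "kept " + kept);
    }
}
```

If the whole section is already in memory, as here, the temp file is optional: checking the lines first and writing only the sections that qualify gives the same result. The temp file matters more when lines are streamed straight from the reader to the writer, because the image may only turn up after much of the section has been written.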

Example from my actual XML document:

  <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="address">
    <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="Street">
      <w:r w:rsidRPr="00822244">
        <w:t>6841 Benjamin Franklin Drive</w:t>
      </w:r>
    </w:smartTag>
  </w:smartTag>
</w:p>
<w:p w:rsidR="00B41602" w:rsidRPr="00822244" w:rsidRDefault="00B41602" w:rsidP="007C3A42">
  <w:pPr>
    <w:pStyle w:val="Address" />
  </w:pPr>
  <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="City">
    <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="place">

Just your basic document.xml file from a .docx
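Since a .docx is a ZIP package, `word/document.xml` can also be read straight from the archive with `java.util.zip.ZipFile` instead of from an extracted copy. The sketch below does that, and splits the text at each occurrence of the marker with `String.indexOf` rather than by lines, on the assumption that the raw file is not necessarily pretty-printed the way an XML viewer displays it. `DocxSections`, `readDocumentXml`, `splitAtMarker`, and the marker string are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxSections
{
    /** Reads word/document.xml straight out of a .docx (which is a ZIP package). */
    static String readDocumentXml(String docxPath) throws IOException
    {
        try (ZipFile zip = new ZipFile(docxPath))
        {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null)
            {
                throw new IOException("No word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry))
            {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /**
     * Splits the text at every occurrence of marker, independent of line breaks.
     * Each piece runs from one marker up to (not including) the next; text
     * before the first marker is dropped.
     */
    static List<String> splitAtMarker(String xml, String marker)
    {
        List<String> pieces = new ArrayList<>();
        int start = xml.indexOf(marker);
        while (start >= 0)
        {
            int next = xml.indexOf(marker, start + marker.length());
            pieces.add(next >= 0 ? xml.substring(start, next) : xml.substring(start));
            start = next;
        }
        return pieces;
    }

    public static void main(String[] args) throws IOException
    {
        // args[0]: path to a .docx file.
        String xml = readDocumentXml(args[0]);
        // Hypothetical marker: the TOA heading style used in the question.
        List<String> sections = splitAtMarker(xml, "<w:pStyle w:val=\"TOAHeading");
        System.out.println(sections.size() + " heading sections found");
    }
}
```

Run it as `java DocxSections path/to/file.docx`. In Word's markup the `<w:pStyle>` sits inside `<w:p><w:pPr>`, so each piece begins partway into the heading paragraph and that paragraph's opening tags end up at the tail of the previous piece; the pieces are text slices, not standalone XML. If well-formed output is needed, an XML API such as StAX is the more robust route, even though the question set out to avoid one.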
