Using Java to compute, convert, and compile XML to CSV

Posted on 2024-12-21 14:46:13

I need to convert and compile multiple XML files (in a standard format) to a single CSV file. Because I also need to perform computations on some of the imported elements, XSLT is not an option (Stackoverflow: XML to CSV Using XSLT) unless I perform computations on each converted CSV file.

XPath has been suggested as an alternative to SAX2, but because the final CSV output is large (based on over 100 XML files) I am hesitant to use arrays. (Stackoverflow: Convert XML file to CSV)

Using SAX2 I have been somewhat successful in extracting the tag elements.

If I could append the output for each individual file to the final CSV output, I assume that I would have a more memory-stable application.
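A minimal sketch of that append-per-file idea, assuming each XML file has already been reduced to a list of ready-made CSV lines. The class name `CsvAppender` and the method `appendRows` are hypothetical names chosen for illustration; only the rows for one input file are held in memory at a time.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not part of any library.
class CsvAppender {

    // Appends the rows produced for ONE XML file to the shared CSV file.
    // The file is created on the first call and extended on later calls.
    static void appendRows(Path csv, List<String> rows) {
        try (BufferedWriter out = Files.newBufferedWriter(
                csv, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String row : rows) {
                out.write(row);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `appendRows` once per input file keeps memory use bounded by the largest single file rather than by the size of the final CSV. For very many files, keeping one writer open for the whole run would avoid reopening the output each time.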

I hope others would benefit from knowing the answer to the question: How can I efficiently handle computations in conjunction with XML-CSV conversions for large-scale data?

XML file 1

<element id="1">
    <info>Yes</info>
    <startValue>0</startValue> <!-- Value entered twice, ignore -->
    <startValue>256</startValue>
    <stopValue>64</stopValue>
</element>
<element id="2">
    <info>No</info>
    <startValue>50</startValue>
    <stopValue>25</stopValue>
</element>
<....

XML file 2

<element id="1">
    <info>No</info>
    <startValue>128</startValue>
    <stopValue>100</stopValue>
</element>    
<....

Pseudo-pseudocode

for all files

    get ID
    get info

    for all stop and start values
        ignore wrong values: use counter
        difference[] = startValue(i) - stopValue(j) = 192, 28

    append (ID, info and difference) to file "outputfile.csv"
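The pseudocode above could be sketched with a SAX handler along these lines. This rests on several assumptions not stated in the question: the `<element>` entries sit inside a single root element (the samples are truncated), the values are integers, and a repeated `<startValue>` is resolved by letting the last occurrence win. `DifferenceHandler` and `parse` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds one CSV row per <element>.
class DifferenceHandler extends DefaultHandler {
    private final int fileIndex;
    private final List<String> rows = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String id;
    private String info;
    private Integer start;
    private Integer stop;

    DifferenceHandler(int fileIndex) { this.fileIndex = fileIndex; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // discard whitespace between tags
        if (qName.equals("element")) {
            id = atts.getValue("id");
            info = null;
            start = null;
            stop = null;
        }
    }

    @Override
    public void characters(char[] ch, int offset, int length) {
        text.append(ch, offset, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "info": info = value; break;
            // Assumption: a later <startValue> overwrites an earlier one.
            case "startValue": start = Integer.valueOf(value); break;
            case "stopValue": stop = Integer.valueOf(value); break;
            case "element":
                if (start != null && stop != null) {
                    rows.add(fileIndex + "," + id + "," + info + "," + (start - stop));
                }
                break;
            default: break;
        }
        text.setLength(0);
    }

    // Parses one document and returns its CSV rows (file index, ID, info, difference).
    static List<String> parse(int fileIndex, String xml) {
        DifferenceHandler handler = new DifferenceHandler(fileIndex);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.rows;
    }
}
```

Because SAX streams the document, the handler only keeps the fields of the current `<element>` plus the rows for the current file. A real run would pass a file-based `InputSource` instead of a string and would need CSV quoting if `info` can contain commas.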

CSV Output Example

File    ID  Info    Difference  Etc
_________________________________________________ 
0       1   Yes     192         ....
0       2   No      25          ....
1       1   No      28          ....
.           ...     ...         ....
.           ...     ...         ....
nfiles
