如何使用此 Coldfusion 代码读取大型 XML 文件并将数据插入数据库?

发布于 2024-12-10 06:49:23 字数 2172 浏览 1 评论 0原文

我正在使用 ColdFusion (openBlueDragon) 将数据从大型 (200MB) xml 文件插入到数据库中,而无需将整个文件加载到内存中,而这正是我传统的做法。我确实在这里找到了一个非常相似的问题:循环遍历一个大的 XML 文件似乎是我正在寻找的答案。

然而,我对java的熟练程度不够,无法理解代码并根据我的需要调整代码。我找不到办法回复发布代码的专家(@orangepips),否则我不会发布如此类似的问题。

我的 xml 文件如下所示:

 <allItems>
    <item>
        <subject>The subject text</subject>
        <date>2007-05-21 04:03:00</date>
        <content>text content often contains many paragraphs of text</content>
        <author>JPass78</author> 
    </item>
</allItems>

这是代码,由 Orangepips 提供,我正在尝试根据我的目的进行调整。我对其进行了一些修改以包含我自己的字段名称:

<cfset fis = createObject("java", "java.io.FileInputStream").init(
"#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml")>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="#reader.hasNext()#">
<cfset event = reader.next()>
<cfif event EQ reader.START_ELEMENT>
    <cfswitch expression="#reader.getLocalName()#">
        <cfcase value="allItems">
            <!--- root node, do nothing --->
        </cfcase>
        <cfcase value="item">
            <!--- set values used later on for inserts, selects, updates --->
        </cfcase>
        <cfcase value="subject">
            <!--- some selects and insert --->
        </cfcase>
        <cfcase value="contentdate">
            <!--- insert or update --->
        </cfcase>
        <cfcase value="content">
        </cfcase>
        <cfcase value="author">
         </cfcase>  
    </cfswitch>
</cfif>
</cfloop>
<cfset reader.close()>

我有一个表,我试图弄清楚如何访问每个 XML 元素中的值,以便我可以一次插入一行?像这样:插入内容(主题,内容日期,内容,作者) VALUES("主题文字",2007-5-21 04:03:00,"此处文字内容","JPass78");

I am using ColdFusion (openBlueDragon) to insert the data from a large (200MB) xml file into a database without having to load the entire file into memory which is how I traditionally would do it. I did find a VERY SIMILAR QUESTION here: Looping over a large XML file that seems to be the answer I am looking for.

However, I'm not skilled enough in java to understand and adapt the code to my needs. I found no way to respond to the expert (@orangepips) who posted the code or else I would not have posted such a similar question.

My xml file looks like this:

 <allItems>
    <item>
        <subject>The subject text</subject>
        <date>2007-05-21 04:03:00</date>
        <content>text content often contains many paragraphs of text</content>
        <author>JPass78</author> 
    </item>
</allItems>

This is the code, courtesy orangepips, that I'm trying to adapt for my purpose. I've modified it a bit to include my own field names:

<cfset fis = createObject("java", "java.io.FileInputStream").init(
"#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml")>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="#reader.hasNext()#">
<cfset event = reader.next()>
<cfif event EQ reader.START_ELEMENT>
    <cfswitch expression="#reader.getLocalName()#">
        <cfcase value="allItems">
            <!--- root node, do nothing --->
        </cfcase>
        <cfcase value="item">
            <!--- set values used later on for inserts, selects, updates --->
        </cfcase>
        <cfcase value="subject">
            <!--- some selects and insert --->
        </cfcase>
        <cfcase value="contentdate">
            <!--- insert or update --->
        </cfcase>
        <cfcase value="content">
        </cfcase>
        <cfcase value="author">
         </cfcase>  
    </cfswitch>
</cfif>
</cfloop>
<cfset reader.close()>

I have a single table and I am trying to figure out how do I access the values from each XML element so I may insert it one row at a time? like this: INSERT INTO content (subject,contentdate, content, author)
VALUES ("The subject text", 2007-5-21 04:03:00, "text content here","JPass78");

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

淡淡離愁欲言轉身 2024-12-17 06:49:23

不要使用 COLDFUSION 将大型 XML 文件导入 MYSQL 数据库,而是使用 MYSQL 命令“LOAD XML INFILE”。

下面是对我有用的简单、轻便且快速的代码:

LOAD XML INFILE 'pathtofile/file.xml' INTO TABLE table_name ROWS IDENTIFIED BY '';

我的 xml 文件使用完全相同的代码字段名称作为我的数据库表。 ROWS IDENTIFIED BY 告诉命令,我的 xml 文件中的字段名称将对应于我表中的数据库字段,并且它们将位于之间。标签。

仅供参考,<项目>是我自己的命名格式。您的文件可能会有另一个与您正在使用的数据相关的标签名称。例如,如果您的 xml 文件用于员工数据,您可以使用来代替。

在 MYSQL5.5 中可用 - LOAD XML INFILE 的参考可以在以下位置找到:
http://dev.mysql.com/doc/refman/5.5 /en/load-xml.html

Instead of using COLDFUSION to import large XML files into a MYSQL database, use the MYSQL command "LOAD XML INFILE".

Here's the simple, light and fast code that worked for me:

LOAD XML INFILE 'pathtofile/file.xml' INTO TABLE table_name ROWS IDENTIFIED BY '<item>';

My xml file uses the same exact field names as my database table. ROWS IDENTIFIED BY tells the command that the field names in my xml file will correspond to the database fields in my table and they will be found in between the <item></item> tags.

FYI, <item> is my own naming format. Your file will likely have another tag name that relates to the data you're working with. For example, if your xml file is for employee data, you might instead use <employee>

Available in MYSQL5.5 - Reference for LOAD XML INFILE can be found at:
http://dev.mysql.com/doc/refman/5.5/en/load-xml.html

向日葵 2024-12-17 06:49:23

一种可能性是每次遇到 元素时初始化数据结构。当子元素经过(、...)时,提取其文本并将其添加到您的结构中。然后,当您到达 元素时进行验证/插入。可能有更好的方法。但这应该会给你一些可以使用的东西。

更新:我有一种预感,数据库批量加载工具将是一个更好的选择。事实证明是;)有关详细信息,请参阅 JPass 的答案。

<cfset fis = createObject("java", "java.io.FileInputStream").init(pathToYourFile)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="#reader.hasNext()#">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="item">
                <!--- start a new data row --->
                <cfset row = {}>
            </cfcase>
            <cfcase value="subject">
                <!--- extract the subject text --->
                <cfset row.subject = reader.getElementText()>
            </cfcase>
            <cfcase value="date">
                <!--- extract the date text --->
                <cfset row.date = reader.getElementText()>
            </cfcase>
            <cfcase value="content">
                <!--- extract the content text --->
                <cfset row.content = reader.getElementText()>
            </cfcase>
            <cfcase value="author">
                <!--- extract the author text --->
                <cfset row.author = reader.getElementText()>
            </cfcase>  
        </cfswitch>
    <cfelseif event EQ reader.END_ELEMENT>
        <!--- we have reached the end of the row. time to insert the data --->
        <cfif reader.getLocalName() eq "item">
            <cfdump var="#row#" label="Debug Row Data">    
            <!--- ... validate / insert "row" data into database --->
        </cfif>
    </cfif>
</cfloop>

<cfset fis.close()>
<cfset reader.close()>

One possibility is to initialize a data structure each time you encounter the <item> element. As the child elements go by (<subject>, <date>, ...), extract their text and add it to your structure. Then when you reach the </item> element do your validation/insert. There may be better approaches. But that should give you something to work with ..

Update: I had a hunch a database bulk loading tool would be a better option. Turns out it was ;) See JPass' answer for details.

<cfset fis = createObject("java", "java.io.FileInputStream").init(pathToYourFile)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="#reader.hasNext()#">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="item">
                <!--- start a new data row --->
                <cfset row = {}>
            </cfcase>
            <cfcase value="subject">
                <!--- extract the subject text --->
                <cfset row.subject = reader.getElementText()>
            </cfcase>
            <cfcase value="date">
                <!--- extract the date text --->
                <cfset row.date = reader.getElementText()>
            </cfcase>
            <cfcase value="content">
                <!--- extract the content text --->
                <cfset row.content = reader.getElementText()>
            </cfcase>
            <cfcase value="author">
                <!--- extract the author text --->
                <cfset row.author = reader.getElementText()>
            </cfcase>  
        </cfswitch>
    <cfelseif event EQ reader.END_ELEMENT>
        <!--- we have reached the end of the row. time to insert the data --->
        <cfif reader.getLocalName() eq "item">
            <cfdump var="#row#" label="Debug Row Data">    
            <!--- ... validate / insert "row" data into database --->
        </cfif>
    </cfif>
</cfloop>

<cfset fis.close()>
<cfset reader.close()>
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文