Parsing a large XML file with non-standard element nesting using PHP (SAP roadmap file)
Background of issue:
I have a folder with lots of directories, files, attachments, and JavaScript. There is a main core file that is processed by ActiveX to generate a 'JS Tree' type structure made up of nested table after nested table. In short, it is awful.
The problem I am presented with is to get it loaded into a database so that states can be applied to the associated content.
Parsing the XML file is not necessarily a problem for me; getting the structure to flow correctly is. The file is not nested in a logical manner that would lend itself to easily creating the structure in the database or filesystem. The XML file consists of Structure nodes, each containing a bit of information about that node and any relevant content in the filesystem.
I was thinking of loading it into an MPTT type structure, but working out the parent/child relationships between the various nodes is where I am stumbling. Below is a sample of this XML file:
<Structure nodeid="9D565FD65DE9464EA36F005866DBF3AE" ParentID = "6EEB45ED97634C9BB2730D7713255673" IsAddOnNode="True" IsCoreNode = "0" >
<Name>POS specific remarks</Name>
<Sequence>1</Sequence>
<WBS>1.1.3.1</WBS>
<BackgroundColor>#80FF00</BackgroundColor>
<FontColor>Black</FontColor>
<Comments></Comments>
<References></References>
</Structure>
<Structure nodeid="A6F7E2F0728147BB88429545A6C490CA" ParentID = "B17AB99B64664624AAA41E220A9EAE59" IsAddOnNode="False" IsCoreNode = "0" >
<Name>Execution, Monitoring, and Controlling of Results</Name>
<Sequence>4</Sequence>
<WBS>1.1.4</WBS>
<BackgroundColor></BackgroundColor>
<FontColor>White</FontColor>
<Comments>
<Comment AddOnID = "53539AB26B50472CAA2DF4E428605C87" Version="0.2"></Comment>
</Comments>
<References></References>
</Structure>
<Structure nodeid="EFCCA56742074A2A859FD1C547850ABA" ParentID = "A6F7E2F0728147BB88429545A6C490CA" IsAddOnNode="False" IsCoreNode = "0" >
<Name>Project Performance Reports</Name>
<Sequence>1</Sequence>
<WBS>1.1.4.1</WBS>
<BackgroundColor></BackgroundColor>
<FontColor>White</FontColor>
<Comments></Comments>
<References></References>
</Structure>
When it has been parsed with ActiveX, the structure (on the left navigation pane) is arranged like a standard outline or ordered list:
1. Project Preparation
1.1 Project Management
1.1.1 Phase Start-Up
1.1.1.1 Item 1
1.1.1.2 Item 2
1.1.1.3 Item 3
And so forth. To the best of my understanding, the values that denote the section or subsection (such as 1.1.1.2) are stored in the WBS tag of the Structure node. I think what I need to do is parse those out and create the structure accordingly. How to do that is where I am stumped.
There is also a Sequence node that seems to store the index of the element among the children of its parent.
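As a rough illustration of what parsing the WBS value could involve, here is a minimal PHP sketch. It assumes WBS codes are dot-separated integers, as in the sample above; wbsParent and wbsDepth are hypothetical helper names, and the list of codes is made up for the example.

```php
<?php
// Hypothetical helper: the parent's WBS code is the code minus its last segment.
function wbsParent(string $wbs): ?string
{
    $wbs = rtrim(trim($wbs), '.');   // the outline shows "1." for the top level
    $pos = strrpos($wbs, '.');
    return $pos === false ? null : substr($wbs, 0, $pos);
}

// Hypothetical helper: depth is the number of dot-separated segments.
function wbsDepth(string $wbs): int
{
    return count(explode('.', rtrim(trim($wbs), '.')));
}

// Made-up codes, deliberately out of order.
$codes = ['1.1.4.1', '1.1.10', '1.1.4', '1.1.3.1', '1.1.2'];

// version_compare() compares dot-separated numbers segment by segment,
// so "1.1.10" lands after "1.1.4" (a plain string sort would put it first).
usort($codes, 'version_compare');
```

Sorting the codes this way yields the outline order, and wbsParent gives a second route to the parent of each node in case the ParentID attribute turns out to be unreliable.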
What I would LIKE to do
What I would like to do is create a bunch of database entries (preferably in MPTT) so that I can easily generate a nav tree and then I can begin worrying about 'scraping' all the individual files so that I can store their content in the database also. Somehow, I need to parse the WBS node value to create its 'index' within the table.
I am hoping that the solution is simpler than I am anticipating. Suggestions, or a prod in the right direction, would be greatly appreciated.
I was planning on using the TreeBehavior within CakePHP to manage this, but I don't necessarily have to use that to process the file.
Comments (1)
I may be mistaken, but don't the nodeid and ParentID attributes on each Structure element give you the node's ID and the ID of its parent? That would tell you that EFCCA56742074A2A859FD1C547850ABA is a child of A6F7E2F0728147BB88429545A6C490CA.
Storing a tree data structure in an RDBMS is a long story, since an RDBMS has no built-in concept of hierarchy, but there are various models that let you accomplish such a task. You could check http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures to get started.
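The parent/child reading of nodeid and ParentID can be sketched in PHP roughly as follows. This is only an illustration under a few assumptions: the root element name Roadmap is hypothetical (the question does not show the real root), the XML is trimmed to the fields used here, and a ParentID that does not match any loaded node is simply kept as a key in the children map.

```php
<?php
// Trimmed sample; <Roadmap> is a hypothetical root element name.
$xml = <<<XML
<Roadmap>
  <Structure nodeid="EFCCA56742074A2A859FD1C547850ABA" ParentID="A6F7E2F0728147BB88429545A6C490CA">
    <Name>Project Performance Reports</Name><Sequence>1</Sequence><WBS>1.1.4.1</WBS>
  </Structure>
  <Structure nodeid="A6F7E2F0728147BB88429545A6C490CA" ParentID="B17AB99B64664624AAA41E220A9EAE59">
    <Name>Execution, Monitoring, and Controlling of Results</Name><Sequence>4</Sequence><WBS>1.1.4</WBS>
  </Structure>
</Roadmap>
XML;

$doc = simplexml_load_string($xml);

// Flat map of nodes keyed by their own id.
$nodes = [];
foreach ($doc->Structure as $s) {
    $id = (string) $s['nodeid'];
    $nodes[$id] = [
        'id'        => $id,
        'parent_id' => (string) $s['ParentID'],
        'name'      => trim((string) $s->Name),
        'sequence'  => (int) $s->Sequence,
        'wbs'       => trim((string) $s->WBS),
    ];
}

// Adjacency list: parent id => child ids, ordered by the Sequence value.
$children = [];
foreach ($nodes as $id => $node) {
    $children[$node['parent_id']][] = $id;
}
foreach ($children as &$ids) {
    usort($ids, fn ($a, $b) => $nodes[$a]['sequence'] <=> $nodes[$b]['sequence']);
}
unset($ids);
```

Because the elements are matched by ID rather than by position, it does not matter that the file is not nested. For a very large file, PHP's XMLReader could stream the Structure elements one at a time instead of loading the whole document at once.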
The adjacency list is probably the easiest way to go about it, but if you use MySQL, which did not support recursive queries before version 8.0, you would have to do lots of joins to get down to the last node, or process the tree in your application layer.
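Processing the tree in the application layer could also mean computing the MPTT numbers up front from the adjacency list. The following is a minimal sketch, not a definitive implementation: assignMptt is a hypothetical function name, the rows and their shortened ids are made up, and the lft/rght key names are an assumption about what the target table (for example one managed by CakePHP's TreeBehavior) expects.

```php
<?php
// Made-up flat rows with shortened ids; parent_id null marks a root.
$rows = [
    ['id' => 'C', 'parent_id' => 'B',  'sequence' => 1],
    ['id' => 'A', 'parent_id' => null, 'sequence' => 1],
    ['id' => 'D', 'parent_id' => 'A',  'sequence' => 3],
    ['id' => 'B', 'parent_id' => 'A',  'sequence' => 4],
];

// Hypothetical helper: returns id => ['lft' => int, 'rght' => int].
function assignMptt(array $rows): array
{
    // Group rows by parent ('' stands for "no parent"), ordered by sequence.
    $children = [];
    foreach ($rows as $row) {
        $children[$row['parent_id'] ?? ''][] = $row;
    }
    foreach ($children as &$group) {
        usort($group, fn ($a, $b) => $a['sequence'] <=> $b['sequence']);
    }
    unset($group);

    // Depth-first walk: lft is taken on the way down, rght on the way back up.
    $out = [];
    $counter = 0;
    $walk = function (string $parentKey) use (&$walk, &$out, &$counter, $children): void {
        foreach ($children[$parentKey] ?? [] as $row) {
            $lft = ++$counter;
            $walk($row['id']);
            $out[$row['id']] = ['lft' => $lft, 'rght' => ++$counter];
        }
    };
    $walk('');

    return $out;
}

$mptt = assignMptt($rows);
```

With these rows, A spans 1 to 8, D spans 2 to 3, B spans 4 to 7, and C spans 5 to 6, so a whole subtree can later be selected with a single range condition on lft and rght instead of repeated joins.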