Extracting some data from a large number of XML files

Posted 2024-09-01 20:49:18

I have cricket player profiles saved in the form of <playerid>.xml files in a folder. Each file has these tags in it:

 <playerid>547</playerid>
 <majorteam>England</majorteam>
 <playername>Don</playername>

The playerid is the same as the number in the file name <playerid>.xml (the files vary in size, from 1 KB to 5 KB). There are about 500 files. I need to extract the playername, majorteam, and playerid from all of them into a list. I will convert that list to XML later; if you know how to produce the XML directly, I would be very thankful.

If there is a way to do this with C#, Windows batch files, or VBScript, that would be ideal; I can also use Java. I just need to get my data (id and name) in one place.

Comments (4)

摘星┃星的人 2024-09-08 20:49:18

Why don't you just do cat *.xml > all.xml?

香橙ぽ 2024-09-08 20:49:18

Use xsd.exe to generate a schema and class from your XML file.

Open a Visual Studio 2008 Command Prompt.
From the Visual Studio 2008 Command Prompt, run

c:\temp> xsd.exe player.xml

This generates an XML Schema based on your XML file.

Next, from the Visual Studio 2008 Command Prompt, run

c:\temp> xsd.exe player.xsd /classes /language:CS

This creates a new class based on your schema.

Now write code to deserialise the XML file using the class you generated; you can place this code in a loop to handle more than one file.

// Requires: using System.IO; using System.Xml.Serialization;
using (FileStream fs = new FileStream("Player.XML", FileMode.Open))
{
    // Create an XmlSerializer object to perform the deserialization
    XmlSerializer xs = new XmlSerializer(typeof(Player));

    Player p = xs.Deserialize(fs) as Player;
    if (p != null)
    {
        // process player here
    }
}

打小就很酷 2024-09-08 20:49:18

If I had to do this task, I'd probably do it in Perl. The previous suggestion to concatenate (cat) all the files isn't really correct, since what you'll end up with will not be a valid XML file, but rather a bunch of valid XML files back to back.
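To illustrate the point about concatenation, here is a small Python sketch. The `<player>` root element and the second player's values are assumptions made for the demonstration; the question does not show the real root element. The two strings stand in for two files joined by `cat`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical player files, each well-formed on its own.
# The <player> root element is an assumption, not taken from the question.
file_a = "<player><playerid>547</playerid><playername>Don</playername></player>"
file_b = "<player><playerid>548</playerid><playername>Len</playername></player>"


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Roughly what `cat *.xml > all.xml` would produce: two roots back to back.
concatenated = file_a + "\n" + file_b

# Wrapping both in one root element makes the result a single document.
wrapped = "<players>" + concatenated + "</players>"

print(is_well_formed(concatenated))
print(is_well_formed(wrapped))
```

Wrapping only works this simply when the individual files carry no XML declaration or DOCTYPE; if they do, those lines would have to be stripped first.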

Perl has a module archive called CPAN, which contains all sorts of things for getting tasks done. If you install an XPath module from it, it should be pretty easy to search for the nodes you want and output them in a list.

If XPath is too burdensome, you might also want to look into regular expressions, colloquially known as regexes. Perl has amazing regex support.
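The regex idea could look roughly like the following, shown in Python rather than Perl. This is only a sketch under the assumption that the three tags appear exactly as in the question, with no attributes, CDATA sections, or nesting; `TAG_RE` and `extract_fields` are names invented for the example.

```python
import re

# Matches <playerid>...</playerid>, <majorteam>...</majorteam> and
# <playername>...</playername>, trimming whitespace around the value.
TAG_RE = re.compile(
    r"<(playerid|majorteam|playername)>\s*(.*?)\s*</\1>",
    re.DOTALL,
)


def extract_fields(text):
    """Return a dict mapping each tag found in text to its value."""
    return {name: value for name, value in TAG_RE.findall(text)}


sample = """
 <playerid>547</playerid>
 <majorteam>England</majorteam>
 <playername>Don</playername>
"""
print(extract_fields(sample))
```

A pattern like this does not decode entities such as `&amp;` and breaks if a tag gains attributes, so a real XML parser is the safer choice when the files are not fully under your control.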

If I had to use Java, I'd probably use its support for regular expressions. If I wanted to really get nitty-gritty with the XML nodes of the documents, I'd likely use Sun's Streaming API for XML (StAX).
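For comparison, a streaming read in the same spirit can be sketched in Python with `xml.etree.ElementTree.iterparse`, which hands back elements as they are parsed. This is not StAX itself, only an analogous approach; `stream_fields` and `WANTED` are hypothetical names, and the `<player>` wrapper in the sample is an assumption.

```python
import io
import xml.etree.ElementTree as ET

WANTED = {"playerid", "majorteam", "playername"}


def stream_fields(source):
    """Read elements as they close and stop once every wanted tag is seen.

    source may be a file name or a binary file object.
    """
    found = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in WANTED and elem.tag not in found:
            found[elem.tag] = (elem.text or "").strip()
            if len(found) == len(WANTED):
                break  # no need to read the rest of the file
    return found


sample = io.BytesIO(
    b"<player><playerid>547</playerid><majorteam>England</majorteam>"
    b"<playername>Don</playername><notes>ignored</notes></player>"
)
print(stream_fields(sample))
```

With files of 1 KB to 5 KB, streaming brings little practical benefit over parsing each file whole; it mainly matters for much larger documents.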

甜警司 2024-09-08 20:49:18

Pick your scripting tongue of choice. Mine's Python.

In that language, this is about what you're looking for:

import glob
import xml.dom.minidom
from xml.parsers.expat import ExpatError

OUTPUT = "all_my_players.xml"

# Start with an empty <players/> document to collect every profile.
base_doc = xml.dom.minidom.parseString("<players/>")
doc_element = base_doc.documentElement

for filename in glob.glob("*.xml"):
    if filename == OUTPUT:
        continue  # skip the output of a previous run
    try:
        player = xml.dom.minidom.parse(filename)
    except ExpatError:
        print("ERROR READING FILE %s" % filename)
        continue
    print("Read file %s" % filename)
    # Copy the file's root element into the combined document.
    doc_element.appendChild(base_doc.importNode(player.documentElement, True))

with open(OUTPUT, "w", encoding="utf-8") as f:
    f.write(doc_element.toxml())
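
The script above copies each profile whole. If only playerid, majorteam, and playername are wanted, as the question asks, a variation along these lines might do. It is a sketch, not a tested solution for the real files: `build_summary`, the `<players>`/`<player>` output element names, and the assumption that each tag occurs once somewhere below the root are all choices made for the example.

```python
import glob
import os
import xml.etree.ElementTree as ET

FIELDS = ("playerid", "majorteam", "playername")


def build_summary(folder, out_path):
    """Write one XML file holding only FIELDS from every profile in folder.

    Returns the number of players written.
    """
    root = ET.Element("players")
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        if os.path.abspath(path) == os.path.abspath(out_path):
            continue  # do not re-read our own output
        try:
            src = ET.parse(path).getroot()
        except ET.ParseError as err:
            print("Skipping %s: %s" % (path, err))
            continue
        player = ET.SubElement(root, "player")
        for tag in FIELDS:
            # ".//tag" finds the first matching descendant at any depth.
            value = src.findtext(".//" + tag, default="")
            ET.SubElement(player, tag).text = value.strip()
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(root)
```

Called as `build_summary(".", "players_summary.xml")`, it would scan the current folder; the output file name here is just a placeholder.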