How can I improve the performance of DataSet.ReadXml when I'm using a schema?
I have an ADO DataSet that I'm loading from its XML file via ReadXml. The data and the schema are in separate files.
Right now, it takes close to 13 seconds to load this DataSet. I can cut this to 700 milliseconds if I don't read the DataSet's schema and just let ReadXml infer the schema, but then the resulting DataSet doesn't contain any constraints.
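For reference, the fast inference-only load is just this (a sketch; xmlPath as in the snippet below):

var inferred = new DataSet();
inferred.ReadXml(xmlPath, XmlReadMode.InferSchema); // schema is inferred from the data; no constraints in the result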
I've tried doing this:
Console.WriteLine("Reading dataset with external schema.");
ds.ReadXmlSchema(xsdPath);
Console.WriteLine("Reading the schema took {0} milliseconds.", sw.ElapsedMilliseconds);
foreach (DataTable dt in ds.Tables)
{
dt.BeginLoadData();
}
ds.ReadXml(xmlPath);
Console.WriteLine("ReadXml completed after {0} milliseconds.", sw.ElapsedMilliseconds);
foreach (DataTable dt in ds.Tables)
{
dt.EndLoadData();
}
Console.WriteLine("Process complete at {0} milliseconds.", sw.ElapsedMilliseconds);
When I do this, reading the schema takes 27ms, and reading the DataSet takes 12000+ milliseconds. And that's the time reported before I call EndLoadData on all the DataTables.
This is not an enormous amount of data - it's about 1.5 MB, there are no nested relations, and all of the tables contain two or three columns of 6-30 characters. The only thing I can figure that's different if I read the schema up front is that the schema includes all of the unique constraints. But BeginLoadData is supposed to turn constraints off (as well as change notification, etc.). So that shouldn't apply here. (And yes, I've tried just setting EnforceConstraints to false.)
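For completeness, the EnforceConstraints variation mentioned above looks like this (a sketch against the same ds and xmlPath):

ds.EnforceConstraints = false;
ds.ReadXml(xmlPath);
ds.EnforceConstraints = true; // re-validates everything; throws ConstraintException if any row violates a constraint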
I've read many reports of people improving the load time of DataSets by reading the schema first instead of having the object infer the schema. In my case, inferring the schema makes for a process that's about 20 times faster than having the schema provided explicitly.
This is making me a little crazy. This DataSet's schema is generated off of metainformation, and I'm tempted to write a method that creates it programmatically and just deserializes it with an XmlReader. But I'd much prefer not to.
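If I went that route, the programmatic construction would look something like this (a sketch; the table, column, and constraint names are made up):

var data = new DataSet("MyData");
DataTable dt = data.Tables.Add("Items");
DataColumn id = dt.Columns.Add("Id", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Constraints.Add(new UniqueConstraint("UQ_Items_Id", id)); // the kind of unique constraint the XSD carries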
What am I missing? What else can I do to improve the speed here?
3 Answers
I will try to give you a performance comparison between storing data in plain text files and in XML files.
The first function creates two files: one with 1,000,000 records in plain text and one with the same 1,000,000 records in XML. The first thing to notice is the difference in file size: ~64 MB (plain text) vs. ~102 MB (XML).
The second function reads both files: first it reads the plain text into a dictionary (just to simulate real-world use), then it reads the XML file. Both steps are measured in milliseconds, and the results are written to the console:
Start read Text file into memory
Text file loaded into memory in 7628 milliseconds
Start read XML file into memory
XML file loaded into memory in 21018 milliseconds
Conclusion: the XML file is almost twice the size of the text file, and it loads about three times slower.
XML handling is more convenient than plain text (because of the abstraction level), but it is more CPU- and disk-intensive.
So, if your files are small and the performance is acceptable, XML DataSets are perfectly fine. But if you need raw speed, I don't know of any method that makes an XML DataSet faster than plain text files. It basically comes down to the very first point: the XML file is bigger because it carries more markup.
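A minimal sketch of the kind of benchmark described above (the record layout, file names, and identifiers are assumptions, not the original code):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Xml.Linq;

class XmlVsTextBenchmark
{
    const int RecordCount = 1000000; // assumption: matches the answer's record count

    static void Main()
    {
        CreateFiles();
        ReadFiles();
    }

    // Writes the same records twice: once as "id|value" lines, once as XML elements
    static void CreateFiles()
    {
        using (var writer = new StreamWriter("records.txt"))
        {
            for (int i = 0; i < RecordCount; i++)
                writer.WriteLine("{0}|value{1}", i, i);
        }

        var root = new XElement("records");
        for (int i = 0; i < RecordCount; i++)
            root.Add(new XElement("record",
                new XAttribute("id", i),
                new XAttribute("value", "value" + i)));
        root.Save("records.xml");
    }

    // Times both loads: text into a dictionary, XML into an XDocument
    static void ReadFiles()
    {
        var sw = Stopwatch.StartNew();
        Console.WriteLine("Start read Text file into memory");
        var records = new Dictionary<int, string>(RecordCount);
        foreach (var line in File.ReadLines("records.txt"))
        {
            var parts = line.Split('|');
            records[int.Parse(parts[0])] = parts[1];
        }
        Console.WriteLine("Text file loaded into memory in {0} milliseconds", sw.ElapsedMilliseconds);

        sw.Restart();
        Console.WriteLine("Start read XML file into memory");
        var doc = XDocument.Load("records.xml");
        Console.WriteLine("XML file loaded into memory in {0} milliseconds", sw.ElapsedMilliseconds);
    }
}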
It's not an answer, exactly (though it's better than nothing, which is what I've gotten so far), but after a long time struggling with this problem I discovered that it's completely absent when my program's not running inside Visual Studio.
Something I didn't mention before, which makes this even more mystifying, is that when I loaded a different (but comparably large) XML document into the DataSet, the program performed just fine. I'm now wondering if one of my DataSets has some kind of metainformation attached to it that Visual Studio is checking at runtime while the other one doesn't. I dunno.
Another dimension to try is to read the dataset without the schema and then Merge it into a typed DataSet that has the constraints enabled. That way it has all of the data on hand as it builds the indexes used to enforce constraints -- maybe it would be more efficient?
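A sketch of that approach, with a schema-loaded DataSet standing in for the typed one (xmlPath and xsdPath as in the question):

var raw = new DataSet();
raw.ReadXml(xmlPath); // fast path: schema inferred, no constraints

var constrained = new DataSet();
constrained.ReadXmlSchema(xsdPath); // schema with the unique constraints
constrained.Merge(raw);             // constraints are checked once, with all rows on hand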