使用 TPL 的生产者消费者模型,.net 4.0 中的任务

发布于 2024-12-10 09:51:36 字数 1633 浏览 0 评论 0原文

我有一个相当大的 XML 文件(大约 1-2GB)。

要求是将 xml 数据持久保存到数据库中。 目前,这是通过 3 个步骤实现的。

  1. 尽可能以更少的内存占用读取大文件
  2. 从 xml 数据创建实体
  3. 使用 SqlBulkCopy 将创建的实体中的数据存储到数据库中。

为了获得更好的性能,我想创建一个生产者-消费者模型,其中生产者创建一组实体(例如一批 10K)并将其添加到队列中。消费者应该从队列中取出一批实体并使用 sqlbulkcopy 持久化到数据库。

谢谢, 戈库尔

void Main()
{
    int iCount = 0;
    string fileName = @"C:\Data\CatalogIndex.xml";

    DateTime startTime = DateTime.Now;
    Console.WriteLine("Start Time: {0}", startTime);
    FileInfo fi = new FileInfo(fileName);
    Console.WriteLine("File Size:{0} MB", fi.Length / 1048576.0);

/* I want to change this loop to create a producer consumer pattern here to process the data parallel-ly
*/
     foreach (var element in StreamElements(fileName,"title"))
            {
                iCount++;
            }

            Console.WriteLine("Count: {0}", iCount);
            Console.WriteLine("End Time: {0}, Time Taken:{1}", DateTime.Now, DateTime.Now - startTime);
        }

    private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
    { 
        using (var rdr = XmlReader.Create(fileName))
        {
            rdr.MoveToContent();
            while (!rdr.EOF)
            {
                if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
                {
                    var e = XElement.ReadFrom(rdr) as XElement;
                    yield return e;
                }
                else
                {
                    rdr.Read();
                }
            }
            rdr.Close();
        }
    }

I have a fairly large XML file(around 1-2GB).

The requirement is to persist the xml data in to database.
Currently this is achieved in 3 steps.

  1. Read the large file with less memory foot print as much as possible
  2. Create entities from the xml-data
  3. Store the data from the created entities in to the database using SqlBulkCopy.

To achieve better performance I want to create a Producer-consumer model where the producer creates a set of entities say a batch of 10K and adds it to a Queue. And the consumer should take the batch of entities from the queue and persist to the database using sqlbulkcopy.

Thanks,
Gokul

void Main()
{
    int iCount = 0;
    string fileName = @"C:\Data\CatalogIndex.xml";

    DateTime startTime = DateTime.Now;
    Console.WriteLine("Start Time: {0}", startTime);
    FileInfo fi = new FileInfo(fileName);
    Console.WriteLine("File Size:{0} MB", fi.Length / 1048576.0);

/* I want to change this loop to create a producer consumer pattern here to process the data parallel-ly
*/
     foreach (var element in StreamElements(fileName,"title"))
            {
                iCount++;
            }

            Console.WriteLine("Count: {0}", iCount);
            Console.WriteLine("End Time: {0}, Time Taken:{1}", DateTime.Now, DateTime.Now - startTime);
        }

    private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
    { 
        using (var rdr = XmlReader.Create(fileName))
        {
            rdr.MoveToContent();
            while (!rdr.EOF)
            {
                if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
                {
                    var e = XElement.ReadFrom(rdr) as XElement;
                    yield return e;
                }
                else
                {
                    rdr.Read();
                }
            }
            rdr.Close();
        }
    }

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

尝蛊 2024-12-17 09:51:36

这就是你想做的吗?

    void Main()
    {
        const int inputCollectionBufferSize = 1024;
        const int bulkInsertBufferCapacity = 100;
        const int bulkInsertConcurrency = 4;

        BlockingCollection<object> inputCollection = new BlockingCollection<object>(inputCollectionBufferSize);

        Task loadTask = Task.Factory.StartNew(() =>
        {
            foreach (object nextItem in ReadAllElements(...))
            {
                // this will potentially block if there are already enough items
                inputCollection.Add(nextItem);
            }

            // mark this collection as done
            inputCollection.CompleteAdding();
        });

        Action parseAction = () =>
        {
            List<object> bulkInsertBuffer = new List<object>(bulkInsertBufferCapacity);

            foreach (object nextItem in inputCollection.GetConsumingEnumerable())
            {
                if (bulkInsertBuffer.Length == bulkInsertBufferCapacity)
                {
                    CommitBuffer(bulkInsertBuffer);
                    bulkInsertBuffer.Clear();
                }

                bulkInsertBuffer.Add(nextItem);
            }
        };

        List<Task> parseTasks = new List<Task>(bulkInsertConcurrency);

        for (int i = 0; i < bulkInsertConcurrency; i++)
        {
            parseTasks.Add(Task.Factory.StartNew(parseAction));
        }

        // wait before exiting
        loadTask.Wait();
        Task.WaitAll(parseTasks.ToArray());
    }

Is this what you are trying to do?

    void Main()
    {
        const int inputCollectionBufferSize = 1024;
        const int bulkInsertBufferCapacity = 100;
        const int bulkInsertConcurrency = 4;

        BlockingCollection<object> inputCollection = new BlockingCollection<object>(inputCollectionBufferSize);

        Task loadTask = Task.Factory.StartNew(() =>
        {
            foreach (object nextItem in ReadAllElements(...))
            {
                // this will potentially block if there are already enough items
                inputCollection.Add(nextItem);
            }

            // mark this collection as done
            inputCollection.CompleteAdding();
        });

        Action parseAction = () =>
        {
            List<object> bulkInsertBuffer = new List<object>(bulkInsertBufferCapacity);

            foreach (object nextItem in inputCollection.GetConsumingEnumerable())
            {
                if (bulkInsertBuffer.Length == bulkInsertBufferCapacity)
                {
                    CommitBuffer(bulkInsertBuffer);
                    bulkInsertBuffer.Clear();
                }

                bulkInsertBuffer.Add(nextItem);
            }
        };

        List<Task> parseTasks = new List<Task>(bulkInsertConcurrency);

        for (int i = 0; i < bulkInsertConcurrency; i++)
        {
            parseTasks.Add(Task.Factory.StartNew(parseAction));
        }

        // wait before exiting
        loadTask.Wait();
        Task.WaitAll(parseTasks.ToArray());
    }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文