When should I call SaveChanges() when creating 1000s of Entity Framework objects? (like during an import)

Posted 2024-08-15 14:08:10


I am running an import that will have 1000's of records on each run. Just looking for some confirmation on my assumptions:

Which of these makes the most sense:

  1. Run SaveChanges() every AddToClassName() call.
  2. Run SaveChanges() every n number of AddToClassName() calls.
  3. Run SaveChanges() after all of the AddToClassName() calls.

The first option is probably slow right? Since it will need to analyze the EF objects in memory, generate SQL, etc.

I assume that the second option is the best of both worlds, since we can wrap a try/catch around that SaveChanges() call and only lose n records at a time if one of them fails. Maybe store each batch in a List<>: if the SaveChanges() call succeeds, discard the list; if it fails, log the items.

The last option would probably end up being very slow as well, since every single EF object would have to be in memory until SaveChanges() is called. And if the save failed nothing would be committed, right?
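A minimal sketch of option 2, in the spirit of the question (commit every n records, try/catch per batch, log failures). `MyEntities`, `ClassName`, `AddToClassName` and `Log` are hypothetical placeholders for the actual context, entity and logger:

```csharp
// Option 2 sketch: commit every n records; on failure, log only that batch.
// MyEntities / AddToClassName / Log are hypothetical names, not from the question.
const int batchSize = 100;

for (int start = 0; start < records.Count; start += batchSize)
{
    var batch = records.Skip(start).Take(batchSize).ToList();
    try
    {
        // A fresh context per batch keeps the change tracker small.
        using (var context = new MyEntities())
        {
            foreach (var record in batch)
                context.AddToClassName(record);
            context.SaveChanges();   // one commit per batch
        }
        // Success: the batch list simply goes out of scope.
    }
    catch (Exception ex)
    {
        // Failure: only these n records are affected; log them for review.
        foreach (var record in batch)
            Log.Error("Failed to import record: " + record, ex);
    }
}
```

The point of the `using` inside the loop is that each batch gets its own context, so a failed batch cannot poison the state of later ones.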


Comments (6)

与酒说心事 2024-08-22 14:08:11


First, I build an array of all the records that I need to create in the table.
Second, I use a nested for loop to save the records:
a. The outer loop takes 100 records at a time until it reaches the count (1000) of the array.
a.1. Open a using block for the context.
a.2. The inner loop takes one item at a time and adds it to the table.
a.3. context.SaveChanges();
a.4. Close the using block.
b. Loop back to a.
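The outline above could look roughly like this; `MyEntities`, `AddToMyTable` and `allRecords` are placeholder names:

```csharp
// Sketch of the nested-loop approach: a new context per batch of 100,
// one SaveChanges() per batch. Names are hypothetical placeholders.
const int batchSize = 100;

for (int start = 0; start < allRecords.Length; start += batchSize)   // b. loop to a.
{
    using (var context = new MyEntities())                           // a.1 open context
    {
        int end = Math.Min(start + batchSize, allRecords.Length);
        for (int i = start; i < end; i++)                            // a.2 add items
            context.AddToMyTable(allRecords[i]);

        context.SaveChanges();                                       // a.3 commit batch
    }                                                                // a.4 dispose context
}
```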

凡尘雨 2024-08-22 14:08:10


I would test it first to be sure. Performance doesn't have to be that bad.

If you need to enter all rows in one transaction, call SaveChanges() after the last AddToClassName() call. If rows can be entered independently, save changes after every row. Database consistency is important.

I don't like the second option. It would be confusing for me (from the end user's perspective) if an import into the system declined 10 rows out of 1000 just because 1 is bad. You can try to import the batch, and if it fails, retry one by one and then log.

Test whether it takes a long time. Don't write 'probably'; you don't know yet. Only when it is actually a problem should you think about another solution (marc_s).

EDIT

I've done some tests (time in milliseconds):

10000 rows:

SaveChanges() after 1 row:18510,534
SaveChanges() after 100 rows:4350,3075
SaveChanges() after 10000 rows:5233,0635

50000 rows:

SaveChanges() after 1 row:78496,929
SaveChanges() after 500 rows:22302,2835

SaveChanges() after 50000 rows:24022,8765

So it is actually faster to commit after n rows than after all.

My recommendation is to:

  • SaveChanges() after n rows.
  • If one commit fails, try the rows one by one to find the faulty one.
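The fallback in the second bullet might be sketched like this, reusing the `CamelTrapEntities` context and `TestTable` entity from the test classes below (`Log` is a hypothetical logger):

```csharp
// Fallback sketch: commit a batch in one go; if that fails, retry row by
// row to isolate and log the faulty row. Log is a hypothetical logger.
private void SaveRows(IEnumerable<TestTable> rows)
{
    using (var context = new CamelTrapEntities())
    {
        foreach (var row in rows)
            context.AddToTestTable(row);
        context.SaveChanges();
    }
}

private void SaveBatchWithFallback(List<TestTable> batch)
{
    try
    {
        SaveRows(batch);                     // fast path: one commit for n rows
    }
    catch (Exception)
    {
        foreach (var row in batch)
        {
            try { SaveRows(new[] { row }); } // slow path: isolate the bad row
            catch (Exception ex) { Log.Error("Faulty row: " + row.SomeVarchar, ex); }
        }
    }
}
```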

Test classes:

TABLE:

CREATE TABLE [dbo].[TestTable](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [SomeInt] [int] NOT NULL,
    [SomeVarchar] [varchar](100) NOT NULL,
    [SomeOtherVarchar] [varchar](50) NOT NULL,
    [SomeOtherInt] [int] NULL,
 CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Class:

public class TestController : Controller
{
    //
    // GET: /Test/
    private readonly Random _rng = new Random();
    private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private string RandomString(int size)
    {
        var randomSize = _rng.Next(size);

        char[] buffer = new char[randomSize];

        for (int i = 0; i < randomSize; i++)
        {
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }


    public ActionResult EFPerformance()
    {
        string result = "";

        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
        TruncateTable();

        return Content(result);
    }

    private void TruncateTable()
    {
        using (var context = new CamelTrapEntities())
        {
            var connection = ((EntityConnection)context.Connection).StoreConnection;
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"TRUNCATE TABLE TestTable";
            command.ExecuteNonQuery();
        }
    }

    private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
    {
        var startDate = DateTime.Now;

        using (var context = new CamelTrapEntities())
        {
            for (int i = 1; i <= noOfRows; ++i)
            {
                var testItem = new TestTable();
                testItem.SomeVarchar = RandomString(100);
                testItem.SomeOtherVarchar = RandomString(50);
                testItem.SomeInt = _rng.Next(10000);
                testItem.SomeOtherInt = _rng.Next(200000);
                context.AddToTestTable(testItem);

                if (i % commitAfterRows == 0) context.SaveChanges();
            }
        }

        var endDate = DateTime.Now;

        return endDate.Subtract(startDate);
    }
}
天荒地未老 2024-08-22 14:08:10


I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.

I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of the CPU cores and went from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13% (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system, and the CPU utilization of SQL Server was no higher than 15%.

By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.

I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.

This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
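A rough sketch of that producer/consumer setup, assuming the same `CamelTrapEntities` context and `TestTable` entity used in the benchmark above (`BuildBatches()` is a hypothetical method that yields the batches to save):

```csharp
// Producer/consumer sketch: one producer fills a bounded BlockingCollection
// with batches; (cores - 1) consumers each commit batches on their own
// context. BuildBatches() is a hypothetical batch source.
var batches = new BlockingCollection<List<TestTable>>(boundedCapacity: 1);

var consumers = Enumerable.Range(0, Environment.ProcessorCount - 1)
    .Select(_ => Task.Run(() =>
    {
        foreach (var batch in batches.GetConsumingEnumerable())
        {
            // A new context per batch, as described above.
            using (var context = new CamelTrapEntities())
            {
                foreach (var item in batch)
                    context.AddToTestTable(item);
                context.SaveChanges();
            }
        }
    }))
    .ToArray();

// Producer: build batches and hand them off; the bounded capacity of 1
// keeps the queue fluctuating between 0 and 1 items, as described above.
foreach (var batch in BuildBatches())
    batches.Add(batch);

batches.CompleteAdding();
Task.WaitAll(consumers);
```

Tuning the batch size then amounts to watching whether the consumers ever sit idle waiting on the collection.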

自此以后,行同陌路 2024-08-22 14:08:10


If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
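For the TestTable schema used elsewhere in this thread, a SqlBulkCopy import might look like this (`connectionString` and `itemsToImport` are placeholders):

```csharp
// SqlBulkCopy sketch for the dbo.TestTable schema shown in this thread.
// connectionString / itemsToImport are hypothetical placeholders.
var table = new DataTable();
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeVarchar", typeof(string));
table.Columns.Add("SomeOtherVarchar", typeof(string));
table.Columns.Add("SomeOtherInt", typeof(int));

foreach (var item in itemsToImport)
    table.Rows.Add(item.SomeInt, item.SomeVarchar,
                   item.SomeOtherVarchar, item.SomeOtherInt);

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.TestTable";
        // Map by name so the IDENTITY column ID is left to the server.
        bulkCopy.ColumnMappings.Add("SomeInt", "SomeInt");
        bulkCopy.ColumnMappings.Add("SomeVarchar", "SomeVarchar");
        bulkCopy.ColumnMappings.Add("SomeOtherVarchar", "SomeOtherVarchar");
        bulkCopy.ColumnMappings.Add("SomeOtherInt", "SomeOtherInt");
        bulkCopy.WriteToServer(table);
    }
}
```

Unlike the EF options, this bypasses change tracking entirely, which is why it scales so much better for bulk inserts.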

伏妖词 2024-08-22 14:08:10


Use a stored procedure.

  1. Create a User-Defined Data Type in SQL Server.
  2. Create and populate an array of this type in your code (very fast).
  3. Pass the array to your stored procedure with one call (very fast).

I believe this would be the easiest and fastest way to do this.
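One way to wire the steps above together is a table-valued parameter; the type, procedure and `connectionString` below are hypothetical examples, not from the answer:

```csharp
// Table-valued parameter sketch. Assumes a hypothetical table type and
// procedure created roughly like:
//
//   CREATE TYPE dbo.TestTableType AS TABLE (
//       SomeInt int NOT NULL, SomeVarchar varchar(100) NOT NULL,
//       SomeOtherVarchar varchar(50) NOT NULL, SomeOtherInt int NULL);
//
//   CREATE PROCEDURE dbo.ImportTestTable @Rows dbo.TestTableType READONLY
//   AS INSERT INTO dbo.TestTable (SomeInt, SomeVarchar, SomeOtherVarchar, SomeOtherInt)
//      SELECT SomeInt, SomeVarchar, SomeOtherVarchar, SomeOtherInt FROM @Rows;
//
// "table" is a DataTable whose columns match dbo.TestTableType.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportTestTable", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@Rows", table);
    parameter.SqlDbType = SqlDbType.Structured;   // mark it as a TVP
    parameter.TypeName = "dbo.TestTableType";
    connection.Open();
    command.ExecuteNonQuery();                    // one round trip for all rows
}
```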

甜中书 2024-08-22 14:08:10


Sorry, I know this thread is old, but I think this could help other people with this problem.

I had the same problem, but you can validate the changes before you commit them. My code looks like this, and it works fine. With chUser.LastUpdated I check whether it is a new entry or only a change, because it is not possible to reload an entry that is not in the database yet.

// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
    // Delete invalid User or Change
    var chUser = (db_User)ch.Entry.Entity;
    if (chUser.LastUpdated == null)
    {
        // Invalid, new User
        _userDatabase.db_User.Remove(chUser);
        Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
    }
    else
    {
        // Invalid Change of an Entry
        _userDatabase.Entry(chUser).Reload();
        Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
    }                    
}

_userDatabase.SaveChanges();