如何使用EF core提高SQL读取性能

发布于 2025-01-12 13:10:35 字数 3930 浏览 1 评论 0原文

查询时不时会抛出超时异常。 SqlCommand.TimeOut 默认设置为 30 秒。

Microsoft.Data.SqlClient.SqlException (0x80131904)：执行超时已到期。超时时间在完成之前已过操作或服务器没有响应。
System.ComponentModel.Win32Exception (258): 等待操作计时出来。
在 Microsoft.Data.SqlClient.SqlCommand.<>c.b__169_0（任务1 结果）在 System.Threading.Tasks.ContinuationResultTaskFromResultTask2.InnerInvoke()
在 System.Threading.ExecutionContext.RunInternal（ExecutionContextexecutionContext，ContextCallback 回调，对象状态）
--- 先前位置的堆栈跟踪结束 ---
在 System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, 线程 threadPoolThread)
--- 先前位置的堆栈跟踪结束 ---
在Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync（RelationalCommandParameterObject参数对象，CancellationToken取消令牌）
在Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync（RelationalCommandParameterObject参数对象，CancellationToken取消令牌）
在 Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable1.AsyncEnumerator.InitializeReaderAsync（DbContext _，布尔结果，CancellationToken CancelationToken）在 Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState，TResult]（TState 状态，Func4操作， Func4 verifySucceeded、CancellationToken CancellationToken) 在 Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable1.AsyncEnumerator.MoveNextAsync() ClientConnectionId:2daf9373-898b-481d-a7ef-8688c1e1efed 错误数量：-2，状态：0，类别：11

根据 docs SqlCommandTimeout 是

此属性是累积超时（对于所有网络数据包）在方法调用期间读取）所有网络读取在命令执行或结果处理期间。超时可以返回第一行后仍然发生，并且不包括用户处理时间，仅网络读取时间。
例如，如果读取需要两个网络，则超时时间为 30 秒数据包，那么它有 30 秒的时间来读取这两个网络数据包。如果你再次调用 Read ，它将再有 30 秒的时间来读取任何数据它需要。

我已经多次验证了索引。 order 表确实有数百万 (6M+) 条记录。接收此异常的代码如下所示。

    public async Task<Order> GetByID(int orderID)
    {
        var start = DateTime.UtcNow;
        var wo = await _dbContext.Orders
            .Where(x => x.OrderID == orderID)                
            .Select(x => new WorkOrder()
            {
                OrderID = x.OrderID,
                ItemID = x.ItemID,
                ItemName = x.ItemName,
                OrderStatusID = x.OrderStatusID,
                OrderStatus = x.OrderStatus,
                Template = new Template()
                {
                    TemplateModel = x.Template.TemplateModel
                },
                Responses = x.Responses.Select(r => new Response()
                {
                    ResponseID = r.ResponseID,
                    ResponseStatusID = r.ResponseStatusID,
                    ResponseStatus = r.ResponseStatus,
                    //some more property assignments
                }).ToList()
            })
            .SingleOrDefaultAsync()
            .ConfigureAwait(false);

        var end = DateTime.UtcNow;
        
        _logger.LogInformation($"EF-{end.Substract(start}");

        return wo;
    }

这是单次执行的两个日志。 （请注意，这不是抛出超时异常的实例。）

EF 日志

Executed DbCommand ("1"ms) [Parameters=["@__orderID_0='?' (DbType = Int32)"], CommandType='Text', CommandTimeout='30']"  ""SELECT [t].[OrderID],

我的自定义日志显示总时间

EF-00:00:00.2109107

。在此特定实例中，执行 SQL 只需要 1 毫秒，完成 EF 调用只需要 21 毫秒。所以从日志中我注意到执行SQL的实际时间非常少，并且读取结果和构建结果集需要更长的时间。

问题

我还可以检查和尝试哪些其他内容？（Web Server 和 SQL Server 是两个不同的 EC2 服务器）
有没有办法提高 SQL 读取和结果集构建性能？
EF 可以生成带有 READONLY 子句的查询吗？

更新1
我还尝试过 EF Core 的 .AsSplitQuery() 选项，但性能没有或较低。此加载集合是单独的数据库查询。

原文

Every now and then a query throws time out exception. The SqlCommand.TimeOut is set to default 30 seconds.

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution timeout
expired. The timeout period elapsed prior to completion of the
operation or the server is not responding.
System.ComponentModel.Win32Exception (258): The wait operation timed
out.
at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__169_0(Task1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask2.InnerInvoke()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&
currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable1.AsyncEnumerator.InitializeReaderAsync(DbContext _, Boolean result, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func4 operation, Func4 verifySucceeded, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable1.AsyncEnumerator.MoveNextAsync()
ClientConnectionId:2daf9373-898b-481d-a7ef-8688c1e1efed Error
Number:-2,State:0,Class:11

As per docs SqlCommandTimeout is

This property is the cumulative time-out (for all network packets that
are read during the invocation of a method) for all network reads
during command execution or processing of the results. A time-out can
still occur after the first row is returned, and does not include user
processing time, only network read time.
For example, with a 30 second time out, if Read requires two network
packets, then it has 30 seconds to read both network packets. If you
call Read again, it will have another 30 seconds to read any data that
it requires.

I have already verified the indexes multiple times. The order table do have records in millions (6M+). The code that receives this exception looks like this.

    public async Task<Order> GetByID(int orderID)
    {
        var start = DateTime.UtcNow;
        var wo = await _dbContext.Orders
            .Where(x => x.OrderID == orderID)                
            .Select(x => new WorkOrder()
            {
                OrderID = x.OrderID,
                ItemID = x.ItemID,
                ItemName = x.ItemName,
                OrderStatusID = x.OrderStatusID,
                OrderStatus = x.OrderStatus,
                Template = new Template()
                {
                    TemplateModel = x.Template.TemplateModel
                },
                Responses = x.Responses.Select(r => new Response()
                {
                    ResponseID = r.ResponseID,
                    ResponseStatusID = r.ResponseStatusID,
                    ResponseStatus = r.ResponseStatus,
                    //some more property assignments
                }).ToList()
            })
            .SingleOrDefaultAsync()
            .ConfigureAwait(false);

        var end = DateTime.UtcNow;
        
        _logger.LogInformation(quot;EF-{end.Substract(start}");

        return wo;
    }

Here are the two logs for a single execution. (Note this is not a instance where it throws time out exception.)

EF Log

Executed DbCommand ("1"ms) [Parameters=["@__orderID_0='?' (DbType = Int32)"], CommandType='Text', CommandTimeout='30']"  ""SELECT [t].[OrderID],

My custom log shows total time

EF-00:00:00.2109107

In this particular instance it take only 1 ms to execute SQL and 21 ms for complete EF call. So from the logs I noticed the actual time to execute the SQL is very less and it takes longer to read the result and construct the result set.

Questions

What are the other things I can check and try? (Web Server and SQL Server are two different EC2 servers)
Is there a way to improve SQL read and result set building performance?
Can EF generate query with READONLY clause?

UPDATE 1
I have also tried EF Core's .AsSplitQuery() options with no or lower performance. This loads collection is separate db query.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

吻风 2025-01-19 13:10:35

首先要检查的是捕获发送到服务器的确切 SQL。我的猜测是，您遇到问题的查询将返回结果列表。使用探查器运行该查询并查看所花费的时间、数据库读取数量等。查看执行计划并查看索引利用率。使用 ORM 进行高效开发与通过视图和存储过程进行的传统数据访问有些不同。通常，您需要构建 Linq 查询和投影，并针对数据库运行它们以检查索引利用率等内容，同时考虑类似现实世界的使用模式，以优先考虑从索引中获得的最佳“效果”，并识别较繁重的操作就像您可能希望采用排队机制之类的报告来防止一次性启动太多昂贵的查询来“接触”大量数据，从而触发锁延迟和死锁。

您的特定问题的解决方案很可能通过查看正在运行的查询来揭示。

围绕查询性能需要考虑和研究的一些一般事项：

当使用 EF 和投影来查看模型/DTO 时，第一步是将视图模型降至所需的最少数据。如果您可以从该结果中扩展一个选择，请考虑使用异步 Ajax 调用来按需加载该扩展。我要注意的警告标志是在加载数据集合时看到返回的那些项目包含子数据集合。对于搜索结果等内容，应该不惜一切代价避免这种情况。用户可能检查其中一些相关集合，但这对于可能永远不会被看到的东西来说是巨大的开销。

如果返回的数据可能超过合理的屏幕范围（即 25-100 行），请始终确保对结果进行分页。很多时候，我在开发过程中看到样本集很小的测试，却发现一旦系统在生产中投入使用，它就会停止运行，因为用户可以通过搜索返回整个数据源的 25%。还要在所有搜索条件中采取措施，以确保搜索不会以无用的条件开始。（即搜索字符串的最小长度）

The first thing to check would be to capture the exact SQL being sent to the server. My guess is that the queries you are having issues with will be getting back Lists of results. Run that query with a profiler and look at the time taken, # of DB Reads, etc. Look at the execution plan and look at index utilization. Developing with an ORM efficiently is a bit different to traditional data access via views and stored procedures. Typically you will want to build your Linq queries and projections and run them against a database to inspect things like index utilization while considering real-world-like usage patterns to prioritize where the best "bang" you can get from indexing, and identifying heavier operations like reports where you might want to employ something like a queuing mechanism to prevent too many expensive queries that "touch" a lot of data from being kicked off all at once triggering lock delays and deadlocks.

The solution to your particular problem will most likely be revealed by looking at the query(ies) being run.

Some general things to consider and look into around query performance:

When working with EF and projection to view models / DTOs, the first step is to get the view models down to the bare minimum data needed. If you can expand a selection from that result, consider an async Ajax call to load that one expansion on demand. Warning signs I look out for is when loading collections of data seeing those items returned containing child collections of data. For something like search results etc. this is something that should be avoided at all costs. Users might inspect a few of those related collections, but that is a huge overhead for something that might never be seen.

Always ensure that results are paginated if there is ever a possibility of more than a reasonable screen full of data (I.e. 25-100 rows) can be returned. Many times I see tests in development where the sample set is tiny, only to find that once the system is set loose in production it grinds to a halt because users have the ability to return 25% of the entire data source via a search. Also employ measures in all search criteria to ensure searches aren't kicked off with useless criteria. (I.e. minimum lengths on search strings)

回复收藏 0 原文

~没有更多了~