SQL 事务的实际成本是多少？

发布于 2024-09-27 04:45:30 字数 691 浏览 3 评论 0原文

这需要一些背景知识。我正在创建一个 Web 应用程序（带有 SQL Server 后端的 ASP.NET/VB.NET），它将接受申请以接收资金，并且此表单包含三个人的联系信息字段。由于三个人的信息相同（姓名、电子邮件、电话），因此我选择将联系信息存储在与应用程序分开的表中。在应用程序表中，这三个联系人中的每一个都有一个 ID 字段，该字段指向联系人表中的联系人。

然而，这在做我的 CRUD 工作时提出了一个有趣的问题。我能想到的在一个 SQL 事务中创建、更新和检索应用程序和所有三个联系人信息的唯一方法需要非常复杂的存储过程。换句话说，（对我来说）通过多个交易检索此信息会简单得多。但是，由于永远不需要独立地使用此信息，因此我总是会进行多个事务来获取一个应用程序的信息。

所以我的问题是：

这种设计是否太过分了？每个申请的联系人永远不会超过三个，而且必须正好是三个。我将此信息删除到单独的表中是否过于复杂？
与编写复杂的存储过程并且只需要一个事务相比，执行多个 SQL 事务的实际成本是多少？
一般来说，使用 ADO.NET 的 Web 应用程序的 SQL 事务的成本是多少？

感谢您耐心等待冗长的解释。

*编辑*

在阅读了您的一些回复后，我似乎错误地使用了“交易”一词。我实际上好奇的是通过单个连接执行多个查询与执行一个查询的成本。抱歉造成误解。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦里人 2024-10-04 04:45:31

需要事务将数据库从一种一致状态移动到另一种一致状态。这里的“一致”适用于数据库的应用程序视图。典型的例子是两个账户之间的转账：您必须借记一个账户并贷记另一个账户。在这两次操作之间，数据库不一致，有一些钱“消失”了（从一个帐户中扣除的金额无处）。但在事务结束时，数据库再次保持一致。您需要一个事务来跨越这两个操作，以防止读者查看不一致的数据库，并在崩溃时确保数据库的一致性。

您说，为了在单个事务中处理逻辑中的多个项目，您需要复杂的过程。事实并非如此，事务范围与请求范围正交：客户端可以启动事务，通过在 3 个调用中调用 3 个过程来执行 3 个操作，然后提交事务。不要求所有操作都在一个存储过程中完成。

因此事务不应导致显着的过程开销。事实上，程序会忽略事务。一个编写良好的过程在事务内部调用以及在事务之外调用时都应该表现正确。请参阅异常处理和嵌套事务获取模板对于在存在事务和异常的情况下正常运行的过程。

最后，交易的真实成本是多少？事务写入数据（读取并不真正需要事务），因此它们锁定修改的行。当尝试读取这些被阻塞的行时，读取通常会被阻塞，然后事务越长，它锁定行的时间就越长，并且阻塞的读取就越多。但有一个非常简单的解决方案：快照读取。快照读取确实是神奇的仙尘，它们允许应用程序不受阻塞行的阻碍，因为读取器始终可以读取阻塞更新之前的行版本行。请参阅使用基于行版本控制的隔离级别。

所以结论很简单：交易没有成本。仅仅是因为除了交易之外别无选择。不是“X 比 Y 慢”的问题，而是“X 是唯一正确的选择”的问题。

更新

编辑后：多个请求与一个请求的成本可能相当大。到服务器的往返（即在开放连接上发出请求）具有固定成本。如果您进行多次往返，则您需要为每个请求支付固定费用。如果您通过多个过程调用执行一个请求，则此固定成本只需支付一次。在非常的系统上，这种差异是可以测量的，并且会影响整体性能。但我谈论的是非常的系统，比如每秒数千个请求。解决方案通常不是使过程复杂化，而是在一个请求中发出多个过程调用：

SqlCommand cmd = new SqlCommand(@"
exec usp_firstProc @arg1, @arg2;
exec usp_secondProc @arg2, @arg3;
exec usp_thirdProc @arg1, @arg3;", connection);
cmd.Parameters.Add("@arg1", someValue);
...
cmd.ExecuteNonQuery();

但我必须说，除非您的工作负载非常热，否则实际上并不需要此解决方案。作为一个粗略的规则，对于任何低于 1000 个请求/秒的请求，我会考虑清晰代码的好处超过性能好处。

但如果您必须为每个请求打开一个新连接，则情况会有所不同。登录握手确实非常昂贵，可以以数百毫秒为单位来衡量。但解决方案很简单：使用连接池（在 ADO.Net 中默认启用），并且不要过早丢弃应用程序中的连接，保留连接并重用它，直到整个工作单元完成。

Transactions are needed to move the database from one consistent state into another consistent state. The 'consistent' here applies to the application view of the database. The typical example is the money transfer between two accounts: you have to debit one account and credit another account. In between these two operations the database is inconsistent, there is some money that have 'vanished' (the sum debited from one account is nowhere). But at the end of the transaction, the database is again consistent. You need a transaction to span these two operations in order to protect the readers from viewing an inconsistent database, and in order to ensure a consistent database in case of a crash.

You say that in order to process multiple items in your logic in a single transaction you require complicated procedures. That is not true, the transaction scope is orthogonal to the request scope: a client can start a transaction, do 3 operations by invoking 3 procedures in 3 calls, then commit the transaction. It is not required that all the operations be done in one single stored procedure.

So transaction should not induce significant procedure overhead. In fact, a procedure would be oblivious to transactions. A well written procedure should behave correctly when it's invoked inside of a transaction as well as when it's invoked w/o a transaction. See Exception handling and nested transactions for a template for procedures that behave correctly in presence of transactions and exceptions.

An finally, what is the real cost of a transaction? Transaction write data (read aren't really needing transactions) and as such they lock the rows modified. Reads normally block when attempting to read these blocked rows, and then the longer a transaction is, the longer it locks rows, and the more reads it blocks. But there is a really simple solution: snapshot reads. Snapshot reads are really a magic pixie dust, they allow applications to go ahead unhindered by blocked rows, because a reader can always read the row version prior to the update that is blocking the row. See Using Row Versioning-based Isolation Levels.

So the conclusion is simple: transactions have no cost. Simply because there is no alternative to transactions. Is not an issue of 'X is slower that Y', is an issue of 'X is the only correct alternative'.

Updated

After your edit: the cost oh having multiple requests vs. one requests can be significant. A round-trip to the server (ie. issuing a request on an open connection) has a fixed cost. If you do multiple round-trips then you pay this fixed cost on each request. If you do one single request with multiple procedure invocations then this fixed cost is only payed once. On very hot systems, this difference is measurable and has a cost on overall performance. But I'm talking about really hot systems, as in thousands of requests per second. The solution is usually not to complicate the procedures, but issue multiple procedure calls in one single request:

SqlCommand cmd = new SqlCommand(@"
exec usp_firstProc @arg1, @arg2;
exec usp_secondProc @arg2, @arg3;
exec usp_thirdProc @arg1, @arg3;", connection);
cmd.Parameters.Add("@arg1", someValue);
...
cmd.ExecuteNonQuery();

But I must say that this solution is not really required unless you have a really hot workload. As a back-of-the-envelop rule, for anything under 1000 requests/sec, I would consider the benefits of clear code to outweigh the performance benefits.

One thing different though if you have to open a new connection for each request. The login handshake is really expensive, can be measured in hundreds of ms. But the solution is trivial: use connection pooling (which is enabled by default in ADO.Net) and don't discard connections in the app prematurely, keep the connection and reuse it until the entire unit of work is finished.

回复收藏 0 原文

我是有多爱你 2024-10-04 04:45:31

这并不过分。这是一个很好的规范化，也是“不要重复自己”原则的一个很好的应用。系统的要求可能会发生变化，您将来可能需要 2 或 4 个联系人，而不是 3 个。
您不需要执行任何复杂的存储过程。你在这里没有做任何不寻常的事情。您根本不需要存储过程。
不知道。

当你说“交易”时，我认为你的意思只是“查询”。当我想要获取一个应用程序的数据时，我会使用两个单独的简单查询：

SELECT *
  FROM application

这

SELECT *
  FROM contact

是两个单独的查询，但很重要。更重要的部分是以有意义的方式设计数据库。

This is not overkill. This is a good normalization and a good application of the Don't-Repeat-Yourself principle. The requirements of the system may change and you may need 2 or 4 contacts in the future instead of 3.
You don't need to do any complex stored procedures. You're not doing anything out of the ordinary here. You don't need stored procedures at all.
Don't know.

When you say "transaction" I think you just mean "query." When I wanted to get the data for one application, I would use two separate, simple queries:

SELECT *
  FROM application

and

SELECT *
  FROM contact

It's two separate queries but big deal. The more important part is designing the database in a way that makes sense.

回复收藏 0 原文

猥︴琐丶欲为 2024-10-04 04:45:31

您在这里使用“交易”一词有点不正确。我认为你指的是一个简单的数据库查询，ACID事务助手。

话虽如此，这很难说。默认情况下启用的连接池将无需打开三/四个单独的连接。

您甚至可能不需要多个查询来获取数据，并且可以连接两个表来检索它，例如：

SELECT app.field1, app,field2, contact1.name, contact2.name, contact3.name
FROM Applications AS app
INNER JOIN Contacts AS contact1 ON app.contact1ID = contact1.id
INNER JOIN Contacts AS contact2 ON app.contact2ID = contact2.id
INNER JOIN Contacts AS contact3 ON app.contact3ID = contact3.id

或类似的东西，考虑到您的情况（从应用程序向选择列表添加新字段，从每个表返回更多数据）如果表索引正确，则查找速度非常快，并且可能根本不会影响性能。如果

您要插入/更新数据，那么是的，您将需要多个查询来执行此操作。但是数据的选择可以发生在一个表中，

当然，另一种选择是将数据放回到原始应用程序表中，从而首先不需要第二个表，此时不需要额外的连接或。还需要其他任何东西。

You're using the term 'transaction' a bit incorrectly here. I think what you're referring to is a simple database query, ACID transactions aide.

That said, it's difficult to say. Connection pooling, enabled by default, will prevent the need to open three/four separate connections.

You'd likely not even need multiple queries to get the data, and can join the two tables to retrieve it, such as:

SELECT app.field1, app,field2, contact1.name, contact2.name, contact3.name
FROM Applications AS app
INNER JOIN Contacts AS contact1 ON app.contact1ID = contact1.id
INNER JOIN Contacts AS contact2 ON app.contact2ID = contact2.id
INNER JOIN Contacts AS contact3 ON app.contact3ID = contact3.id

or something similar, given your situation (adding new fields to the select list from app, returning more data from each contact table, etc. If the tables are indexed properly, the lookup are very quick, and will likely not impact performance at all.

If you're inserting/updating the data, then yes, you'll need multiple queries to do that, but the selecting of the data can occur in one.

Of course, the other option would be to put the data back into the original application table, removing the need for the second table in the first place. At which point, no extra joins or anything else is needed.

回复收藏 0 原文

写给空气的情书 2024-10-04 04:45:31

给你几个想法...

如果您当前的设计是两个表，第一个表（应用程序）作为

ID，contact_id1，contact_id2，contact_id3，otherinfo

第二个表作为联系人只是

contact_id ， info

这种格式的最大优点是如果您有 1 个联系人这是在多个应用程序上...联系人“Bob”在 8 个不同的应用程序上是 contact_1 意味着 Bob 的信息仅记录一次，并且仅将他的 ID 记录在应用程序表上。最大的限制是每个应用程序有 3 个联系人被永久编码

或者..应用程序可以定义为
application_ID，信息
并联系为
application_ID, id_sequence, info

此设置的优点是可以为每个应用程序存储无限数量的联系人。

虽然一切都可以满足您的需求...如果您总是有 3 个联系人，那么这里的第二个选项有点不必要。如果您不打算拥有超过 1000 条记录，那么将所有信息存储在一个表中并不是最糟糕的主意（选项一主要消除冗余数据）。回答你的问题1...在某些情况下这是多余的，但答案将取决于你当前的要求和未来的要求。从“最佳实践”设计来看，2 个表设置是优先的。

只是对问题#2 的评论 - 这将是一个更简单的存储过程...我看不出它对您的设置太复杂。您可能可以创建一个可更新的视图（但不确定那里可能会出现什么限制或问题）。