Stored procedures vs. complex processing in a .NET application
We are building a new application in .NET 3.5 with a SQL Server database. The database is fairly large, with around 60 tables holding a lot of data. The .NET application brings data into this database from data entry and from third-party systems.
Once all the data is in the database, the system has to do a lot of calculation. The calculation logic is fairly complex. All the data required for the calculations is in the database, and the output also needs to be stored there. Data is gathered every week, and the calculations must run every week to generate the required reports.
Given this scenario, I was thinking of doing all these calculations in stored procedures. The problem is that we also need database independence, and stored procedures will not give us that. But if I do all this in .NET by querying the database all the time, I don't think it will be able to finish the work quickly.
For example, I need to query one table, which returns 2000 rows; then for each row I need to query another table, which returns 300 results; then for each of those rows I need to query multiple tables (around 10) to get the required data, do the calculation, and store the output in another table.
My question: should I go ahead with the stored-procedure solution and forget about database independence, since performance is important? I also think development time will be much shorter if we use stored procedures. If a client wants this solution on, say, an Oracle database (because they don't want to maintain another database), then we would port the stored procedures to Oracle and maintain two versions for any future changes or enhancements. Similarly, other clients may ask for other databases.
The 2000 rows I mentioned above are product SKUs. The 300 rows are the different attributes we want to calculate, e.g. handling cost, transport cost, etc. The 10 tables hold information about currency conversion, unit conversion, network, area, company, sell price, number sold per day, etc. The resulting table stores all the information as a star schema for analysis and reporting. The goal is to capture every minute detail about a product, so that one knows which attribute of selling a product is costing us money and where we can improve.
Comments (5)
I wouldn't consider doing the data manipulation anywhere other than in the database.
Most people try to work with database data using looping algorithms. If you need real speed, think of your data as a SET of rows; you can update thousands of rows within a single UPDATE. I have rewritten many cursor loops written by novice programmers into single UPDATE statements, and the execution time improved massively.
You say that for each row you need to query another table, and then several more tables for each of those rows. From that, it looks like you are not using joins and are already thinking in loops. Even if you do intend to loop, it is much better to write one query that joins in all the necessary data and then loop over its result. Remember that UPDATE and INSERT statements can have massively complex queries driving them. Add CASE expressions, derived tables, and conditional joins (LEFT OUTER JOIN), and you can solve just about any problem in a single UPDATE or INSERT.
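As a rough illustration of the set-based approach described above, here is a minimal sketch in Python using the built-in `sqlite3` module as a stand-in for SQL Server. All table names, column names, and values (`sku`, `sku_attribute`, `currency_rate`, `cost_fact`, the exchange rate) are made up for the example and are not from the question; the point is only that one INSERT ... SELECT with joins and a CASE expression replaces the nested loops.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sku (sku_id INTEGER PRIMARY KEY, currency TEXT);
CREATE TABLE sku_attribute (sku_id INTEGER, attribute TEXT, amount REAL);
CREATE TABLE currency_rate (currency TEXT PRIMARY KEY, rate_to_usd REAL);
CREATE TABLE cost_fact (sku_id INTEGER, attribute TEXT, amount_usd REAL);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO sku VALUES (?, ?)",
                 [(1, "EUR"), (2, "USD"), (3, "GBP")])
conn.executemany("INSERT INTO sku_attribute VALUES (?, ?, ?)",
                 [(1, "handling", 10.0), (1, "transport", 4.0),
                  (2, "handling", 7.0), (3, "transport", 2.0)])
conn.executemany("INSERT INTO currency_rate VALUES (?, ?)", [("EUR", 1.5)])

# One set-based statement instead of a loop per SKU and per attribute.
# The LEFT OUTER JOIN keeps rows whose currency has no rate (result is NULL),
# and the CASE expression skips conversion for amounts already in USD.
conn.execute("""
INSERT INTO cost_fact (sku_id, attribute, amount_usd)
SELECT s.sku_id,
       a.attribute,
       CASE WHEN s.currency = 'USD' THEN a.amount
            ELSE a.amount * r.rate_to_usd
       END
FROM sku AS s
JOIN sku_attribute AS a ON a.sku_id = s.sku_id
LEFT OUTER JOIN currency_rate AS r ON r.currency = s.currency
""")

rows = conn.execute(
    "SELECT sku_id, attribute, amount_usd FROM cost_fact "
    "ORDER BY sku_id, attribute"
).fetchall()
for row in rows:
    print(row)
```

The same shape carries over to T-SQL: the SELECT that drives the INSERT can join as many lookup tables as the calculation needs.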
Well, without any specific details of what data you have in these tables, a back-of-the-napkin calculation shows that you're talking about processing roughly 6 million rows of information in the example you provided (2,000 rows * 300 rows * (1 row * 10 tables)).
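The estimate can be checked with a few lines of Python. The sketch assumes, as the estimate does, exactly one row fetched from each of the 10 lookup tables per attribute row; the round-trip count additionally assumes a naive nested loop that issues one query per fetch, which is one reading of the approach in the question.

```python
SKUS = 2_000               # rows returned by the first query
ATTRIBUTES_PER_SKU = 300   # rows returned per SKU by the second query
LOOKUP_TABLES = 10         # tables queried per attribute row

# Lookup rows touched: one row per lookup table per (SKU, attribute) pair.
lookup_rows = SKUS * ATTRIBUTES_PER_SKU * LOOKUP_TABLES

# Queries issued by a naive nested loop: one for the SKU list,
# one per SKU for its attributes, and one per lookup.
round_trips = 1 + SKUS + lookup_rows

print(lookup_rows)   # 6000000
print(round_trips)   # 6002001
```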
Are all of these rows distinct, or do the 10 tables hold lookup information with relatively low cardinality? In other words, would it be possible to write a program that holds the information from the 10 lookup tables in memory, and then processes each 300-row result set in memory to perform the calculations?
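If the lookup tables really are small, the in-memory idea could look roughly like the sketch below. The two dictionaries stand in for lookup tables loaded once up front; their contents, the `cost_in_usd_per_kg` helper, and the sample attribute rows are all hypothetical values chosen for illustration.

```python
# Lookup tables loaded once into memory (made-up values for illustration).
currency_rate = {"EUR": 1.5, "USD": 1.0}   # multiplier to convert to USD
kg_per_unit = {"kg": 1.0, "lb": 0.5}       # simplified, not real conversion factors


def cost_in_usd_per_kg(amount, currency, unit):
    """Convert a cost per unit in some currency to USD per kg via dict lookups."""
    return amount * currency_rate[currency] / kg_per_unit[unit]


# A small stand-in for the 300-row attribute result set of one SKU.
attribute_rows = [
    ("handling", 10.0, "EUR", "lb"),
    ("transport", 4.0, "USD", "kg"),
]

results = [
    (name, cost_in_usd_per_kg(amount, currency, unit))
    for name, amount, currency, unit in attribute_rows
]
print(results)
```

Each calculation then costs a couple of dictionary lookups rather than a database round trip per lookup table.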
Also, I would be concerned about scalability: if you do this in a stored procedure, it is guaranteed to be a serial process limited by the speed of the single database server. If you can run multiple copies of a client program, each processing a chunk of the initial 2,000-record set, then you can perform some of the calculations in parallel, perhaps speeding up your overall processing time and making it scalable for when your initial record set is 10 times larger.
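The chunking idea can be sketched as follows. This is only an illustration of the partitioning: `process_chunk` is a hypothetical placeholder for the real per-SKU calculation, and a thread pool stands in for the separate client programs the answer has in mind (in CPython, threads help mainly when workers spend their time waiting on the database rather than computing).

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(sku_ids):
    """Placeholder for the real work: query and calculate for each SKU."""
    return [(sku_id, sku_id * 2) for sku_id in sku_ids]


sku_ids = list(range(1, 2001))  # the 2,000 initial records

# Four workers, each handling 500 SKUs; map() returns results in chunk order.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(process_chunk, chunks(sku_ids, 500)))

results = [row for part in partial_results for row in part]
print(len(partial_results), len(results))
```

A record set 10 times larger only changes the number of chunks; the workers themselves stay the same.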
Programming things like calculation code tends to be easier and more maintainable in C#. Also, keeping processing on the SQL Server to a minimum is normally good practice, since the database is the hardest tier to scale.
Having said that, from your description it sounds like the stored procedure approach is the way to go. When calculation code depends on large volumes of data, it is going to be more expensive to move the data off the server for calculation. So unless you have reasonable ways of optimizing access to the dependent data (such as caching lookup tables?), you will most likely find that not using a stored proc is more painful than it's worth.
Stored procedures every time. But, as KM said, keep iteration within those stored procedures to a minimum; that is to say, use joins in your SQL. Relational databases are very good at joining.
Database scalability will be a small issue, especially as it sounds like you'd be performing these calculations in a batch process.
Database independence doesn't really exist except for the most trivial of CRUD applications, so if your initial requirement is to get this all working with SQL Server, then leverage the tools that the RDBMS provides (after all, your client will have spent a great deal of money on it). If (and it's a big if) a subsequent client really doesn't want to use SQL Server, then you'll have to bite the bullet and code it up in another flavour of stored procedure. But then, as you identified: "if I do all this in .NET by querying the database all the time, I don't think it will be able to finish the work quickly." By starting with stored procedures, you defer the expense of porting until if and when it is actually required.
I would consider doing this in SQL Server Integration Services (SSIS). I'd put the calculations into SSIS, but leave the queries as stored procedures. This would give you database independence (SSIS can process data from any database with an ODBC connection) as well as high performance. Only the simple SELECT statements would be in stored procedures, and those are the parts of the SQL standard most likely to be identical across multiple database products (assuming you stick to standard forms of query).