I'm using SQL UDFs to encapsulate simple reporting/business logic. Should I avoid this?
I'm building up a new database in SQL Server 2008 for some reporting, and there are many common business rules pertaining to this data that go into different types of reports. Currently these rules are mostly combined in larger procedural programs, in a legacy language, which I'm trying to move over to SQL. I'm shooting for flexibility in implementing reporting from this data, like some reporting in SAS, some in C#, etc.
My approach currently is to break up these common rules (usually VERY simple logic) and encapsulate them in individual SQL UDFs. Performance is not a concern; I just want to use these rules to populate static fields in a sort of reporting "snapshot", which can then be reported from in whatever way you want.
I like this modular approach as far as understanding what each rule is doing (and maintaining the rules themselves), but I'm also starting to become a bit afraid that the maintenance may also become a nightmare. Some rules depend on others, but I can't really get away from that - these things build off each other...which is what I want...I think? ;)
Are there better ways to get this kind of modularity in a database? Am I on the right track, or am I thinking of this in too much of an application-development mindset?
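To make the approach concrete, here is a minimal T-SQL sketch of two such rules and a snapshot load. Every name in it (dbo.Accounts, dbo.AccountSnapshot, dbo.fn_IsActiveAccount, dbo.fn_AccountTier, the 90-day and 10000 thresholds) is hypothetical and only illustrates the pattern described above: small scalar UDFs, one building on another, used once to populate static fields.

```sql
-- Hypothetical rule 1: an account counts as active if it had activity
-- within 90 days of the snapshot date.
CREATE FUNCTION dbo.fn_IsActiveAccount (@LastActivityDate DATE, @AsOfDate DATE)
RETURNS BIT
AS
BEGIN
    RETURN CASE
               WHEN DATEDIFF(DAY, @LastActivityDate, @AsOfDate) <= 90 THEN 1
               ELSE 0
           END;
END;
GO

-- Hypothetical rule 2, which depends on rule 1: a reporting tier.
CREATE FUNCTION dbo.fn_AccountTier (@Balance DECIMAL(18, 2), @LastActivityDate DATE, @AsOfDate DATE)
RETURNS VARCHAR(10)
AS
BEGIN
    RETURN CASE
               WHEN dbo.fn_IsActiveAccount(@LastActivityDate, @AsOfDate) = 0 THEN 'Dormant'
               WHEN @Balance >= 10000 THEN 'Premium'
               ELSE 'Standard'
           END;
END;
GO

-- Populate the static snapshot fields once; reports then read plain columns.
DECLARE @AsOfDate DATE = '2009-06-30';

INSERT INTO dbo.AccountSnapshot (AccountId, SnapshotDate, IsActive, Tier)
SELECT a.AccountId,
       @AsOfDate,
       dbo.fn_IsActiveAccount(a.LastActivityDate, @AsOfDate),
       dbo.fn_AccountTier(a.Balance, a.LastActivityDate, @AsOfDate)
FROM   dbo.Accounts AS a;
```

Because the functions run only while the snapshot is loaded, SAS, C# or any other reporting tool reads ordinary columns and never calls the UDFs itself.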
Comments (5)
At some point, extensive use of UDFs will start to cause performance problems, as they are executed for each row in your result set and obscure logic from the optimizer, making it hard to use indexes. (I don't really understand how performance can not be an issue, but you know your requirements best.) For certain functionality they are great, but use them sparingly.
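A small T-SQL sketch of the index problem, assuming a hypothetical dbo.Orders table with an index on OrderDate and a hypothetical scalar UDF dbo.fn_OrderYear:

```sql
-- Hypothetical scalar UDF wrapping a trivial rule.
CREATE FUNCTION dbo.fn_OrderYear (@OrderDate DATETIME)
RETURNS INT
AS
BEGIN
    RETURN YEAR(@OrderDate);
END;
GO

-- The function is evaluated for every row and the predicate is opaque to the
-- optimizer, so an index on OrderDate generally cannot be used for a seek.
SELECT OrderId, OrderDate
FROM   dbo.Orders
WHERE  dbo.fn_OrderYear(OrderDate) = 2009;

-- The same rule written as a range on the bare column lets the optimizer
-- seek on an OrderDate index.
SELECT OrderId, OrderDate
FROM   dbo.Orders
WHERE  OrderDate >= '20090101'
  AND  OrderDate <  '20100101';
```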
Keeping logic on the database side is almost always the right thing to do.
As you mentioned in your question, most business rules involve quite simple logic, but they usually deal with huge volumes of data.
The database engine is the right place to implement that logic because, first, it keeps data I/O to a minimum, and second, the database performs most data transformations much more efficiently.
Some time ago I wrote a very subjective blog post on this topic.
One side note: a UDF is not the same as a stored procedure.
A UDF is a function designed to be callable inside a query, so it can do only a very limited subset of possible operations.
You can do much more in a stored procedure.
Update:
In the example you gave, like changing the logic that calculates a "derived field", a UDF that calculates the field is OK.
But (just in case) when performance becomes an issue (and believe me, this will be much sooner than one may think), transforming data with set-based operations may be much more efficient than using UDFs.
In this case, you may want to create a view, a stored procedure or a table-valued function returning a result set, which will contain a more efficient query, rather than limiting yourself to updating the UDFs (which are record-based).
One example: your query has something like a "user score", which you feel is subject to change, so you wrap it into a UDF.
Initially, this is just a plain field in the table; then you decide to calculate it using data from another table.
Calling the UDF for each row will condemn the engine to using the least efficient NESTED LOOPS algorithm in either case.
But if you create a view and rewrite the underlying queries in set-based form, this gives the engine much wider space for optimization while still keeping the result set structure and separating logic from presentation.
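The "user score" example can be sketched in T-SQL as follows. The object names (dbo.users, dbo.user_votes, dbo.fn_getUserScore, dbo.v_userScore) and the vote-summing rule are assumptions made for illustration, not the answer's own code.

```sql
-- Version 1 (hypothetical): the score is just a column in the users table.
CREATE FUNCTION dbo.fn_getUserScore (@user_id INT)
RETURNS INT
AS
BEGIN
    DECLARE @score INT;
    SELECT @score = user_score FROM dbo.users WHERE user_id = @user_id;
    RETURN @score;
END;
GO

-- The reporting query calls the function once per row.
SELECT u.user_id, dbo.fn_getUserScore(u.user_id) AS score
FROM   dbo.users AS u;
GO

-- Version 2 (hypothetical): the score is now computed from another table.
-- The calling query is unchanged, but each call is still a separate lookup.
ALTER FUNCTION dbo.fn_getUserScore (@user_id INT)
RETURNS INT
AS
BEGIN
    DECLARE @score INT;
    SELECT @score = SUM(vote) FROM dbo.user_votes WHERE user_id = @user_id;
    RETURN @score;
END;
GO

-- Set-based alternative: a view exposing the same columns. The optimizer sees
-- the join and aggregate and is free to pick a hash or merge join.
CREATE VIEW dbo.v_userScore
AS
SELECT u.user_id, SUM(v.vote) AS score
FROM   dbo.users AS u
LEFT JOIN dbo.user_votes AS v ON v.user_id = u.user_id
GROUP BY u.user_id;
GO

SELECT user_id, score FROM dbo.v_userScore;
```

In both forms the rule lives in one place; the difference is that the view's definition is visible to the optimizer while the scalar function's body is not.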
SQL is set based, and inherently performs poorly when applying a modular approach.
Functions, stored procedures and/or views all abstract the underlying logic. The performance problem comes into play when you use two (or more) functions/etc. that utilize the same table(s). It means that two queries are made to the same table(s) when one could have been used.
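A minimal T-SQL sketch of that effect, using hypothetical dbo.Customers and dbo.Orders tables and two hypothetical scalar UDFs that both read dbo.Orders:

```sql
-- Two hypothetical scalar UDFs, each issuing its own query against dbo.Orders.
CREATE FUNCTION dbo.fn_CustomerOrderCount (@CustomerId INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerId = @CustomerId);
END;
GO

CREATE FUNCTION dbo.fn_CustomerOrderTotal (@CustomerId INT)
RETURNS DECIMAL(18, 2)
AS
BEGIN
    RETURN (SELECT SUM(Amount) FROM dbo.Orders WHERE CustomerId = @CustomerId);
END;
GO

-- Modular version: dbo.Orders is queried twice for every customer row.
SELECT c.CustomerId,
       dbo.fn_CustomerOrderCount(c.CustomerId) AS OrderCount,
       dbo.fn_CustomerOrderTotal(c.CustomerId) AS OrderTotal
FROM   dbo.Customers AS c;

-- Combined version: one pass over dbo.Orders produces both values.
SELECT c.CustomerId,
       COUNT(o.OrderId) AS OrderCount,
       SUM(o.Amount)    AS OrderTotal
FROM   dbo.Customers AS c
LEFT JOIN dbo.Orders AS o ON o.CustomerId = c.CustomerId
GROUP BY c.CustomerId;
```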
The use of multiple functions says to me that the data model was made to be very "flexible". To me, that means questionable data typing and overall column/table definition. There's a need for functions/etc because the database will allow anything to be stored, which means the possibility of bad data is very high. I'd rather put the effort into always having good/valid data, rather than working after the fact to combat existing bad data.
The database is the place to contain this logic. It is faster than application code and, most importantly, centralized to minimize maintenance.
I'd say that you are on the right track: SQL procedures can rapidly get out of hand as they become more and more complex, and encapsulating shared, repeated pieces of logic into UDFs is an entirely appropriate solution to address this.
I often go as far as encapsulating logic from a SQL procedure that is only used in that one procedure into a well-named UDF to improve readability.
Have a look at this MSDN article on UDFs - perhaps it will give you some more ideas about their uses?
There are various performance considerations that you will need to be aware of if you intend to use UDFs heavily - things like the performance of scalar vs table UDFs and the possible benefits of CLR UDFs.
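As one illustration of the scalar vs table UDF difference, here is a sketch of an inline table-valued function, assuming hypothetical dbo.Customers and dbo.Orders tables. An inline function consists of a single RETURN SELECT, and the optimizer can expand it into the calling query much like a view, whereas a scalar UDF is invoked separately for each row.

```sql
-- Hypothetical inline table-valued function: the body is one SELECT.
CREATE FUNCTION dbo.tvf_CustomerOrderStats (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT COUNT(*)    AS OrderCount,
           SUM(Amount) AS OrderTotal
    FROM   dbo.Orders
    WHERE  CustomerId = @CustomerId
);
GO

-- CROSS APPLY calls the function per customer, but because it is inline the
-- optimizer can fold it into a single plan for the whole query. The aggregate
-- without GROUP BY always yields one row, so no customer is dropped.
SELECT c.CustomerId, s.OrderCount, s.OrderTotal
FROM   dbo.Customers AS c
CROSS APPLY dbo.tvf_CustomerOrderStats(c.CustomerId) AS s;
```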
If you're interested in building a data warehouse for reporting, you would try to put as much of this as possible into the Transform part of your ETL, so that your reporting SQL is composed of simple statements that tools and users alike are capable of generating.
SSIS is a very capable ETL tool that comes with SQL Server for this sort of thing.