Caching function results in SQL Server 2000
I want to memoize function results for performance, i.e. lazily populate a cache indexed on the function arguments. The first time I call a function, the cache won't have anything for the input arguments, so it will calculate it and store it before returning it. Subsequent calls just use the cache.
However, it seems that SQL Server 2000 has a stupid arbitrary rule about functions being "deterministic". INSERTs, UPDATEs, and regular stored procedure calls are forbidden inside a function. However, extended stored procedures are allowed. How is this deterministic? If another session modifies the database state, the function output will change anyway.
I'm steaming mad. I had thought I could make caching transparent to the user. Is this possible? I don't have the permissions to deploy extended stored procedures.
EDIT:
This limitation still exists in SQL Server 2008. You can't even call RAND, for God's sake!
The cache would be implemented by me in the DB. A cache is any data store used for caching...
EDIT:
There are no cases where the same arguments to a function will yield different results, outside of changes to the underlying data. This is a BI platform, and the only changes come from scheduled ETL, at which time I would TRUNCATE the cache table.
These are I/O-intensive time series calculations, on the order of O(n^4). I don't have the mandate to change the underlying tables or indexes. Also, a lot of these functions use the same intermediate functions, and caching would allow those intermediate results to be reused.
UDFs are not truly deterministic, unless they account for changes in database state. What's the point? Is SQL Server caching? (Ironic.) If SQL Server is caching, then it must be expiring on changes to tables that are schema bound. If they're schema bound, then why not bind tables that the function modifies? I can see why procs aren't allowed, although that's just sloppy; just schema bind procs. And, BTW, why allow extended stored procs? You can't possibly track what those do to ensure determinism!!! Argh!!!
EDIT:
My question is: Is there any way to lazily cache function results in a way that can be used in a view?
Comments (1)
Deterministic means that the same inputs return the same output independent of time and database.
SQL Server (any version) does no caching of UDFs - I believe it will avoid calling the UDF twice on a single row, but that's it.
One trick I've used (I think I posted it here on SO) is to:
Refactor the UDF, if you can, so that it effectively returns a usable discrete set of values for a given set of inputs. For numerical calculations, one can sometimes refactor the logic to return a factor or rate that is multiplied outside the UDF, instead of multiplying a passed-in value inside the UDF.
Call the UDF over the DISTINCT rowset of arguments and cache the results in a temporary table. If you are only calling the UDF with 100,000 distinct tuples of parameters over a 17,000,000-row set, this is far more efficient than calling it once per row.
JOIN to the temporary table (basically converting from code-based logic to table-based logic) to get values.
This table can be re-used as necessary or even kept.
New entries can be added to the table by first LEFT JOINing against it to find the argument tuples that are not yet cached.
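
The steps above can be sketched in T-SQL roughly as follows. This is only an illustration under assumed names: `dbo.FactRows`, `dbo.ExpensiveCalc`, `dbo.CalcCache` and all columns are hypothetical placeholders, not objects from the question. The sketch assumes the argument columns are NOT NULL and sticks to syntax that SQL Server 2000 accepts.

```sql
-- Hypothetical names throughout: dbo.FactRows is the large row set and
-- dbo.ExpensiveCalc(@SeriesId, @AsOfDate) is the slow scalar UDF.

-- 1. A cache table keyed on the function arguments.
CREATE TABLE dbo.CalcCache (
    SeriesId  int      NOT NULL,
    AsOfDate  datetime NOT NULL,
    Result    float    NULL,
    CONSTRAINT PK_CalcCache PRIMARY KEY (SeriesId, AsOfDate)
)

-- 2. Add only the missing argument tuples: take the DISTINCT arguments,
--    LEFT JOIN them to the cache, and keep the tuples with no cached row,
--    so the UDF is needed only once per uncached tuple.
INSERT INTO dbo.CalcCache (SeriesId, AsOfDate, Result)
SELECT a.SeriesId, a.AsOfDate, dbo.ExpensiveCalc(a.SeriesId, a.AsOfDate)
FROM (SELECT DISTINCT SeriesId, AsOfDate FROM dbo.FactRows) AS a
LEFT JOIN dbo.CalcCache AS c
       ON c.SeriesId = a.SeriesId
      AND c.AsOfDate = a.AsOfDate
WHERE c.SeriesId IS NULL

-- 3. Read through the cache with a JOIN instead of calling the UDF per row.
SELECT f.SeriesId, f.AsOfDate, c.Result
FROM dbo.FactRows AS f
JOIN dbo.CalcCache AS c
  ON c.SeriesId = f.SeriesId
 AND c.AsOfDate = f.AsOfDate
```

Because step 3 is a plain JOIN, it could also be placed inside a view, provided step 2 has been run beforehand; a view cannot perform the INSERT itself, so the population is eager per batch rather than truly lazy. After a scheduled ETL load, the cache could be emptied with `TRUNCATE TABLE dbo.CalcCache` and step 2 rerun.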
This works for both single-row table-valued UDFs and scalar UDFs. I'm mainly using it for table-valued UDFs. There is a hotfix for SQL Server 2005 that is supposed to address UDF performance; I'm waiting on the DBAs to test it before deploying to production.
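
To illustrate the refactoring step mentioned earlier (returning a factor that is multiplied outside the UDF), here is a rough before-and-after sketch. The table `dbo.Sales`, the functions `dbo.ConvertAmount` and `dbo.GetRate`, and the cache table `dbo.RateCache` are all made up for the example; the point is only that removing the amount from the argument list shrinks the number of distinct argument tuples that need caching.

```sql
-- Before (hypothetical): the amount is one of the arguments, so almost
-- every row is its own distinct argument tuple and a cache helps little.
SELECT s.Amount,
       dbo.ConvertAmount(s.Amount, s.CurrencyCode, s.RateDate) AS Converted
FROM dbo.Sales AS s

-- After (hypothetical): a UDF such as dbo.GetRate(@CurrencyCode, @RateDate)
-- returns only the rate, its results are cached in dbo.RateCache keyed on
-- (CurrencyCode, RateDate), and the multiplication happens in the query.
SELECT s.Amount,
       s.Amount * r.Rate AS Converted
FROM dbo.Sales AS s
JOIN dbo.RateCache AS r
  ON r.CurrencyCode = s.CurrencyCode
 AND r.RateDate     = s.RateDate
```

This only applies when the calculation really does factor this way; for logic that is not linear in the passed-in value, the amount would have to stay an argument.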