How to efficiently store and read back a hierarchy from cache
My situation is that I'm currently storing a hierarchy in a SQL database that's quickly approaching 15000 nodes (5000 edges). This hierarchy defines my security model based on a user's position in the tree, granting access to the items below. So when a user requests a list of all secured items, I use a CTE to recurse it in the DB (and flatten all items), which is starting to show its age (it is slow).
The hierarchy does not change often, so I've attempted to move it into RAM (redis). Keep in mind that I have many subsystems that need this for security calls, and UIs that build the tree for CRUD operations.
First Attempt
My first attempt was to store the relationships as key-value pairs (this is how it's stored in the database):
        E
       / \
      F   G
     / \ / \
    H  I J  K

mapped to:

    E - [F, G]
    F - [H, I]
    G - [J, K]
So when I want E and all its descendants, I recursively get its children and their children using the keys, which lets me start at any node and move down. This solution gave a good speed increase, but with 15,000 nodes it took approximately 5000 cache hits to rebuild my tree in code (worst case: starting at E. Performance depends on the starting node's location, so super users see the worst performance). This was still pretty fast but seemed too chatty. I like the fact that I can remove a node at any time by popping it out of its key's list without rebuilding my entire cache. It was also lightning fast for building a tree on demand in a UI.
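To make the "chatty" cost concrete, here is a minimal sketch that counts round trips for the example tree. `CountingCache` is a hypothetical in-memory stand-in, not a Redis client, and `get_many` is an assumed batch read; with a real Redis client, a multi-key read such as `MGET` or a pipeline could play that role. Fetching one tree level per batch is a possible variation on the first attempt, not something the original setup did.

```python
from collections import deque


class CountingCache:
    """Hypothetical in-memory stand-in for a key-value cache that counts round trips."""

    def __init__(self, data):
        self._data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key, [])

    def get_many(self, keys):
        # One round trip for a whole batch of keys.
        self.round_trips += 1
        return [self._data.get(k, []) for k in keys]


def descendants_one_by_one(cache, root):
    """Fetch children node by node: one round trip per visited node."""
    found, queue = [], deque([root])
    while queue:
        children = cache.get(queue.popleft())
        found.extend(children)
        queue.extend(children)
    return found


def descendants_by_level(cache, root):
    """Fetch children one tree level at a time: one round trip per level."""
    found, level = [], [root]
    while level:
        next_level = [c for kids in cache.get_many(level) for c in kids]
        found.extend(next_level)
        level = next_level
    return found


edges = {"E": ["F", "G"], "F": ["H", "I"], "G": ["J", "K"]}

slow = CountingCache(edges)
slow_result = descendants_one_by_one(slow, "E")
print(slow_result, slow.round_trips)

fast = CountingCache(edges)
fast_result = descendants_by_level(fast, "E")
print(fast_result, fast.round_trips)
```

Both functions return the same descendants; the level-by-level version makes the number of round trips proportional to the depth of the tree rather than the number of nodes, while keeping the per-node keys that make single-node edits cheap.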
Second Attempt
My other idea is to take the hierarchy from the database, build the tree, store it in RAM (redis), and then pull the entire thing out of memory (it was approximately 2 MB in size, serialized). This gave me a single call into redis (not as chatty) to pull the entire tree out, locate the user's parent node, and descend to get all child items. These calls are frequent, and passing 2 MB down at the network layer seemed large. It also means I cannot easily add or remove an item without pulling down the tree, editing it, and pushing it all back. In addition, building trees on demand via HTTP meant each request had to pull down 2 MB just to get the direct children (a very small payload with the first solution).
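A minimal sketch of the second attempt, assuming the tree is serialized as nested JSON (the nested layout and the helper names are illustrative assumptions, and a plain string stands in for the value that would be read from the cache). It shows why the full payload is paid for even when only a small subtree is needed.

```python
import json

# Nested form of the example tree: each node maps to a dict of its children.
tree = {"E": {"F": {"H": {}, "I": {}}, "G": {"J": {}, "K": {}}}}

# What would be stored under a single cache key.
blob = json.dumps(tree)


def find_subtree(node_map, target):
    """Depth-first search for `target`; returns its children dict, or None if absent."""
    for name, children in node_map.items():
        if name == target:
            return children
        hit = find_subtree(children, target)
        if hit is not None:
            return hit
    return None


def flatten(node_map):
    """All descendant names in depth-first order."""
    out = []
    for name, children in node_map.items():
        out.append(name)
        out.extend(flatten(children))
    return out


def descendants_from_blob(serialized, start):
    # The whole payload is transferred and parsed, whatever the start node is.
    whole_tree = json.loads(serialized)
    subtree = find_subtree(whole_tree, start)
    return [] if subtree is None else flatten(subtree)


print(descendants_from_blob(blob, "F"))
print(descendants_from_blob(blob, "E"))
```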
So which solution do you think is the better approach (long term, as the hierarchy continues to grow)? Both are definitely faster and take some load off the database. Or is there a better way to accomplish this that I have not thought about?
Thanks
Comments (3)
Let me offer an idea...
Use hierarchical versioning. When a node in the graph is modified, increment its version (a simple int field in the database), but also increment versions of all of its ancestors.
The net result is that more often than not, you'll cull the fetching very early, usually after only one node, and you won't even need to cache the whole graph. Modifications are expensive, but this shouldn't be a problem since they are rare.
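One way to read this idea, sketched under stated assumptions: the client caches each node together with the version it last saw, and on later requests descends only into nodes whose version has changed. `FakeDb`, `refresh`, and the per-node `fetch` call are hypothetical stand-ins for illustration, not the answer's actual implementation.

```python
class FakeDb:
    """Hypothetical in-memory stand-in for the database; counts node fetches."""

    def __init__(self, children):
        self.children = {k: list(v) for k, v in children.items()}
        self.parent = {c: p for p, kids in children.items() for c in kids}
        self.version = {n: 1 for n in set(self.children) | set(self.parent)}
        self.fetches = 0

    def fetch(self, node):
        self.fetches += 1
        return self.version[node], list(self.children.get(node, []))

    def add_child(self, parent, child):
        self.children.setdefault(parent, []).append(child)
        self.parent[child] = parent
        self.version[child] = 1
        # Bump the modified node and every ancestor up to the root.
        node = parent
        while node is not None:
            self.version[node] += 1
            node = self.parent.get(node)


def refresh(db, cache, node):
    """Update cache[node] and its subtree, descending only where versions differ."""
    version, children = db.fetch(node)
    cached = cache.get(node)
    if cached is not None and cached[0] == version:
        return  # nothing at or below this node changed; reuse the cached subtree
    cache[node] = (version, children)
    for child in children:
        refresh(db, cache, child)


def descendants(cache, node):
    out = []
    for child in cache[node][1]:
        out.append(child)
        out.extend(descendants(cache, child))
    return out


db = FakeDb({"E": ["F", "G"], "F": ["H", "I"], "G": ["J", "K"]})
cache = {}

refresh(db, cache, "E")              # cold cache: every node is fetched
first_fetches = db.fetches

db.fetches = 0
refresh(db, cache, "E")              # unchanged: stops at the root
second_fetches = db.fetches

db.add_child("G", "L")               # bumps the versions of G and E
db.fetches = 0
refresh(db, cache, "E")              # F's subtree is skipped
third_fetches = db.fetches

print(first_fetches, second_fetches, third_fetches)
print(descendants(cache, "E"))
```

In this toy run the unchanged refresh touches only the root, and after the modification the untouched branch under F is never re-read.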
BTW, a similar principle would work in the opposite direction - i.e. when you start with a leaf and need to find the path to the root. You'd need to update the versioning hierarchy in the opposite direction, but the rest should work in a very similar manner. You could even have both directions in combination.
--- EDIT ---
If your database and ADO.NET driver support it, it might be worth looking into server notifications, such as MS SQL Server's SqlDependency or OracleDependency.
Essentially, you instruct the DBMS to monitor changes and notify you when they happen. This is ideal for keeping your client-side cache up-to-date in an efficient way.
If the hierarchy does not change often, you can precompute the whole list of items below each node (instead of just its direct children).
This way you will need significantly more RAM, but it will work lightning fast for any user, because you will be able to read the whole list of descendant nodes in a single read.
For your example (I'll use JSON format):
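A minimal sketch of what such precomputed entries could look like for the E/F/G tree from the question. The field names `direct` and `all` are assumptions chosen for illustration; each printed line is the JSON value that would be stored under that node's key.

```python
import json

edges = {"E": ["F", "G"], "F": ["H", "I"], "G": ["J", "K"]}


def all_descendants(node):
    """Every node below `node`, in depth-first order."""
    out = []
    for child in edges.get(node, []):
        out.append(child)
        out.extend(all_descendants(child))
    return out


# One entry per node: its direct children plus its full descendant list.
precomputed = {
    node: {"direct": kids, "all": all_descendants(node)}
    for node, kids in edges.items()
}

for node, entry in precomputed.items():
    print(node, "-", json.dumps(entry))
```

A tree UI would read only `direct`, while a security check would read `all`; both come back in a single lookup. The cost is that any change to the tree requires rewriting the entries of all ancestors of the changed node.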
For superusers you will still need to transfer a lot of data per request, but I don't see any way to reduce that.
We do something like this. We read the tree into memory, store it in the application cache, and access it from memory. Since our tree almost never changes, and changes don't have to be reflected immediately in the web app, we don't even bother to detect them; we just let the cache age and get refreshed. It works really well for us.
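A minimal sketch of the "let it age and refresh" pattern, assuming a single cached value with a fixed maximum age. `AgingCache`, `load_tree`, and the 300-second limit are hypothetical; a real web framework's application cache would normally provide this behavior itself.

```python
import time


class AgingCache:
    """Holds one value and reloads it once it is older than max_age_seconds."""

    def __init__(self, loader, max_age_seconds, clock=time.monotonic):
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Usage with a fake clock so the example does not need to sleep.
fake_now = [0.0]
loads = []


def load_tree():
    loads.append(1)  # stands in for the expensive database read
    return {"E": ["F", "G"], "F": ["H", "I"], "G": ["J", "K"]}


tree_cache = AgingCache(load_tree, max_age_seconds=300, clock=lambda: fake_now[0])
tree_cache.get()
tree_cache.get()       # still fresh: served from memory
fake_now[0] = 301.0
tree_cache.get()       # older than the limit: reloaded
print(len(loads))
```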