How to optimize paging of a large in-memory database
I have an application where the entire database is implemented in memory, using an stl-map for each table in the database.
Each item in the stl-map is a complex object with references to other items in the other stl-maps.
The application works with a large amount of data, so it uses more than 500 MB of RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database and finding the items relevant to the client.
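A minimal sketch of the kind of layout described above, to show why a client logon touches so much memory. The Customer and Order tables, their fields and orders_for_client are hypothetical names invented for illustration; the question does not show its schema. Each std::map is one "table", and an Order holds a raw pointer into the customers map (pointers to std::map values stay valid across inserts, which is what makes this design workable).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical "tables": each std::map plays the role of one database table.
struct Customer {
    int id;
    std::string name;
};

struct Order {
    int id;
    const Customer* customer;  // reference into the customers "table"
    double amount;
};

struct Database {
    std::map<int, Customer> customers;
    std::map<int, Order> orders;
};

// Full scan: visit every order and follow its pointer to decide relevance.
// Each map node and each referenced Customer is a separate heap allocation,
// so on a partly paged-out database almost every step can page-fault.
std::vector<int> orders_for_client(const Database& db, int client_id) {
    std::vector<int> result;
    for (const auto& [order_id, order] : db.orders) {
        if (order.customer != nullptr && order.customer->id == client_id) {
            result.push_back(order_id);
        }
    }
    return result;
}
```

The cost of the scan is proportional to the number of nodes and pointer hops, not to the size of the result, which matches the 10-minute logon once the nodes are scattered across paged-out memory.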
When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GB of RAM on the machine).
After the application has been partly paged out, a client logon takes a long time (10 minutes), because it now generates a page fault for each pointer lookup in the stl-map. If the client logon is run a second time right after, it is fast (a few seconds), because all the memory is now back in RAM.
I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.
I guess a poor man's solution could be to loop through the entire in-memory database, and thus tell Windows we are still interested in keeping the data model in RAM.
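The "touch everything" idea could be sketched roughly like this, assuming it would be called periodically (for example from a timer thread, not shown) for each table. touch_all is a hypothetical helper. It reads only the first byte of each value, so it touches the page holding each map node but does not follow pointers or heap members such as std::string buffers. Note that one of the answers below argues this cannot help if RAM is genuinely short.

```cpp
#include <cstddef>
#include <map>

// Read one byte from every value in a std::map so that the pages holding
// the nodes are touched. The volatile reads keep the compiler from
// optimizing the loop away. Returns the number of nodes visited.
template <typename K, typename V>
std::size_t touch_all(const std::map<K, V>& table) {
    volatile unsigned char sink = 0;
    std::size_t visited = 0;
    for (const auto& entry : table) {
        const volatile unsigned char* p =
            reinterpret_cast<const volatile unsigned char*>(&entry.second);
        sink = sink ^ *p;  // volatile read of the node's value
        ++visited;
    }
    return visited;
}
```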
I guess another poor man's solution could be to disable the pagefile completely on Windows.
I guess the expensive solution would be an SQL database, and then rewriting the entire application to use a database layer. Hopefully the database system would then have implemented means for fast access.
Are there other more elegant solutions?
This sounds like either a memory leak or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 MB of data to use up 16 GB of RAM and still want more.
Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.
If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
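A minimal sketch of calling SetProcessWorkingSetSize on the current process. The wrapper name set_working_set is hypothetical, and the byte values a real application would pass depend on its data size. With this plain (non-Ex) call the minimum is not a hard guarantee, so the trimmer can still go below it under memory pressure, and raising the limits may require a privilege (check the API documentation). On non-Windows builds the function is only a stub so the file still compiles.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#endif

// Ask Windows to keep at least min_bytes of this process resident and to
// allow it to grow up to max_bytes. Returns true on success.
// On non-Windows builds this is a stub that always reports failure.
bool set_working_set(std::size_t min_bytes, std::size_t max_bytes) {
#ifdef _WIN32
    return SetProcessWorkingSetSize(GetCurrentProcess(),
                                    static_cast<SIZE_T>(min_bytes),
                                    static_cast<SIZE_T>(max_bytes)) != 0;
#else
    (void)min_bytes;
    (void)max_bytes;
    return false;
#endif
}
```

For a data model of roughly 500 MB, the minimum would presumably need to be somewhat above that figure for the setting to make any difference.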
As @Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.
But for the record, none of your "poor man's solutions" would work. At all.
Windows pages out some of your data because there's not room for it in RAM.
Looping through the entire in-memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. You'd generate a lot of page faults, and in the end the only difference would be which parts of the data structure are paged out.
Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.
If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.
It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.
If you can't do that, then either
We have a similar problem, and the solution we chose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using an stl-map here is not for the faint of heart either, and was beyond what we required.
We are using Boost Shared Memory to implement this for us and it works well. Follow the examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex, which will do a lot of what you want.
For a no-cost SQL solution, have you looked at SQLite? It has an option to run as an in-memory database.
Good luck, sounds like an interesting application.
That's the start of the end: STL's std::map is extremely memory inefficient. The same applies to std::list. Every element is allocated separately, causing rather serious memory waste. I often use std::vector + sort() + find() instead of std::map in applications where that is possible (more searches than modifications) and where I know in advance that memory usage might become an issue.
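A sketch of the sorted-vector approach, assuming a read-mostly table keyed by int. Since std::find on a sorted vector is still a linear scan, this sketch assumes a binary search is what is intended and uses std::lower_bound; Entry, build_index and lookup are hypothetical names. All entries live in one contiguous allocation, so a full scan touches far fewer pages than walking scattered map nodes.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// A read-mostly replacement for std::map<int, std::string>: all entries live
// in one contiguous vector, sorted by key once after loading.
using Entry = std::pair<int, std::string>;

void build_index(std::vector<Entry>& table) {
    std::sort(table.begin(), table.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary search on the sorted vector; returns nullptr if the key is absent.
const std::string* lookup(const std::vector<Entry>& table, int key) {
    auto it = std::lower_bound(
        table.begin(), table.end(), key,
        [](const Entry& e, int k) { return e.first < k; });
    if (it != table.end() && it->first == key) {
        return &it->second;
    }
    return nullptr;
}
```

One design consequence: pointers into a vector are invalidated whenever it reallocates, so cross-references between tables would have to be stored as keys or indices rather than as raw pointers.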
Hard to tell without knowing how your application is written. Windows has a feature that unloads from RAM whatever memory of idle applications can be unloaded. But that normally affects memory-mapped files and the like.
Otherwise, I would strongly suggest reading up on the Windows memory management documentation. It is not very easy to understand, yet Windows has all sorts and types of memory available to applications. I never had luck with it, but a custom std::allocator might work in your application.
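One standard-library route to a custom allocator is sketched below. It swaps in C++17 polymorphic allocators (std::pmr::map backed by std::pmr::monotonic_buffer_resource) instead of a hand-written std::allocator; ArenaTable is a hypothetical name. All map nodes are carved sequentially out of one buffer, which improves locality but does not by itself keep the pages in RAM. The monotonic resource never frees individual nodes, so it suits data that is loaded once and then mostly read.

```cpp
#include <cstddef>
#include <map>
#include <memory_resource>
#include <vector>

// One large arena; std::pmr::map nodes are carved out of it sequentially
// instead of being scattered across the general-purpose heap.
// Members are initialized in declaration order: storage, arena, rows.
struct ArenaTable {
    std::vector<std::byte> storage;
    std::pmr::monotonic_buffer_resource arena;
    std::pmr::map<int, double> rows;

    explicit ArenaTable(std::size_t bytes)
        : storage(bytes),
          arena(storage.data(), storage.size()),
          rows(&arena) {}
};
```

If the arena outgrows the buffer, the resource falls back to its upstream allocator, so the buffer size should be chosen to cover the expected data set. A single contiguous buffer is also a convenient target for the page-locking calls mentioned in the last answer.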
I can believe it is the fault of flawed pagefile behaviour; I've run my laptops mostly with the pagefile turned off since NT 4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of a really, really slow extension to the maximum working set space.
Ask what benefit swapping to hard disk achieves with 16 gigabytes of real RAM available. If your working set is so big that it needs more than 10 GB of virtual memory, then once swapping is actually required, processes will take anywhere from a bit longer to thousands of times longer to complete. On Windows, the untameable file system cache seems to make matters worse.
Now when I (very) occasionally run out of working set on my XP laptops, there is no traffic jam; the guilty app just crashes. A utility that suspends memory-guzzling processes before that point and raises an alert would be nice, but there is no such thing: just a violation, a crash, and sometimes explorer.exe goes down too.
Pagefiles: who needs 'em?
---- Edit
Given snakefoot's explanation, the problem is that memory which has not been used for a long period of time gets swapped out, so the data is not in memory when it is needed. This is the same problem as in this question:
Can I tell Windows not to swap out a particular processes’ memory?
and the VirtualLock function should do the job:
http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
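A minimal sketch of pinning a buffer with VirtualLock, assuming the data model lives in one contiguous region (for example an arena like those suggested in earlier answers). lock_region and unlock_region are hypothetical wrappers. The POSIX mlock/munlock branch is an added counterpart so the code also builds outside Windows. Both calls fail if the process may not lock that much memory: on Windows the lockable amount is tied to the minimum working set size (so SetProcessWorkingSetSize usually has to be raised first), and on Linux to RLIMIT_MEMLOCK.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Pin [addr, addr + len) in physical memory.
// Windows: VirtualLock; POSIX: mlock. Returns true on success.
bool lock_region(void* addr, std::size_t len) {
#ifdef _WIN32
    return VirtualLock(addr, static_cast<SIZE_T>(len)) != 0;
#else
    return mlock(addr, len) == 0;
#endif
}

// Undo a previous lock_region on the same range.
bool unlock_region(void* addr, std::size_t len) {
#ifdef _WIN32
    return VirtualUnlock(addr, static_cast<SIZE_T>(len)) != 0;
#else
    return munlock(addr, len) == 0;
#endif
}
```

Locking a region that covers a whole 500 MB data model is very likely to fail with default limits, so the return value should always be checked rather than assumed.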
---- Previous answer
First of all you need to distinguish between memory leak and memory need problems.
If you have a memory leak then it would be bigger effort to convert entire application to SQL than to debug the application.
SQL cannot be faster than a well-designed, domain-specific in-memory database, and if you have bugs, chances are you will have different ones in an SQL version as well.
If this is a memory need problem, then you will need to switch to SQL anyway and this sounds like a good moment.