How can I reduce the number of file writes under multithreading?
Here's the situation.
In a Java web app I was assigned to maintain, I've been asked to improve the overall response time in the stress tests run during QA. This web app doesn't use a database, since it was supposed to be light and simple. (And I can't change that decision.)
To persist configuration, I've found that every time you make a change to it, a general object containing lists of config objects is serialized to a file.
Using JMeter, I've found that in the given test case two requests take up most of the time. Both of these requests add or change some configuration objects. Since access to the file must be synchronized, when many users are changing config the file has to be fully written several times within a few seconds, and requests wait for each file write to finish.
I think most of these serializations are unnecessary: each request changes a single object, yet the file is rewritten as a whole every time, so we serialize mostly unchanged objects again and again.
So, is there a way to reduce the number of real file writes but still guarantee that all changes are eventually serialized?
Any suggestions are appreciated.
Comments (3)
One option is to make the changes in memory and keep one thread in the background that runs at a given interval and flushes the changes to disk. Keep in mind that if the application crashes, you'll lose any data that wasn't flushed yet.
The background thread could be scheduled with a ScheduledExecutorService.
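To make that concrete, here is a minimal sketch of the write-behind idea, assuming the config objects are `Serializable` and can be keyed by a string. The class name `WriteBehindConfigStore`, its methods, and the flush interval parameter are made up for illustration; they are not from the application in the question.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical write-behind store: requests only touch memory, a timer writes the file. */
class WriteBehindConfigStore implements AutoCloseable {
    private final Map<String, Serializable> configs = new ConcurrentHashMap<>();
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Path file;

    WriteBehindConfigStore(Path file, long flushIntervalMillis) {
        this.file = file;
        scheduler.scheduleWithFixedDelay(this::flushQuietly,
                flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Called from request threads: no disk I/O happens here. */
    void put(String key, Serializable value) {
        configs.put(key, value);
        dirty.set(true);
    }

    /** Writes the whole map once, and only if something changed since the last flush. */
    synchronized void flush() throws IOException {
        if (!dirty.compareAndSet(true, false)) {
            return; // nothing new to persist
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            Map<String, Serializable> snapshot = new HashMap<>(configs);
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(snapshot);
            }
            // Swap the finished file in so readers never see a half-written one.
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            dirty.set(true); // try again on the next tick
            throw e;
        }
    }

    /** A scheduled task that throws is never run again, so swallow and log instead. */
    private void flushQuietly() {
        try {
            flush();
        } catch (IOException e) {
            System.err.println("Config flush failed: " + e);
        }
    }

    /** Stops the timer and does one last flush so a clean shutdown loses nothing. */
    @Override
    public void close() throws IOException {
        scheduler.shutdown();
        flush();
    }
}
```

With something like this, a hundred changes inside one interval cost a single file write. The trade-off is the one already mentioned: anything changed since the last flush is lost if the JVM dies before the next one.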
IMO, it would be a better idea to use a DB. Can't you use an embedded DB like Java DB, H2 (http://www.h2database.com/html/main.html) or HSQLDB? These databases support concurrent access and can also guarantee the consistency of the data in case of a crash.
If you absolutely cannot use a database, the obvious solution is to break your single file into multiple files, one file per config object. That would speed up serialization and output, and it would also reduce lock contention: requests that change different config objects can write their files simultaneously, though the writes may then become IO-bound.
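A rough sketch of the one-file-per-object layout, assuming each config object has a unique id that is safe to use as a file name. `PerObjectConfigStore` and its `save`/`load` methods are hypothetical names, not part of the original app.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store that keeps every config object in its own file. */
class PerObjectConfigStore {
    private final Path dir;
    // One lock per id, so writes to different objects do not block each other.
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    PerObjectConfigStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    private Object lockFor(String id) {
        return locks.computeIfAbsent(id, k -> new Object());
    }

    /** Serializes only the changed object. Assumes id is a valid file name. */
    void save(String id, Serializable config) throws IOException {
        synchronized (lockFor(id)) {
            Path target = dir.resolve(id + ".ser");
            Path tmp = dir.resolve(id + ".ser.tmp");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(config);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Reads one object back from its own file. */
    Object load(String id) throws IOException, ClassNotFoundException {
        synchronized (lockFor(id)) {
            try (ObjectInputStream in =
                         new ObjectInputStream(Files.newInputStream(dir.resolve(id + ".ser")))) {
                return in.readObject();
            }
        }
    }
}
```

Each request now serializes one small object instead of the whole list, and two requests only wait on each other when they touch the same id.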
One way is to do what Lucene does and not actually overwrite the old file at all, but write a new file that contains only the "updates". This relies on your updates being associative, but that is usually the case anyway.
The idea is that if your old file contains "8" and an update adds 3, you write "3" to a new file and the state becomes "11"; next you write "-2" and the state is "9". Periodically you can aggregate the old file and the updates into one. No physical file you write is ever modified afterwards, but it may be deleted once it is no longer used.
To make this idea a bit more relevant consider if the numbers above are records of some kind. "3" could translate to "Add three new records" and "-2" to "Delete these two records".
Lucene is an example of a project that uses this style of additive update strategy very successfully.
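As an illustration of the update-file idea, here is a simplified sketch that appends small records to a single log instead of writing a new file per update, and occasionally compacts it. It assumes configs can be reduced to string key/value pairs; `AppendOnlyConfigLog` and its record layout are invented for this example and are not how Lucene stores its segments.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical append-only log: each change is one small record, never a full rewrite. */
class AppendOnlyConfigLog {
    private final Path log;
    private final Map<String, String> state = new HashMap<>();

    AppendOnlyConfigLog(Path log) throws IOException {
        this.log = log;
        if (Files.exists(log)) {
            replay();
        }
    }

    /** Rebuilds the in-memory state by applying every record in order. */
    private void replay() throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(log)))) {
            while (true) {
                boolean isPut = in.readBoolean();
                String key = in.readUTF();
                if (isPut) {
                    state.put(key, in.readUTF());
                } else {
                    state.remove(key);
                }
            }
        } catch (EOFException endOfLog) {
            // Normal end of file, or a torn last record after a crash (which is skipped).
        }
    }

    private void append(boolean isPut, String key, String value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(log, StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
            out.writeBoolean(isPut);
            out.writeUTF(key);
            if (isPut) {
                out.writeUTF(value);
            }
        }
    }

    synchronized void put(String key, String value) throws IOException {
        append(true, key, value);
        state.put(key, value);
    }

    synchronized void remove(String key) throws IOException {
        append(false, key, null);
        state.remove(key);
    }

    synchronized String get(String key) {
        return state.get(key);
    }

    /** Aggregates old records: rewrites the log with one record per live key. */
    synchronized void compact() throws IOException {
        Path tmp = log.resolveSibling(log.getFileName() + ".compact");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            for (Map.Entry<String, String> e : state.entrySet()) {
                out.writeBoolean(true);
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
        Files.move(tmp, log, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Every request still touches the disk, but it writes a few bytes rather than the whole object graph. `compact()` plays the role of the periodic aggregation step and could run from a background thread.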