Multiple actors writing to the same file + rotation
I have written a very simple webserver in Scala (based on Actors). Its purpose
is to log events from our frontend servers (such as a user clicking a
button or loading a page). The file will need to be rotated every 64-100 MB or so, and
it will then be sent to S3 for later analysis with Hadoop. Traffic will
be about 50-100 calls/s.
Some questions that pop into my mind:
- How do I make sure that all actors can write to one file in a thread-safe way?
- What is the best way to rotate the file after X MB? Should I do this
in my code or from the filesystem? (If I do it from the filesystem, how do I verify
that the file isn't in the middle of a write, or that the buffer has been flushed?)
2 Answers
One simple method would be to have a single file writer actor that serialized all writes to the disk. You could then have multiple request handler actors that fed it updates as they processed logging events from the frontend server. You'd get concurrency in request handling while still serializing writes to your log file. Having more than a single actor would open the possibility of concurrent writes, which would at best corrupt your log file. Basically, if you want something to be thread-safe in the actor model, it should be executed on a single actor. Unfortunately, your task is inherently serial at the point you write to disk. You could do something more involved like merge log files coming from multiple actors at rotation time but that seems like overkill. Unless you're generating that 64-100MB in a second or two, I'd be surprised if the extra threads doing I/O bought you anything.
Assuming a single writing actor, it's pretty trivial to calculate the amount that has been written since the last rotation and I don't think tracking in the actor's internal state versus polling the filesystem would make a difference one way or the other.
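A minimal sketch of such a single-writer actor, assuming the pre-Akka `scala.actors` library the question is based on; the `LogLine` message, the byte threshold, and the timestamp-suffix rename scheme are illustrative choices, not part of the original answer:

```scala
import java.io.{BufferedWriter, File, FileWriter}
import scala.actors.Actor
import scala.actors.Actor._

// Hypothetical message type for a single log entry.
case class LogLine(line: String)

// Single actor that owns the file handle, so all writes are serialized
// through its mailbox and no locking is needed.
class LogWriter(path: String, maxBytes: Long) extends Actor {
  private var written = 0L
  private var out = new BufferedWriter(new FileWriter(path, true))

  def act() {
    loop {
      react {
        case LogLine(line) =>
          out.write(line)
          out.newLine()
          // Track bytes written in actor state instead of polling the filesystem.
          written += line.getBytes("UTF-8").length + 1
          if (written >= maxBytes) rotate()
      }
    }
  }

  private def rotate() {
    out.flush()
    out.close()
    // Rename the full file aside (timestamp suffix is an assumption),
    // then reopen a fresh file under the original name.
    new File(path).renameTo(new File(path + "." + System.currentTimeMillis))
    out = new BufferedWriter(new FileWriter(path, false))
    written = 0L
  }
}
```

The request-handler actors would then just send messages, e.g. `writer ! LogLine(event)`, after `writer.start()`; because the rotation happens inside the same actor, a message can never be written half before and half after a rotation.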
You can use a single actor to write every request coming from the different threads; since all requests go through this one actor, there will be no concurrency problems.
As for rolling the file, if your write requests can be logged line by line, you can fall back on log4j's or logback's RollingFileAppender. Otherwise you can write your own, which is easy as long as you remember to lock the file before performing any delete or rename operations.
Rolling usually means renaming the older files and the current file to other names, then creating a new file with the current file name; that way you can always write to the file with the current name.
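The rename-then-recreate scheme described above could look something like this (the file name and the timestamp suffix are illustrative, and this assumes the writer has closed or flushed the file before calling it):

```scala
import java.io.File

// Roll the current log file: move it aside under a new name,
// then create an empty file under the original name so writers
// can keep using the same path.
def roll(current: String) {
  val f = new File(current)
  if (f.exists) {
    // e.g. events.log -> events.log.1300000000000
    f.renameTo(new File(current + "." + System.currentTimeMillis))
  }
  new File(current).createNewFile()
}
```

The renamed file is then safe to upload to S3, since nothing will write to it again.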