Checking for a fixed-length marker at the end of a file
I have an application that I've been tasked with cleaning up after. The application itself is relatively simple - it runs a SQL query, consumes a web service, and writes the results to a log file. My job is to archive the files to our NAS after the application is done with them. It locks each file exclusively until it's done with it, which adds a bit of complexity. I'm also not allowed to touch the application, just the logs. Anyway, my own application is fairly simple:
- Check whether each file can be opened (catching IOException), and mark it as accessible in a bool[] if no exception is thrown.
- Going through the files marked true, read each file line by line with the StreamReader's ReadLine method. Because the application occasionally hiccups and doesn't finish, I can't simply use the absence of an IOException to tell whether the file is complete - I have to actually parse the text.
- If the text indicating completion is found, zip the file, copy the archive to the NAS, and delete the original.
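The first step above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the class and method names (`LogAccess`, `IsAccessible`) are made up for the example, and it assumes that "can be opened" means "can be opened while allowing no sharing".

```csharp
using System;
using System.IO;

static class LogAccess
{
    // Hypothetical helper: returns true if the file can be opened right now
    // without sharing, which suggests no other process still holds it open.
    public static bool IsAccessible(string path)
    {
        try
        {
            // FileShare.None requests exclusive access; the open fails with an
            // IOException if another process still has the file locked.
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // Also reached for a missing file (FileNotFoundException derives
            // from IOException), which is treated as "not accessible" here.
            return false;
        }
    }
}
```

As the question notes, a file that opens cleanly is not necessarily complete, so this check only filters out files that are still being written.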
My code works, but it's very time consuming (the log files are each around 500 MB). My idea for improving it is to start the search from the bottom of the file instead of the top, but the StreamReader doesn't support that. I can't use the ReadToEnd method and then read in reverse, because that just throws an out-of-memory exception. Any thoughts on how I could speed up the parsing of the log file?
Comments (1)
I assume you look for a single marker at the end of the file to determine whether it is finished? If so, I also assume the marker is of a known length, for example a single byte or a sequence of 3 bytes.
If those assumptions are correct, you can open a FileStream, Seek to the end of the file minus the expected marker length, and read the bytes. If the marker is present and complete, you know you can process the file.
Seeking to the end minus 3 bytes can be done with code like the following:
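A minimal sketch of that seek-and-compare check, assuming the marker is a short, fixed byte sequence. The names `MarkerCheck` and `EndsWithMarker`, and the path and marker bytes in the usage line, are illustrative and do not come from the original answer.

```csharp
using System;
using System.IO;

static class MarkerCheck
{
    // Hypothetical helper: returns true if the file ends with exactly the
    // given marker bytes. Only marker.Length bytes are read from disk.
    public static bool EndsWithMarker(string path, byte[] marker)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // A file shorter than the marker cannot contain it.
            if (fs.Length < marker.Length)
                return false;

            // Position the stream marker.Length bytes before the end,
            // e.g. Seek(-3, SeekOrigin.End) for a 3-byte marker.
            fs.Seek(-marker.Length, SeekOrigin.End);

            // Read may return fewer bytes than requested, so loop until full.
            var tail = new byte[marker.Length];
            int read = 0;
            while (read < tail.Length)
            {
                int n = fs.Read(tail, read, tail.Length - read);
                if (n == 0)
                    return false; // unexpected end of stream
                read += n;
            }

            // Compare the tail bytes against the expected marker.
            for (int i = 0; i < marker.Length; i++)
            {
                if (tail[i] != marker[i])
                    return false;
            }
            return true;
        }
    }
}
```

It might be called as `MarkerCheck.EndsWithMarker(@"C:\logs\app.log", new byte[] { 0x45, 0x4F, 0x46 })`, where both the path and the marker are placeholders. If the application writes a trailing newline after its completion text, those bytes would need to be part of the marker, or the seek offset widened accordingly.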