Fastest way to delete the first few bytes of a file
I am using a Windows Mobile Compact Edition 6.5 phone and am writing binary data to a file from Bluetooth. These files get quite large, 16 MB+, and once a file is written I need to search it for a start character and then delete everything before it, thus eliminating garbage. I cannot do this inline as the data comes in, due to graphing issues and speed: I get a lot of data coming in, and there are already too many if conditions on the incoming data, so I figured it was best to post-process. Anyway, here is my dilemma: searching for the start bytes and rewriting the file sometimes takes 5 minutes or more. I basically move the file over to a temp file, parse through it, and rewrite a whole new file, and I have to do this byte by byte.
private void closeFiles() {
    try {
        // Close the file stream for raw data.
        if (this.fsRaw != null) {
            this.fsRaw.Flush();
            this.fsRaw.Close();

            // Move the file aside, seek to the first sync byte, then write
            // the sync byte and everything after it back into a fresh file.
            File.Move(this.s_fileNameRaw, this.s_fileNameRaw + ".old");
            FileStream fsRaw_Copy = File.Open(this.s_fileNameRaw + ".old", FileMode.Open);
            this.fsRaw = File.Create(this.s_fileNameRaw);

            int x = 0;
            bool syncFound = false;
            while (x != -1) {
                x = fsRaw_Copy.ReadByte(); // returns -1 at end of file
                // ... logic to detect the sync byte and set syncFound ...
                if (x != -1 && syncFound) {
                    this.fsRaw.WriteByte((byte)x);
                }
            }

            this.fsRaw.Close();
            fsRaw_Copy.Close();
            File.Delete(this.s_fileNameRaw + ".old");
        }
    } catch (IOException e) {
        CLogger.WriteLog(ELogLevel.ERROR, "Exception in writing: " + e.Message);
    }
}
There has got to be a faster way than this!
------------ Testing times using the answer below ------------
Initial test, my way, with one-byte reads and one-byte writes:
27 KB/sec
Using the answer below with a 32768-byte buffer:
321 KB/sec
Using the answer below with a 65536-byte buffer:
501 KB/sec
2 Answers
You're doing a byte-wise copy of the entire file. That can't be efficient, for a load of reasons. Search for the start offset (and the end offset if you need both), then copy the entire contents between the two offsets (or the start offset and the end of file) from one stream to the other.
EDIT
You don't have to read the entire contents to make the copy. Something like this (untested, but you get the idea) would work.
EDIT 2
I actually needed something similar to this today, so I decided to write it without the PeekChar() call. Here's the kernel of what I did - feel free to integrate it with the second do...while loop above.
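The code samples this answer refers to did not survive the copy here, but the approach is easy to reconstruct. Below is a minimal sketch of it, assuming the start of the real data is marked by a single known sync byte; the names TrimBeforeSync, syncByte and bufferSize are mine, invented for illustration, not taken from the original answer.

    using System.IO;

    static class FileTrimmer
    {
        // Copies everything from the first occurrence of syncByte onward
        // from sourcePath into destPath, reading and writing in large blocks.
        public static void TrimBeforeSync(string sourcePath, string destPath,
                                          byte syncByte, int bufferSize)
        {
            using (FileStream src = File.OpenRead(sourcePath))
            using (FileStream dst = File.Create(destPath))
            {
                byte[] buffer = new byte[bufferSize];
                long startOffset = -1;
                long position = 0;
                int bytesRead;

                // Phase 1: scan for the first sync byte, one block at a time.
                while (startOffset < 0 &&
                       (bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
                {
                    for (int i = 0; i < bytesRead; i++)
                    {
                        if (buffer[i] == syncByte)
                        {
                            startOffset = position + i;
                            break;
                        }
                    }
                    position += bytesRead;
                }

                if (startOffset < 0)
                    return; // no sync byte found; nothing to keep

                // Phase 2: seek back to the sync byte and bulk-copy the rest.
                src.Seek(startOffset, SeekOrigin.Begin);
                while ((bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
                {
                    dst.Write(buffer, 0, bytesRead);
                }
            }
        }
    }

Reading and writing in large blocks is what produced the jump in the timings above: each I/O call moves tens of thousands of bytes instead of one, so the per-byte overhead all but disappears.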
Don't discount an approach because you're afraid it will be too slow. Try it! It'll only take 5-10 minutes to give it a try and may result in a much better solution.
If the detection process for the start of the data is not too complex/slow, then avoiding writing data until you hit the start may actually make the program skip past the junk data more efficiently.
How to do this: keep a boolean flag that starts out false, and simply don't write any of the incoming bytes to the file until the start-of-data marker has been detected (see the sketch after this answer). The only cost this adds to your incoming-data handling is an if (found) check, which really won't make any noticeable difference to your performance. You may find that in itself solves the problem. But you can optimise it if you need more performance:
What can you do to minimise the work you do to detect the start of the data? Perhaps if you are looking for a complex sequence you only need to check for one particular byte value that starts the sequence, and it's only if you find that start byte that you need to do any more complex checking. There are some very simple but efficient string searching algorithms that may help in this sort of case too. Or perhaps you can allocate a buffer (e.g. 4kB) and gradually fill it with bytes from your incoming stream. When the buffer is filled, then and only then search for the end of the "junk" in your buffer. By batching the work you can make use of memory/cache coherence to make the processing considerably more efficient than it would be if you did the same work byte by byte.
Do all the other "conditions on the incoming data" need to be continually checked? How can you minimise the amount of work you need to do but still achieve the required results? Perhaps some of the ideas above might help here too?
Do you actually need to do any processing on the data while you are skipping junk? If not, then you can break the whole thing into two phases (skip junk, copy data), and skipping the junk won't cost you anything when it actually matters.
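For concreteness, here is a minimal sketch of the if (found) idea described above, assuming a one-byte start marker; the class and member names (IncomingDataWriter, SyncByte, HandleIncomingByte) and the marker value are invented for illustration and are not from the original answer.

    using System.IO;

    class IncomingDataWriter
    {
        private const byte SyncByte = 0xAA; // hypothetical start-marker value

        private readonly FileStream fsRaw;
        private bool found;                 // false while we are still in junk

        public IncomingDataWriter(FileStream output)
        {
            this.fsRaw = output;
        }

        // Called once per byte arriving from the Bluetooth stream.
        public void HandleIncomingByte(byte b)
        {
            if (!found)
            {
                if (b != SyncByte)
                    return;      // still junk: drop the byte, write nothing
                found = true;    // sync found; everything from here is data
            }
            fsRaw.WriteByte(b);  // batching these writes into a buffer, as
                                 // suggested above, would be faster still
        }
    }

With something like this in place, the post-processing pass disappears entirely: the file on disk never contains the junk in the first place.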