Fastest way to delete the first few bytes of a file
I am using a Windows Mobile Compact Edition 6.5 phone and am writing binary data to a file from Bluetooth. These files get quite large, 16 MB+, and once a file is written I need to search it for a start character and then delete everything before it, thus eliminating garbage. I cannot do this inline as the data comes in, due to graphing issues and speed: I get a lot of data coming in, and there are already too many if conditions on the incoming data, so I figured it was best to post-process. Anyway, here is my dilemma: searching for the start bytes and rewriting the file sometimes takes 5 minutes or more. I basically move the file over to a temp file, parse through it, and rewrite a whole new file, and I have to do this byte by byte.
private void closeFiles() {
    try {
        // Close the file stream for raw data.
        if (this.fsRaw != null) {
            this.fsRaw.Flush();
            this.fsRaw.Close();

            // Move the file aside, seek to the first sync byte, then write
            // the sync byte and everything after it back into a fresh file.
            File.Move(this.s_fileNameRaw, this.s_fileNameRaw + ".old");
            FileStream fsRaw_Copy = File.Open(this.s_fileNameRaw + ".old", FileMode.Open);
            this.fsRaw = File.Create(this.s_fileNameRaw);

            int x = 0;
            bool syncFound = false;
            while (x != -1) {
                x = fsRaw_Copy.ReadByte(); // returns -1 at end of file
                // ... logic to detect the sync byte and set syncFound ...
                if (x != -1 && syncFound) {
                    this.fsRaw.WriteByte((byte)x);
                }
            }

            this.fsRaw.Close();
            fsRaw_Copy.Close();
            File.Delete(this.s_fileNameRaw + ".old");
        }
    } catch (IOException e) {
        CLogger.WriteLog(ELogLevel.ERROR, "Exception in writing: " + e.Message);
    }
}
There has got to be a faster way than this!
------------ Testing times using the answer below ------------
Initial test, my way, with one-byte reads and one-byte writes:
27 KB/sec
Using the answer below with a 32768-byte buffer:
321 KB/sec
Using the answer below with a 65536-byte buffer:
501 KB/sec
2 Answers
You're doing a byte-wise copy of the entire file. That can't be efficient, for a load of reasons. Search for the start offset (and the end offset if you need both), then copy the entire contents between the two offsets (or the start offset and the end of file) from one stream to the other.
EDIT
You don't have to read the entire contents to make the copy. Something like this (untested, but you get the idea) would work.
EDIT 2
I actually needed something similar to this today, so I decided to write it without the PeekChar() call. Here's the kernel of what I did - feel free to integrate it with the second do...while loop above.
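The code samples this answer refers to did not survive the copy here, but the approach is easy to reconstruct. Below is a minimal sketch of it, assuming the start of the real data is marked by a single known sync byte; the names TrimBeforeSync, syncByte and bufferSize are mine, invented for illustration, not taken from the original answer.

    using System.IO;

    static class FileTrimmer
    {
        // Copies everything from the first occurrence of syncByte onward
        // from sourcePath into destPath, reading and writing in large blocks.
        public static void TrimBeforeSync(string sourcePath, string destPath,
                                          byte syncByte, int bufferSize)
        {
            using (FileStream src = File.OpenRead(sourcePath))
            using (FileStream dst = File.Create(destPath))
            {
                byte[] buffer = new byte[bufferSize];
                long startOffset = -1;
                long position = 0;
                int bytesRead;

                // Phase 1: scan for the first sync byte, one block at a time.
                while (startOffset < 0 &&
                       (bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
                {
                    for (int i = 0; i < bytesRead; i++)
                    {
                        if (buffer[i] == syncByte)
                        {
                            startOffset = position + i;
                            break;
                        }
                    }
                    position += bytesRead;
                }

                if (startOffset < 0)
                    return; // no sync byte found; nothing to keep

                // Phase 2: seek back to the sync byte and bulk-copy the rest.
                src.Seek(startOffset, SeekOrigin.Begin);
                while ((bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
                {
                    dst.Write(buffer, 0, bytesRead);
                }
            }
        }
    }

Reading and writing in large blocks is what produced the jump in the timings above: each I/O call moves tens of thousands of bytes instead of one, so the per-byte overhead all but disappears.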
Don't discount an approach because you're afraid it will be too slow. Try it! It'll only take 5-10 minutes to give it a try and may result in a much better solution.
If the detection process for the start of the data is not too complex/slow, then avoiding writing data until you hit the start may actually make the program skip past the junk data more efficiently.
How to do this: keep a boolean flag that starts out false, and simply don't write any of the incoming bytes to the file until the start-of-data marker has been detected (see the sketch after this answer). The only cost this adds to your incoming-data handling is an if (found) check, which really won't make any noticeable difference to your performance. You may find that in itself solves the problem. But you can optimise it if you need more performance:
What can you do to minimise the work you do to detect the start of the data? Perhaps if you are looking for a complex sequence you only need to check for one particular byte value that starts the sequence, and it's only if you find that start byte that you need to do any more complex checking. There are some very simple but efficient string searching algorithms that may help in this sort of case too. Or perhaps you can allocate a buffer (e.g. 4kB) and gradually fill it with bytes from your incoming stream. When the buffer is filled, then and only then search for the end of the "junk" in your buffer. By batching the work you can make use of memory/cache coherence to make the processing considerably more efficient than it would be if you did the same work byte by byte.
Do all the other "conditions on the incoming data" need to be continually checked? How can you minimise the amount of work you need to do but still achieve the required results? Perhaps some of the ideas above might help here too?
Do you actually need to do any processing on the data while you are skipping junk? If not, then you can break the whole thing into two phases (skip junk, copy data), and skipping the junk won't cost you anything when it actually matters.
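For concreteness, here is a minimal sketch of the if (found) idea described above, assuming a one-byte start marker; the class and member names (IncomingDataWriter, SyncByte, HandleIncomingByte) and the marker value are invented for illustration and are not from the original answer.

    using System.IO;

    class IncomingDataWriter
    {
        private const byte SyncByte = 0xAA; // hypothetical start-marker value

        private readonly FileStream fsRaw;
        private bool found;                 // false while we are still in junk

        public IncomingDataWriter(FileStream output)
        {
            this.fsRaw = output;
        }

        // Called once per byte arriving from the Bluetooth stream.
        public void HandleIncomingByte(byte b)
        {
            if (!found)
            {
                if (b != SyncByte)
                    return;      // still junk: drop the byte, write nothing
                found = true;    // sync found; everything from here is data
            }
            fsRaw.WriteByte(b);  // batching these writes into a buffer, as
                                 // suggested above, would be faster still
        }
    }

With something like this in place, the post-processing pass disappears entirely: the file on disk never contains the junk in the first place.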