Efficiently strip certain ASCII codes from a file?

Posted on 2024-12-25 19:46:34

I have an on-disk file of 100 MB (it can be up to 300 MB). It contains nulls and some other control characters that should not be there. At first I read the file into memory as a string, walked it char by char, dropped the offending chars, appended the clean ones to a StringBuilder, and then called ToString on that.

That uses too much memory, of course. I need to figure out how to strip out the bad ASCII values on disk. Maybe a (.NET 4) memory-mapped file is the right thing (I looked into it a while ago in "Memory Mapped File to Read End of File?")?

All ideas appreciated. Thanks.

Comments (1)

笑着哭最痛 2025-01-01 19:46:34

If you need to shrink the file to remove the bad characters, simply read the file a character or a block at a time and write it out to a new file, skipping the bad characters.
This also gives you an undo!
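
A minimal C# sketch of that copy-and-skip approach might look like the following. The class and method names are made up for illustration, and the `IsBad` rule (drop NUL and other control bytes, keep tab, LF and CR) is only an assumption about what counts as "bad" here. It also assumes a single-byte encoding such as ASCII, so it works on raw bytes rather than decoded chars.

```csharp
using System;
using System.IO;

static class AsciiCleaner
{
    // Assumption: "bad" means NUL and other control bytes (plus DEL),
    // while tab, LF and CR are kept. Adjust to the real requirements.
    public static bool IsBad(byte b)
    {
        if (b == 0x09 || b == 0x0A || b == 0x0D) return false;
        return b < 0x20 || b == 0x7F;
    }

    // Copies sourcePath to destPath one block at a time, skipping bad bytes.
    // Returns the number of bytes that were dropped.
    public static long CopyWithoutBadBytes(string sourcePath, string destPath)
    {
        byte[] buffer = new byte[64 * 1024];
        long dropped = 0;

        using (FileStream input = File.OpenRead(sourcePath))
        using (FileStream output = File.Create(destPath))
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Compact the good bytes to the front of the same buffer.
                int kept = 0;
                for (int i = 0; i < read; i++)
                {
                    byte b = buffer[i];
                    if (IsBad(b)) dropped++;
                    else buffer[kept++] = b;
                }
                output.Write(buffer, 0, kept);
            }
        }
        return dropped;
    }
}
```

Only one 64 KB buffer is held in memory regardless of file size, and the original file stays untouched until you choose to replace it with the cleaned copy.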

If you can replace bad characters in place, so that the length of the file doesn't change, then map the file and scan over the memory, replacing each bad character with e.g. a space (ASCII 32). This is the simplest option and probably faster, but either way you are going to be dominated by the raw disk I/O.
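
The in-place variant could be sketched in C# with .NET 4's `MemoryMappedFile`, roughly as below. Again the names and the `IsBad` rule are illustrative assumptions, not something from the question. The loop is bounded by the file length from `FileInfo` rather than by the view's `Capacity`, because a view's capacity can be rounded up to the page size.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class InPlaceCleaner
{
    // Assumption: same rule as before; NUL and other control bytes are bad,
    // tab, LF and CR are kept.
    public static bool IsBad(byte b)
    {
        if (b == 0x09 || b == 0x0A || b == 0x0D) return false;
        return b < 0x20 || b == 0x7F;
    }

    // Overwrites every bad byte in the file with `replacement` (space by
    // default). The file length does not change. Returns the replaced count.
    public static long ReplaceBadBytes(string path, byte replacement = 0x20)
    {
        long length = new FileInfo(path).Length;
        if (length == 0) return 0; // an empty file cannot be mapped

        long replaced = 0;
        using (MemoryMappedFile map = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor view = map.CreateViewAccessor())
        {
            for (long i = 0; i < length; i++)
            {
                if (IsBad(view.ReadByte(i)))
                {
                    view.Write(i, replacement);
                    replaced++;
                }
            }
        }
        return replaced;
    }
}
```

Unlike the copy approach, this modifies the original file, so there is no undo unless you keep a backup. Mapping the whole file in one view may also be tight in a 32-bit process for the larger files; mapping it in smaller windows would be one way around that.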
