How to find the number of characters in a file without traversing its contents

Posted 2025-01-02 14:38:47 · 106 words · 1 view · 0 comments

In a project, I have to read a file and work with the number of characters in it. Is there a way to get the number of characters without reading the file character by character? Otherwise I will have to read the file twice, once just to find the number of characters in it.

Is it even possible?

Comments (5)

染火枫林 2025-01-09 14:38:47

Yes.

Seek to the end and read the position there; that position is the size.

FILE *file = fopen("Plop", "rb");   // Binary mode, so the position is a byte offset.
fseek(file, 0, SEEK_END);
long size = ftell(file);            // This is the size of the file (-1L on failure).
                                    // But note it is in bytes.
                                    // Also note that if you are reading it into memory,
                                    // this is the value you want, unless you plan to
                                    // convert the character encoding as you read.

fseek(file, 0, SEEK_SET);           // Move the position back to the start.

In C++, streams have the same functionality:

std::ifstream file("Plop", std::ios_base::binary);
file.seekg(0, std::ios_base::end);
std::streamoff size = file.tellg();   // Again a size in bytes.

file.seekg(0, std::ios_base::beg);    // Move the position back to the start.
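
As a hedged aside, assuming a C++17 compiler is available, the byte size can also be taken from the file's metadata with std::filesystem::file_size, without opening the file at all. The sketch below puts that next to a checked version of the seek-and-tell approach above. The function names size_from_metadata and size_from_seek are invented for this illustration. Both return a size in bytes, not in characters.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>

// Size in bytes taken from file metadata; the file is never opened.
// Returns std::nullopt if the size cannot be determined (e.g. missing file).
std::optional<std::uintmax_t> size_from_metadata(const std::filesystem::path& p) {
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(p, ec);
    if (ec) return std::nullopt;
    return size;
}

// Size in bytes found by seeking to the end, with basic error checks.
// Binary mode is used so that the reported position is a plain byte offset.
std::optional<std::uintmax_t> size_from_seek(const std::filesystem::path& p) {
    std::ifstream file(p, std::ios::binary);
    if (!file) return std::nullopt;
    file.seekg(0, std::ios::end);
    const std::streamoff end = file.tellg();  // -1 signals failure
    if (end < 0) return std::nullopt;
    return static_cast<std::uintmax_t>(end);
}
```

For an ordinary file on disk the two functions should agree; the metadata version simply avoids moving a stream position that then has to be reset.
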
如梦 2025-01-09 14:38:47

You can try this:

FILE *fp = ... /*open as usual*/;
fseek(fp, 0L, SEEK_END);
long fileSize = ftell(fp);   /* ftell returns a long, or -1L on failure */

However, this returns the number of bytes in the file, not the number of characters. The two are the same only if the encoding is known to use one byte per character (e.g. ASCII).

After you have learned the size, you need to "rewind" the file back to the beginning:

fseek(fp, 0L, SEEK_SET);
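
To illustrate why the byte count and the character count can differ, here is a minimal sketch that counts UTF-8 code points in data already held in memory. It assumes the input is valid UTF-8, and the name count_utf8_code_points is made up for this example. Note that it has to look at every byte, which is exactly the scan the question hoped to avoid, and that a code point is not always what a user perceives as one character.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by counting every byte that is NOT a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t count_utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char byte : bytes) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For pure ASCII text the result equals the byte count; for text containing multi-byte characters it is smaller than the value reported by ftell.
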
一桥轻雨一伞开 2025-01-09 14:38:47

The simple answer is no. More precisely, it's system dependent: under
Unix, it's possible (e.g. using stat); under Windows, it's not
possible for a text file, but if you're reading the file in binary,
there's a function GetFileSize which can be used.

Although not guaranteed, under all of the implementations I know (for
these two platforms), seeking to the end of the file, then doing an
ftell, will return something which, when converted to a sufficiently
large integral type, will give the same results as the above (with the
same restrictions).

Finally: why do you need this information? If it's just to allocate an
appropriately sized buffer, then even with a text file, GetFileSize (and
ftell after seeking to the end) will return a value that is at worst
slightly larger than the number of bytes you can read, because text mode
on Windows collapses each CR LF pair into a single newline. Your buffer
will be slightly oversized, but this is generally not a problem.
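
The buffer-sizing idea above could be sketched roughly as follows: treat the position at the end of the file as an upper bound, allocate that much, read once, then shrink to what was actually read. The function name read_whole_file is hypothetical, and the sketch relies on the same not-guaranteed behaviour described above, namely that the end position, converted to an integer, is at least the number of characters a text-mode read will deliver.

```cpp
#include <fstream>
#include <string>

// Reads a whole text file in a single pass over its contents.
// The end position is used only as an upper bound for the buffer size.
// Returns an empty string if the file cannot be opened or is empty.
std::string read_whole_file(const std::string& path) {
    std::ifstream in(path);  // text mode: line endings may be translated
    if (!in) return {};

    in.seekg(0, std::ios::end);
    const std::streamoff upper_bound = in.tellg();
    in.seekg(0, std::ios::beg);
    if (upper_bound <= 0) return {};

    // Possibly slightly oversized, as the answer points out.
    std::string buffer(static_cast<std::size_t>(upper_bound), '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    // Trim to the number of characters that were actually read.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

After the call, the size of the returned string is the number of characters read, so no separate counting pass is needed.
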

毁梦 2025-01-09 14:38:47

I think you are likely looking for a dynamic memory solution. What you actually asked is "is there a way to get the number of characters in a file without reading it?". The answer (assuming one byte per character) is yes: you can use the stat call to get the file size, and the file size in bytes is the number of characters. With UTF-8 the answer is no, but let's put that aside for the moment, since people who are just learning computer science usually don't worry about internationalization.

I think the reason you want to know how many characters there are is so that you can have storage big enough to hold them all. You don't need to know how big the file is to store the whole thing.

If you have a std::vector<char>, it can start out able to hold ten characters, then grow to hold twenty, then ten thousand... And when you're done reading the file, it will hold them all, even though you never knew in advance how many there would be.
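
A minimal sketch of that growing-buffer approach, assuming the file is read as raw bytes in fixed-size chunks. The function name read_all and the 4096-byte chunk size are arbitrary choices for this illustration.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads an entire file into a vector that grows as data arrives.
// The file size is never queried. Returns an empty vector if the
// file cannot be opened.
std::vector<char> read_all(const std::string& path) {
    std::vector<char> data;  // starts empty and grows on demand
    std::ifstream in(path, std::ios::binary);

    char chunk[4096];
    // read() fails on the last, partial chunk, so gcount() is also
    // checked to pick up the bytes that were still delivered.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        data.insert(data.end(), chunk, chunk + in.gcount());
    }
    return data;
}
```

Once the loop finishes, data.size() gives the number of bytes read, so the count comes for free from the single pass.
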

本宫微胖 2025-01-09 14:38:47

Off the top of my head: look at the file size and divide it by the number of bytes a single character takes?

Problems arise when dealing with white space, line endings, etc.
