How to find the number of characters in a file without traversing its contents
In a project, I have to read a file and work with the number of characters it contains. Is there a way to get the number of characters without reading the file character by character? Otherwise I will have to read the file twice, once just to count the characters.
Is it even possible?
Yes.
Seek to the end of the file and get the position there; that position is the size.
In C++, streams have the same functionality:
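The snippet that originally followed is not preserved here, so this is only a minimal sketch of the seek-to-the-end idea with C++ streams. The helper name `stream_size` is hypothetical, and the stream is assumed to have been opened in binary mode so that the reported position is a byte count.

```cpp
#include <fstream>
#include <ios>
#include <istream>

// Hypothetical helper: returns the size in bytes of a seekable input stream,
// or -1 if the position cannot be determined.
// Assumes the stream was opened with std::ios::binary.
std::streamoff stream_size(std::istream& in) {
    const std::istream::pos_type start = in.tellg();  // remember where we were
    in.seekg(0, std::ios::end);                       // jump to the end
    const std::streamoff size = in.tellg();           // end position == size
    in.seekg(start);                                  // restore the position
    return size;
}
```

It could be called as `std::ifstream in("data.bin", std::ios::binary); auto n = stream_size(in);` where `data.bin` is a placeholder file name. Reading can then continue from wherever the stream was positioned before the call.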
You can try this:
However, this returns the number of bytes in the file, not the number of characters. The two are the same only if the encoding is known to use one byte per character (e.g. ASCII).
After you have learned the size, you need to "rewind" the file back to the beginning:
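The code from this answer did not survive, so what follows is a sketch of what the two steps might look like with the C stdio functions, not the author's original snippet. The function name `file_size_in_bytes` is hypothetical, and the file is assumed to be opened in binary mode (`"rb"`).

```cpp
#include <cstdio>

// Hypothetical helper: returns the size in bytes of an already opened file,
// or -1 on failure. The file is rewound to the beginning before returning.
// Assumes the file was opened in binary mode ("rb").
long file_size_in_bytes(std::FILE* fp) {
    if (std::fseek(fp, 0, SEEK_END) != 0) {  // move to the end of the file
        return -1L;
    }
    const long size = std::ftell(fp);        // position at the end == byte count
    std::rewind(fp);                         // back to the start for the real read
    return size;
}
```

Because `ftell` returns a `long`, this approach cannot report sizes beyond the range of `long`, which is 32 bits on some platforms.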
The simple answer is no. More precisely, it's system dependent: under Unix, it's possible (e.g. using `stat`); under Windows, it's not possible for a text file, but if you're reading the file in binary, there's a function, `GetFileSize`, which can be used. Although not guaranteed, under all of the implementations I know (for these two platforms), seeking to the end of the file and then doing an `ftell` will return something which, when converted to a sufficiently large integral type, gives the same results as the above (with the same restrictions).
Finally: why do you need this information? If it's just to allocate an appropriately sized buffer, then even with a text file, `GetFileSize` (and a tell after seeking to the end) will return a value slightly larger than the number of bytes you can read. Your buffer will be slightly oversized, but this is generally not a problem.
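As an illustration of the buffer-sizing point, here is a sketch that treats the reported size as an upper bound and trims the buffer to what was actually read. It uses `std::filesystem::file_size` from C++17 rather than the `stat` or `GetFileSize` calls named above, which is a substitution on my part. The helper name `read_whole_file` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: reads a text file into a string, using the file size
// only as an upper bound for the buffer. Returns an empty string on error.
std::string read_whole_file(const std::filesystem::path& path) {
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(path, ec);
    if (ec) {
        return {};
    }
    // Allocate for the full byte count; it may be slightly more than needed.
    std::string buffer(static_cast<std::size_t>(size), '\0');
    std::ifstream in(path);  // text mode: line endings may be translated
    in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // Shrink to the number of characters that were actually delivered.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

On platforms where text mode translates line endings, the final string can be shorter than the file size, which is the "slightly oversized" case described above.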
I think you are likely looking for a dynamic memory solution. What you actually asked is "is there a way to get the number of characters in a file without reading it?". The answer (assuming one byte per character) is yes: you can use the `stat` call to get the file size, and the file size in bytes is the number of characters. With UTF-8 the answer is no, but let's put that aside for the moment, since computer scientists who are just starting out usually don't worry about internationalization.
I think the reason you want to know how many characters there are is so that you can have storage big enough to hold them all. You don't need to know how big the file is to store the whole thing.
If you have a `std::vector<char>`, it can start out able to hold ten characters, then grow to hold twenty, then ten thousand... And when you're done reading the file, it will hold them all, even though you never knew how many there would be.
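A minimal sketch of the growing-vector approach, assuming a single pass over the file in fixed-size chunks. The helper name `read_all_chars` and the 4096-byte chunk size are arbitrary choices for illustration.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into a vector that grows as needed,
// without asking for the file size up front. Returns an empty vector if the
// file cannot be opened.
std::vector<char> read_all_chars(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> data;
    char chunk[4096];
    // read() fails on the last, partial chunk, but gcount() still reports
    // how many characters were delivered, so those are appended too.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        data.insert(data.end(), chunk, chunk + in.gcount());
    }
    return data;
}
```

After the call, `data.size()` is the number of bytes in the file, obtained as a side effect of the one read that was needed anyway.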
Off the top of my head: have a look at the file size and divide it by the number of bytes a single character takes?
Problems arise when dealing with white space, line endings, etc.