Getting information from a file without traversing its contents
This question made me search for what else I can get from a file without traversing its contents (meaning without reading the contents using ifstream, getc, etc.).
Other than the file size and the number of characters, what other information can I gather? I looked into fseek and found that I can use SEEK_SET, SEEK_CUR and SEEK_END, which only let me locate the start of the file, the current position and the end of the file.
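For context, this is roughly how those seek origins give the file size without reading any data: open the file in binary mode, seek to SEEK_END, and ask ftell for the position. A minimal sketch; the function name file_size_via_fseek is hypothetical. It assumes a binary stream whose end-seek is meaningful, which holds on Windows and POSIX systems in practice even though the C standard does not guarantee it. Since long is 32 bits on Windows, files over 2 GB would need MSVC's _fseeki64/_ftelli64 instead.

```cpp
#include <cstdio>

// Hypothetical helper: returns the file size in bytes, or -1 on failure.
// Only the file position is moved; no content is read.
long file_size_via_fseek(const char* path) {
    std::FILE* f = std::fopen(path, "rb");  // binary mode: positions are byte offsets
    if (f == nullptr) return -1;
    long size = -1;
    if (std::fseek(f, 0, SEEK_END) == 0) {  // jump to the end of the file...
        size = std::ftell(f);               // ...and ask how far from the start we are
    }
    std::fclose(f);
    return size;
}
```

This is also why the size (and, for single-byte text, the character count) is "free", while anything depending on the actual bytes is not.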
To make this a concrete question, I specifically want to ask:
- Can occurrences of a certain character or type of character (newlines, etc.) be counted?
- Can the contents be matched against a certain template?
- Is using these methods faster than reading the file multiple times?
And I am asking about Microsoft Windows, not Linux.
Answers (3)
1) No, because searching for something under unpredictable conditions requires a thorough examination of the contents, and examining means reading. Of course, you may collect some statistics beforehand, but you need to traverse your data at least once. You can use other applications to do this implicitly, but they will also traverse your file from the very beginning to the end. You may organize your file in some way so that the necessary information can be obtained with a minimal number of read operations, but that all depends on your task, and there is no general approach (because any general solution comes down to examining the whole source structure).
2) Also no (see above).
3) Yes. Store as much as possible (or as much as the task requires) in memory; that's called caching. For example, use memory mapping (see MapViewOfFile on Windows and mmap(2) on *nix systems), which relies on the system's own caching mechanisms.
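Points 1 and 3 fit together: pay for one full pass over the file, then answer as many questions as needed from the in-memory copy. The sketch below assumes the file fits in memory and reads it with std::ifstream; it is a portable stand-in for the MapViewOfFile/mmap approach the answer mentions, not an implementation of it. The helper names load_file, count_char and matches_template are hypothetical, and std::regex stands in for whatever "template" means in a particular case.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file into memory in a single pass.
// Assumes the file fits in memory. Returns false if it cannot be opened.
bool load_file(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::ostringstream buf;
    buf << in.rdbuf();  // the one unavoidable traversal of the file
    out = buf.str();
    return true;
}

// Query 1: count occurrences of a character (e.g. '\n') in the cached copy.
std::size_t count_char(const std::string& data, char c) {
    return static_cast<std::size_t>(std::count(data.begin(), data.end(), c));
}

// Query 2: does the cached copy contain text matching a pattern?
bool matches_template(const std::string& data, const std::string& pattern) {
    return std::regex_search(data, std::regex(pattern));
}
```

After load_file, every further query works on the string, so repeated searches do not go back to the disk; the contents still had to be read once.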
There are no miracles here. The earlier question had a "shortcut" because the number of characters in the file equals its size in bytes (more strictly speaking, an ANSI text file is treated as a sequence of characters, each represented by a single byte).
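The shortcut only holds while every character is one byte. With a multi-byte encoding such as UTF-8 (used here purely as an illustration; the answer only discusses ANSI text), byte size and character count diverge, and counting characters means looking at every byte, i.e. traversing the contents. A minimal sketch assuming valid UTF-8 input; count_utf8_code_points is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts characters (Unicode code points) in UTF-8 text.
// Every byte must be inspected: continuation bytes (10xxxxxx) do not start
// a new character. Assumes the input is valid UTF-8.
std::size_t count_utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80) ++count;  // not a continuation byte
    }
    return count;
}
```

For example, "h\xC3\xA9llo" ("héllo") is 6 bytes but 5 characters, so the file size alone no longer answers the question.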
The stat structure contains information about the file, including permissions, ownership, size, and access and creation dates. As for metadata, maybe there's an API to tie into the Windows Search database that might allow searching on other criteria, like content attributes (I'm usually a Linux guy, so I don't know what Windows offers in this respect).
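A minimal sketch of querying a few stat fields; none of the file's contents are read. The FileInfo struct and get_file_info function are hypothetical. It uses the POSIX-style stat() call, which MSVC should also provide as a compatibility name for _stat; on Windows, _stat64 would be needed for sizes above 2 GB. The meaning of st_ctime differs between platforms (creation time on Windows, last status change on POSIX), so it is left out here.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <ctime>

// Hypothetical helper type: a few fields copied out of the stat structure.
struct FileInfo {
    long long size;        // st_size, in bytes
    bool is_regular;       // regular file (not a directory, device, ...)
    std::time_t modified;  // st_mtime, last modification time
};

// Fills `info` from stat(); returns false if the path cannot be queried.
// Only file-system metadata is consulted, never the contents.
bool get_file_info(const char* path, FileInfo& info) {
    struct stat st;
    if (stat(path, &st) != 0) return false;
    info.size = static_cast<long long>(st.st_size);
    info.is_regular = (st.st_mode & S_IFMT) == S_IFREG;
    info.modified = st.st_mtime;
    return true;
}
```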