Getting information from a file without traversing its contents

Posted on 2025-01-02 14:52:55

This question made me wonder what else I can learn about a file without traversing its contents (that is, without reading the contents in through ifstream, getc, and so on).

Other than the file size and the number of characters, what other information can I gather? I looked into fseek and found that I can use SEEK_SET, SEEK_CUR and SEEK_END, but these only let me reach the start of the file, the current position and the end of the file.

To make this a concrete question, I specifically want to ask:

  1. Can occurrences of a given character or class of characters (newlines, etc.) be counted?
  2. Can the contents be matched against a certain template?
  3. Is using such methods faster than reading the file multiple times?

I am asking about Microsoft Windows, not Linux.

Comments (3)

永言不败 2025-01-09 14:52:55

1) No, because searching for something under unpredictable conditions requires a thorough examination of the contents, and examining is reading. Of course, you may collect some statistics beforehand, but you still need to traverse your data at least once. You can use other applications to do this implicitly, but they will also traverse your file from beginning to end. You may organize your file in a way that yields the necessary information with a minimal number of read operations, but that depends entirely on your task, and there is no general approach (because any general approach comes down to examining the whole source structure).

2) Also no (see above).

3) Yes. Store as much as possible (or as much as the task requires) in memory; that is called caching. For example, use file mapping (see MapViewOfFile on Windows and mmap(2) on *nix systems), which relies on the operating system's own caching mechanism.
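
To illustrate the mapping idea, here is a minimal sketch of a single pass over a memory-mapped file that gathers several counts at once, so the file never has to be read a second time. The names `FileStats`, `scan` and `collect_stats` are made up for this example, and the choice of counters (bytes, newlines, digits) is only an assumption about what a task might need. The Windows branch uses CreateFileA, CreateFileMappingA and MapViewOfFile; the other branch uses mmap(2).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Hypothetical result type: everything we want to know, gathered in one pass.
struct FileStats {
    std::size_t bytes = 0;
    std::size_t newlines = 0;
    std::size_t digits = 0;
};

// Scans a byte range once and updates all counters together.
static FileStats scan(const unsigned char* data, std::size_t size) {
    FileStats s;
    s.bytes = size;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == '\n') ++s.newlines;
        if (data[i] >= '0' && data[i] <= '9') ++s.digits;
    }
    return s;
}

// Maps the file read-only and scans the mapped view. Returns false on failure.
// An empty file cannot be mapped, so it is reported as all-zero stats.
bool collect_stats(const char* path, FileStats& out) {
    assert(path != nullptr);
    out = FileStats();
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER size;
    if (!GetFileSizeEx(file, &size)) { CloseHandle(file); return false; }
    if (size.QuadPart == 0) { CloseHandle(file); return true; }
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL) { CloseHandle(file); return false; }
    const void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    bool ok = (view != NULL);
    if (ok) {
        out = scan(static_cast<const unsigned char*>(view),
                   static_cast<std::size_t>(size.QuadPart));
        UnmapViewOfFile(view);
    }
    CloseHandle(mapping);
    CloseHandle(file);
    return ok;
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return false; }
    if (st.st_size == 0) { close(fd); return true; }
    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* view = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    bool ok = (view != MAP_FAILED);
    if (ok) {
        out = scan(static_cast<const unsigned char*>(view), len);
        munmap(view, len);
    }
    close(fd);
    return ok;
#endif
}
```

Note that mapping does not avoid the traversal: `scan` still touches every byte once. What it saves is the second and third pass, since all counters are filled together. The sketch maps the whole file in one view, so in a 32-bit process it would not be suitable for very large files.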

香草可樂 2025-01-09 14:52:55

  1. No
  2. No
  3. Depends on whether there is an actual need to read the file multiple times.

There are no miracles here. The former question had a "shortcut" because the number of characters in the file equals its size in bytes (strictly speaking, an ANSI text file is treated as a sequence of characters, each represented by a single byte).
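
A small sketch of that shortcut and its limit, under the assumption of single-byte text: `file_size_bytes` gets the size with fseek/ftell and never looks at the contents, while `text_char_count` has to read every character. Both function names are invented for this example.

```cpp
#include <cassert>
#include <cstdio>

// Returns the file size in bytes without reading any content, or -1 on failure.
long file_size_bytes(const char* path) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");  // binary mode: no translation
    if (!f) return -1;
    long size = -1;
    if (std::fseek(f, 0, SEEK_END) == 0) size = std::ftell(f);
    std::fclose(f);
    return size;
}

// Counts characters as a text-mode reader sees them. This is a full traversal.
long text_char_count(const char* path) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "r");  // text mode
    if (!f) return -1;
    long n = 0;
    while (std::getc(f) != EOF) ++n;
    std::fclose(f);
    return n;
}
```

The two numbers agree only while one byte is one character. On Windows, a text-mode stream translates each CR LF pair into a single '\n', so for a file with Windows line endings `text_char_count` comes out smaller than `file_size_bytes`, and multi-byte encodings break the equality as well. Also, `long` is 32 bits on Windows, so for files over 2 GB something like `_ftelli64` or GetFileSizeEx would be needed instead.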

漆黑的白昼 2025-01-09 14:52:55

The stat structure contains information about the file, including permissions, ownership, size, access and creation date info. As for metadata, maybe there's an API to tie into a Windows search database that might allow searching on other criteria, like content attributes (I'm a Linux guy, usually, so I don't know what Windows offers in this respect).
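
For reference, a minimal sketch of querying the stat structure, which returns metadata only and does not read the file contents. `FileInfo` and `query_file_info` are hypothetical names chosen for this example, and picking size, modification time and the directory flag is just an assumption about which fields are of interest.

```cpp
#include <cassert>
#include <ctime>
#include <sys/types.h>
#include <sys/stat.h>

// Hypothetical holder for a few of the fields that stat reports.
struct FileInfo {
    long long size;        // size in bytes
    std::time_t modified;  // last modification time
    bool is_directory;     // true if the path names a directory
};

// Fills 'out' from the file's metadata. Returns false if stat fails.
bool query_file_info(const char* path, FileInfo& out) {
    assert(path != nullptr);
    struct stat st;
    if (stat(path, &st) != 0) return false;
    out.size = static_cast<long long>(st.st_size);
    out.modified = st.st_mtime;
    out.is_directory = (st.st_mode & S_IFMT) == S_IFDIR;
    return true;
}
```

On Windows the C runtime also provides `_stat` and `_stat64`, and the Win32 API has GetFileAttributesEx for much the same information. None of these can answer questions about what is inside the file.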
