C programming: file reading and writing techniques

Posted 2024-10-05 20:34:53

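This is my first time writing a program that involves reading and writing files, and I'm wondering what the best technique is. When I compared my work with a classmate's, our logic turned out to be very different.

Our teacher asked us to build a simple student list system where users can add, edit, and delete records. He required us to use a file to save all the records so that we can access them the next time we run the program.

My solution is this: before the program opens its menu, I read all the records in the file and store them in an array[]. That way I can manipulate all the records in memory. Then, before the user quits the program, I save the array to the same file, overwriting all the records in it.

My classmate's solution is this: when she adds a record, she opens the file and appends the data; when she edits a record, she opens the file and edits that particular record; when she deletes a record, she opens the file and deletes that record. So every function she wrote accesses the file.

Both approaches can of course be coded. But I'm wondering which one is more efficient and effective if we are dealing with thousands or millions of records. Or are there other solutions better than ours? Maybe you could share your file handling experience with us. Thank you.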

阳光的暖冬 2024-10-12 20:34:53

This is a classic case you'll encounter time and time again in programming: do I optimize for speed or memory usage?

And, like all such conundrums, there is no "correct" answer or perfect solution. In other words, you and your classmate are both right in your solutions to the problem.

With your solution of loading all of the records into memory, you "spend" memory in order to make accessing and modifying each of those records faster at run time. Storing all of the records in an array in memory takes up space, but because memory access is orders of magnitude faster than disk access, your approach is going to run a lot faster than your classmate's.
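To make the load-everything approach concrete, here is a minimal sketch in C. The `Student` record layout, the function names `load_all` and `save_all`, and the choice of a raw binary file are all assumptions for illustration; the question does not show the actual code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed-size record; the field names are illustrative. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* Read every record from `path` into a heap array.
 * Returns the array (caller frees it) and stores the count in *count.
 * A missing file is treated as an empty list (returns NULL, *count == 0). */
Student *load_all(const char *path, size_t *count)
{
    *count = 0;
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;

    size_t cap = 16;
    Student *list = malloc(cap * sizeof *list);
    if (!list) { fclose(fp); return NULL; }

    Student s;
    while (fread(&s, sizeof s, 1, fp) == 1) {
        if (*count == cap) {
            Student *tmp = realloc(list, cap * 2 * sizeof *list);
            if (!tmp) break;  /* out of memory: keep what was read so far */
            list = tmp;
            cap *= 2;
        }
        list[(*count)++] = s;
    }
    fclose(fp);
    return list;
}

/* Overwrite `path` with the whole array. Returns 0 on success, -1 on error. */
int save_all(const char *path, const Student *list, size_t count)
{
    FILE *fp = fopen(path, "wb");
    if (!fp) return -1;
    size_t written = count ? fwrite(list, sizeof *list, count, fp) : 0;
    if (fclose(fp) != 0) return -1;
    return written == count ? 0 : -1;
}
```

The program would call `load_all` once before showing the menu, work on the array while it runs, and call `save_all` once before exiting. Nothing reaches the disk in between.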

By way of contrast, your classmate conserves RAM by waiting to load the data on demand from the hard disk. But that's going to cost her: hitting the hard disk is a terribly expensive process compared to fetching data that's already in memory, and she's going to be stuck doing this each time the user makes a change. Think about how long it takes to start a program versus switching to one that's already open.
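For comparison, a rough sketch of the on-demand approach might look like the following. It assumes fixed-size binary records so that a record's position can be computed from its index; the `Student` layout and the function names are hypothetical, not taken from the classmate's program.

```c
#include <stdio.h>

/* Hypothetical fixed-size record; a fixed size is what makes seeking by index possible. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* Add: append one record to the end of the file. Returns 0 on success. */
int append_record(const char *path, const Student *s)
{
    FILE *fp = fopen(path, "ab");
    if (!fp) return -1;
    size_t ok = fwrite(s, sizeof *s, 1, fp);
    if (fclose(fp) != 0) return -1;
    return ok == 1 ? 0 : -1;
}

/* Edit: overwrite the record at zero-based position `index` in place.
 * The caller must make sure `index` refers to an existing record. */
int update_record(const char *path, long index, const Student *s)
{
    FILE *fp = fopen(path, "r+b");
    if (!fp) return -1;
    int rc = -1;
    if (fseek(fp, index * (long)sizeof *s, SEEK_SET) == 0 &&
        fwrite(s, sizeof *s, 1, fp) == 1)
        rc = 0;
    if (fclose(fp) != 0) rc = -1;
    return rc;
}

/* Look up: read the record at position `index` into *out. Returns 0 on success. */
int read_record(const char *path, long index, Student *out)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;
    int rc = -1;
    if (fseek(fp, index * (long)sizeof *out, SEEK_SET) == 0 &&
        fread(out, sizeof *out, 1, fp) == 1)
        rc = 0;
    fclose(fp);
    return rc;
}
```

Every call opens the file, touches the disk, and closes the file again, which is where the cost described above comes from. Deleting is not shown: with a flat file it usually means either rewriting the file without the record or marking the record as unused.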

And therein lies the tradeoff. Some of the important things to ask yourself here are:

Is the data set (in the common configurations you'll be dealing with) too large (or going to become too large) to fit completely in memory? If you're dealing with typically small sets of data, computers now have enough RAM that it's probably worth it.
How fast do you need to be able to access the data? Is real-time access important? Is it a particularly large or complex data set that would take too long to load from the hard disk on demand? What kind of performance do your users expect?
What kind of system is your application targeting? Sometimes embedded systems and other special cases necessitate their own unique design approaches. You might have an abundance of RAM and very limited amounts of fixed storage, or you might have exactly the opposite. If you're using standard, modern PC hardware, what do your users want/need/already have? If most of your target users are using relatively "beefy" hardware already, you might make different design decisions than if you're aiming to target a larger potential audience. You've surely seen these trade-offs made explicit before in a program's stated system requirements.
Do you need to allow for special situations? Things like concurrent access by multiple users make keeping all of your data in memory much more difficult. How are other users going to be able to read in the data that's only stored in memory on a local computer? Sharing a common file (perhaps even on a shared server) is probably going to be necessary here.
Are there certain portions of your data that are accessed more frequently than others? Consider keeping those specific portions always in memory and lazy-loading the rest (meaning, you only attempt to fetch them into memory when/if they are accessed by the user).
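The lazy-loading idea in the last point could be sketched roughly as below. This is only an illustration under several assumptions: records are fixed-size and stored in a binary file, `LazyCache` and `lazy_get` are made-up names, and the cache has one slot per possible record (a real cache would cap its size and evict rarely used entries).

```c
#include <stdio.h>

#define MAX_RECORDS 64  /* assumed upper bound for this sketch */

/* Hypothetical fixed-size record. */
typedef struct {
    int  id;
    char name[32];
} Student;

typedef struct {
    FILE   *fp;                   /* record file, already opened with "rb" */
    Student slot[MAX_RECORDS];    /* in-memory copies of records seen so far */
    char    loaded[MAX_RECORDS];  /* 1 if slot[i] holds valid data */
    long    disk_reads;           /* how many times the disk was actually hit */
} LazyCache;

/* Return the record at `index`, reading it from disk only on first access.
 * Returns NULL if the index is out of range or the record cannot be read. */
const Student *lazy_get(LazyCache *c, long index)
{
    if (index < 0 || index >= MAX_RECORDS) return NULL;
    if (!c->loaded[index]) {
        if (fseek(c->fp, index * (long)sizeof(Student), SEEK_SET) != 0)
            return NULL;
        if (fread(&c->slot[index], sizeof(Student), 1, c->fp) != 1)
            return NULL;
        c->loaded[index] = 1;
        c->disk_reads++;
    }
    return &c->slot[index];
}
```

The first request for a record pays the disk cost; repeated requests for the same record are served from memory.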

And as that last point hints, something of a balanced or combined approach is probably about as close as you'll come to an "ideal" solution. You could store as much of the data in RAM as possible, while periodically writing any edits or modifications back to the file on disk during your application's idle state. There's plenty of time that the average program spends waiting on the user to do something, as opposed to the other way around. You can take advantage of these idle CPU cycles to flush out things being held in memory back to the disk without incurring any noticeable speed penalty. This approach is used all the time in software development, and helps to avoid the pitfall pointed out by EClaesson's answer. If your application crashes or otherwise quits unexpectedly, only a very small portion of data is likely to be lost because most of it was already committed to disk behind the scenes.
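One simple way to approximate this combined approach is a "dirty" flag: edits change only the in-memory copy and set the flag, and a flush routine writes the file only when the flag is set. The sketch below assumes a small fixed-capacity store and made-up names (`Store`, `store_add`, `store_flush_if_dirty`). How the program detects that it is idle is platform-specific and not shown; in a menu-driven console program, calling the flush routine just before waiting for the next menu choice is a reasonable stand-in.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_STUDENTS 100  /* assumed capacity for this sketch */

/* Hypothetical fixed-size record. */
typedef struct {
    int  id;
    char name[32];
} Student;

typedef struct {
    Student     items[MAX_STUDENTS];
    size_t      count;
    int         dirty;  /* nonzero when memory holds changes not yet on disk */
    const char *path;   /* file that backs this store */
} Store;

/* Edits touch only memory and mark the store as dirty. */
int store_add(Store *st, const Student *s)
{
    if (st->count == MAX_STUDENTS) return -1;
    st->items[st->count++] = *s;
    st->dirty = 1;
    return 0;
}

/* Call whenever the program is idle. Returns 1 if data was written,
 * 0 if there was nothing to do, and -1 on error (the store stays dirty
 * so that a later call can retry). */
int store_flush_if_dirty(Store *st)
{
    if (!st->dirty) return 0;
    FILE *fp = fopen(st->path, "wb");
    if (!fp) return -1;
    size_t written = fwrite(st->items, sizeof st->items[0], st->count, fp);
    if (fclose(fp) != 0 || written != st->count) return -1;
    st->dirty = 0;
    return 1;
}
```

With this in place, a crash loses at most the edits made since the last successful flush rather than the whole session.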

Postscript: Of course, Dark Falcon's answer is correct that in a production application, you would more than likely use something like a database to handle the data. But since this appears to be for educational purposes, I think understanding the basic trade-offs behind each approach is far more important.

无名指的心愿 2024-10-12 20:34:53

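In any serious application, a good programmer would probably use an existing library to manage the data. The choice of tool depends on the exact requirements:

Does it need to be accessed by multiple users simultaneously?
Does it need to be accessed from multiple machines?

The most common choice for storing large amounts of information is an SQL-based database such as MySQL, Postgres, Microsoft SQL Server, or SQLite. These mostly resemble your classmate's solution rather than yours.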

此刻的回忆 2024-10-12 20:34:53

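Your version (keeping all records in memory) will most likely be faster. It does require that you have enough memory as the record count grows. The downside is that a program crash or improper exit will make you lose all the changes made in that session, because they were never saved to the file.

Your classmate's version won't be as fast, since file I/O is not the fastest thing you can do. But it requires less memory and is safer in a crash, because most of the data is already in the file.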
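If you stay with the in-memory version, one common way to limit the damage of a crash in the middle of saving is to write the new contents to a temporary file first and only then rename it over the real file. The sketch below is an assumption-laden illustration: `save_atomic` and the `Student` layout are made-up, and it relies on `rename` replacing an existing destination, which POSIX systems do but the C standard does not guarantee (on Windows, for example, `rename` fails if the destination exists).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical fixed-size record. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* Write all records to `tmp_path`, then rename it over `path`.
 * If anything fails before the rename, the old file is left untouched.
 * Returns 0 on success, -1 on error. */
int save_atomic(const char *path, const char *tmp_path,
                const Student *list, size_t count)
{
    FILE *fp = fopen(tmp_path, "wb");
    if (!fp) return -1;

    size_t written = count ? fwrite(list, sizeof *list, count, fp) : 0;
    /* fflush pushes the C library's buffer to the OS; it does not by itself
     * guarantee that the data has reached the physical disk. */
    int ok = (written == count) && (fflush(fp) == 0);
    if (fclose(fp) != 0) ok = 0;

    if (!ok) {
        remove(tmp_path);  /* discard the incomplete temporary file */
        return -1;
    }
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```

Combined with saving more often than once at exit, this keeps the file on disk in either its old or its new state rather than half-written.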

最近可好 2024-10-12 20:34:53

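There is no way to answer this question without knowing the details of the system this runs on, the size of the data set, and the relative cost of development time versus CPU time. If the system has plenty of memory, working on a copy in RAM is probably better. On a small system with extremely limited RAM (found mostly in embedded applications these days), you may have to update the file on disk directly. Other things to consider are any buffering the operating system may do before actually writing to disk, what happens to the consistency of the file if the program crashes, and whether writing to disk is "expensive" because it is really slow or because the medium has a limited number of write cycles (some flash drive technologies).

If this is a small practical problem on a present-day desktop computer, you may also want to consider the time it takes to develop the various solutions, any of which may take a relatively insignificant time to run on a small data set.

Also, today the problem is best solved with an existing database that is good at handling the related issues, rather than by creating your own database in the file system.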
