File reading and writing techniques in C programming

Posted on 2024-10-05 20:34:53

This is my first time writing a program that involves reading and writing files, and I'm wondering what the best technique is. When I compared my work with my classmate's, our logic turned out to be very different.

Our teacher asked us to build a simple student list system where users can add, edit, and delete records. He also required us to save all the records in a file so that we can access them the next time we run the program.

My solution is this: before the program opens its menu, I read all the records from the file and store them in an array. That way I can manipulate all the records in memory. Then, before the user exits the program, I save the array to the same file, overwriting all the records in it.

My classmate's solution is different. When she adds a record, she opens the file and appends the data; when she edits a record, she opens the file and edits that particular record; and when she deletes a record, she opens the file and deletes it. So every function she wrote accesses the file.

Both approaches can of course be coded. But I am wondering which is more efficient and effective if we are dealing with thousands or millions of records. Or are there other solutions better than what we did? Maybe you could share your file handling experiences with us. Thank you.

Comments (5)

阳光的暖冬 2024-10-12 20:34:53

This is a classic case you'll encounter time and time again in programming: do I optimize for speed or memory usage?

And, like all such conundrums, there is no "correct" answer or perfect solution. In other words, you and your classmate are both right in your solutions to the problem.

With your solution of loading all of the records into memory, you "spend" memory in order to make accessing and modifying each of those records faster at run time. Storing all of the records in an array in memory takes up space, but because memory access is orders of magnitude faster than disk access, your approach is going to run a lot faster than your classmate's.
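To make the load-everything approach concrete, here is a minimal sketch in C. The `Student` layout, the function names `load_all` and `save_all`, and the use of fixed-size binary records are all assumptions for illustration; the question does not say how the records are actually stored.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed-size record; the fields are assumptions. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* Read every record from `path` into a heap-allocated array.
   Returns the number of records, or -1 if memory runs out.
   A missing file is treated as an empty list (first run). */
long load_all(const char *path, Student **out)
{
    *out = NULL;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    size_t cap = 16, n = 0;
    Student *arr = malloc(cap * sizeof *arr);
    if (arr == NULL) { fclose(fp); return -1; }

    while (fread(&arr[n], sizeof arr[n], 1, fp) == 1) {
        if (++n == cap) {                     /* array full: double it */
            cap *= 2;
            Student *tmp = realloc(arr, cap * sizeof *tmp);
            if (tmp == NULL) { free(arr); fclose(fp); return -1; }
            arr = tmp;
        }
    }
    fclose(fp);
    *out = arr;
    return (long)n;
}

/* Overwrite `path` with the n records in `arr`. Returns 0 on success. */
int save_all(const char *path, const Student *arr, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = n ? fwrite(arr, sizeof *arr, n, fp) : 0;
    if (fclose(fp) != 0 || written != n)
        return -1;
    return 0;
}
```

The program would call `load_all` once before showing the menu and `save_all` once on exit; every add, edit, and delete in between touches only the array.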

By way of contrast, your classmate conserves RAM by waiting to load the data on demand from the hard disk. But that's going to cost her: hitting the hard disk is a terribly expensive process compared to fetching data that's already in memory, and she's going to be stuck doing this each time the user makes a change. Think about how long it takes to start a program versus switching to one that's already open.
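For comparison, a rough sketch of the access-the-file-every-time style might look like the following. It assumes fixed-size binary records, so that record number `index` always starts at byte `index * sizeof(Student)`; the `Student` layout and the function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical fixed-size record; the fields are assumptions. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* Append one record to the end of the file. Returns 0 on success. */
int append_record(const char *path, const Student *s)
{
    FILE *fp = fopen(path, "ab");
    if (fp == NULL)
        return -1;
    size_t ok = fwrite(s, sizeof *s, 1, fp);
    if (fclose(fp) != 0 || ok != 1)
        return -1;
    return 0;
}

/* Overwrite the record at zero-based position `index` in place.
   The caller must make sure `index` refers to an existing record. */
int update_record(const char *path, long index, const Student *s)
{
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;
    int rc = -1;
    if (fseek(fp, index * (long)sizeof *s, SEEK_SET) == 0 &&
        fwrite(s, sizeof *s, 1, fp) == 1)
        rc = 0;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}

/* Read the record at zero-based position `index`. Returns 0 on success. */
int read_record(const char *path, long index, Student *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int rc = -1;
    if (fseek(fp, index * (long)sizeof *out, SEEK_SET) == 0 &&
        fread(out, sizeof *out, 1, fp) == 1)
        rc = 0;
    fclose(fp);
    return rc;
}
```

Each call opens, seeks, and closes the file, which is where the per-operation cost comes from. In exchange, only one record is ever held in memory at a time.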

And therein lies the tradeoff. Some of the important things to ask yourself here are:

  1. Is the data set (in the common configurations you'll be dealing with) too large (or going to become too large) to fit completely in memory? If you're dealing with typically small sets of data, computers now have enough RAM that it's probably worth it.

  2. How fast do you need to be able to access the data? Is real-time access important? Is it a particularly large or complex data set that would take too long to load from the hard disk on demand? What kind of performance do your users expect?

  3. What kind of system is your application targeting? Sometimes embedded systems and other special cases necessitate their own unique design approaches. You might have an abundance of RAM and very limited amounts of fixed storage, or you might have exactly the opposite. If you're using standard, modern PC hardware, what do your users want/need/already have? If most of your target users are using relatively "beefy" hardware already, you might make different design decisions than if you're aiming to target a larger potential audience. You've surely seen these trade-offs made explicit before through a program's stated system requirements.

  4. Do you need to allow for special situations? Things like concurrent access by multiple users make keeping all of your data in memory much more difficult. How are other users going to be able to read in the data that's only stored in memory on a local computer? Sharing a common file (perhaps even on a shared server) is probably going to be necessary here.

  5. Are there certain portions of your data that are accessed more frequently than others? Consider keeping those specific portions always in memory and lazy-loading the rest (meaning, you only attempt to fetch them into memory when/if they are accessed by the user).

And as that last point hints, something of a balanced or combined approach is probably about as close as you'll come to an "ideal" solution. You could store as much of the data in RAM as possible, while periodically writing any edits or modifications back to the file on disk during your application's idle state. There's plenty of time that the average program spends waiting on the user to do something, as opposed to the other way around. You can take advantage of these idle CPU cycles to flush out things being held in memory back to the disk without incurring any noticeable speed penalty. This approach is used all the time in software development, and helps to avoid the pitfall pointed out by EClaesson's answer. If your application crashes or otherwise quits unexpectedly, only a very small portion of data is likely to be lost because most of it was already committed to disk behind the scenes.
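One simple way to sketch the combined approach is a "dirty" flag: edits happen in memory and set the flag, and the program writes the array back to disk whenever it has nothing better to do (for example, each time it returns to the menu). The `Store` type, the `MAX_STUDENTS` limit, and the function names below are assumptions for illustration only.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_STUDENTS 100   /* arbitrary limit for this sketch */

/* Hypothetical fixed-size record; the fields are assumptions. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* In-memory copy of the records plus a flag that says whether
   memory and disk have diverged since the last write. */
typedef struct {
    Student     records[MAX_STUDENTS];
    size_t      count;
    int         dirty;
    const char *path;
} Store;

/* Add a record in memory only and remember that a write is pending. */
int store_add(Store *st, const Student *s)
{
    if (st->count >= MAX_STUDENTS)
        return -1;
    st->records[st->count++] = *s;
    st->dirty = 1;
    return 0;
}

/* Call this when the program is idle.
   Returns 1 if the file was rewritten, 0 if there was nothing to do,
   and -1 on an I/O error (the flag stays set so a later call retries). */
int store_flush_if_dirty(Store *st)
{
    if (!st->dirty)
        return 0;
    FILE *fp = fopen(st->path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = st->count
        ? fwrite(st->records, sizeof st->records[0], st->count, fp)
        : 0;
    if (fclose(fp) != 0 || written != st->count)
        return -1;
    st->dirty = 0;
    return 1;
}
```

With this shape, a crash loses at most the edits made since the last flush, while reads and edits still run at memory speed.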

Postscript: Of course, Dark Falcon's answer is correct that in a production application, you would more than likely use something like a database to handle the data. But since this appears to be for educational purposes, I think understanding the basic trade-offs behind each approach is far more important.

无名指的心愿 2024-10-12 20:34:53

In any serious application, a good programmer would probably use an existing library to manage the data. Choosing this tool depends on the exact requirements:

  1. Does it need to be accessed concurrently by multiple users?
  2. Does it need to be accessed from multiple machines?

The most common choice for storing a significant amount of information would be a SQL-based database, such as MySQL, Postgres, Microsoft SQL Server, SQLite, etc. These mostly resemble your classmate's solution more than yours.

此刻的回忆 2024-10-12 20:34:53

Your version (keeping all records in memory) will most probably be faster. It does require that you have enough memory as the record count grows, though. The downside is that a program crash or an abnormal exit will make you lose every change made during that session, because those changes were never saved to the file.

Your classmate's version will not be as fast, since file I/O is far from the fastest thing you can do. But it will require less memory and is safer in a crash, because most of the data will already be in the file.

最近可好 2024-10-12 20:34:53

This is a question that cannot be answered without knowing the details of the system on which it is to run, the size of the data set, and the relative cost of development time vs. CPU time. If the system has sufficient memory, working on a copy in RAM is probably preferable. In a small system with extremely limited RAM (today found mostly in embedded applications) you may have to update the disk file directly. Other things to think about are any buffering that the operating system may do before actually writing to the disk, what happens to the consistency of the file if the program crashes, and whether writing to the disk is "expensive", either because it's really slow or because the medium has a limited number of write cycles (some flash disk technologies).
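As an illustration of the buffering and consistency concerns, one common pattern is to write the new contents to a temporary file, flush it, and only then rename it over the real file, so a crash in the middle of a save leaves the old file untouched. The sketch below is an assumption-laden example, not a guarantee: the `Student` layout and `save_via_temp` are hypothetical, `fflush` only hands data from the C library's buffer to the operating system (forcing it onto the physical disk needs a platform-specific call such as POSIX `fsync`), and `rename` replaces an existing target on POSIX systems but may fail on others.

```c
#include <stdio.h>

/* Hypothetical fixed-size record; the fields are assumptions. */
typedef struct {
    int  id;
    char name[32];
} Student;

/* Write all records to `tmp_path`, then rename it over `path`.
   Returns 0 on success, -1 on failure (the old file is left as it was). */
int save_via_temp(const char *path, const char *tmp_path,
                  const Student *arr, size_t n)
{
    FILE *fp = fopen(tmp_path, "wb");
    if (fp == NULL)
        return -1;

    size_t written = n ? fwrite(arr, sizeof *arr, n, fp) : 0;
    int ok = (written == n);

    if (fflush(fp) != 0)      /* push the stdio buffer down to the OS */
        ok = 0;
    if (fclose(fp) != 0)
        ok = 0;

    if (!ok) {
        remove(tmp_path);     /* discard the incomplete temporary file */
        return -1;
    }

    /* Assumption: rename() replaces an existing `path`, as on POSIX. */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```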

If this were a small practical problem on today's desktop computers you might also want to consider the time spent developing various solutions against the relatively insignificant time they might take to run on small data sets.

Also, today it might be better to solve the problem using an existing database that's good at handling the relevant issues rather than making your own database in the file system.

独行侠 2024-10-12 20:34:53

Editing records in place is subtle if they aren't of fixed size. It is only really possible with a binary format and support for marking a row as unused (for example, with an outside index or with whiteouts). Filesystems aren't atomic, so you can't be sure that what you did ends up on disk in its entirety.
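To show what "marking a row as unused" could look like with fixed-size binary records, here is a small sketch. The `deleted` field, the `Student` layout, and the function names are assumptions made for this example; a real format might keep the marker in a separate index instead.

```c
#include <stdio.h>

/* Hypothetical fixed-size record with an in-band "unused" marker. */
typedef struct {
    int  id;
    char name[32];
    int  deleted;   /* 0 = live, 1 = marked unused */
} Student;

/* Mark the first live record with the given id as unused, in place.
   Returns 0 if a record was marked, 1 if none was found, -1 on error. */
int mark_deleted(const char *path, int id)
{
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;

    Student s;
    int rc = 1;
    while (fread(&s, sizeof s, 1, fp) == 1) {
        if (s.id == id && !s.deleted) {
            s.deleted = 1;
            /* Step back over the record just read, then overwrite it.
               The fseek is also required when switching from reading
               to writing on the same stream. */
            if (fseek(fp, -(long)sizeof s, SEEK_CUR) != 0 ||
                fwrite(&s, sizeof s, 1, fp) != 1)
                rc = -1;
            else
                rc = 0;
            break;
        }
    }
    if (fclose(fp) != 0 && rc == 0)
        rc = -1;
    return rc;
}

/* Count the records that are not marked unused. Returns -1 on error. */
long count_live(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    Student s;
    long n = 0;
    while (fread(&s, sizeof s, 1, fp) == 1)
        if (!s.deleted)
            n++;
    fclose(fp);
    return n;
}
```

The file never shrinks with this scheme, so the space held by unused rows has to be reclaimed by reusing them for new records or by rewriting the file occasionally.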

This makes the problem way more complex than the rest of your student notes application, and best delegated to a database (SQLite and TokyoCabinet are some of the more lightweight). If you can't use a database, go with a simple implementation. It will have fewer bugs, and you won't get attached when the time comes to replace it with a database. So, your approach of reading the whole file in memory sounds like the best choice.
