Is it faster to access data from files or a database server?

Posted on 2024-08-19 12:22:54


Comments (13)

枉心 2024-08-26 12:22:54


I'll add to the "it depends" crowd.

This is the kind of question that has no generic answer but is heavily dependent on the situation at hand. I even recently moved some data from a SQL database to a flat file system because the overhead of the DB, combined with some DB connection reliability issues, made using flat files a better choice.

Some questions I would ask myself when making the choice include:

  1. How am I consuming the data? For example will I just be reading from the beginning to the end rows in the order entered? Or will I be searching for rows that match multiple criteria?

  2. How often will I be accessing the data during one program execution? Will I go once to get all books with Salinger as the author or will I go several times to get several different authors? Will I go more than once for several different criteria?

  3. How will I be adding data? Can I just append a row to the end and that's perfect for my retrieval or will it need to be resorted?

  4. How logical will the code look in six months? I emphasize this because I think it is too often forgotten when designing things (not just code; this hobby horse actually dates from my days as a Navy mechanic cursing mechanical engineers). In six months, when I have to maintain your code (or you do, after working on another project), which way of storing and retrieving data will make more sense? If going from flat files to a DB yields a 1% efficiency improvement but adds a week of figuring things out whenever the code must be updated, have you really improved things?
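
The access-pattern questions above can be made concrete with a small sketch (Python, with an invented two-column CSV; the data and field names are illustrative only): appending to a flat file is trivial, but every criteria search is another full scan, which is exactly where an indexed database starts to pay off.

```python
import csv
import io

# Invented flat-file "books" table; in a real setup this would be a file on disk.
flat_file = io.StringIO(
    "author,title\n"
    "Salinger,The Catcher in the Rye\n"
    "Orwell,1984\n"
    "Salinger,Franny and Zooey\n"
)

def find_by_author(f, author):
    # Question 2 in action: each query re-scans the whole file, O(n) per query.
    f.seek(0)
    return [row for row in csv.DictReader(f) if row["author"] == author]

salinger = find_by_author(flat_file, "Salinger")
print(len(salinger))  # 2
```

Going back "several times for several different authors" repeats that full scan each time, which is the cost an index would amortize.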

等风来 2024-08-26 12:22:54


As a general rule, databases are slower than files.

If you require indexing of your files, a hard-coded access path on customised indexing structures will always have the potential to be faster if you do it correctly.

But 'performance' is not the goal when choosing a database over a file based solution.

You should ask yourself whether your system needs any of the benefits that a database would provide. If so, then the small performance overhead is quite acceptable.

So:

  1. Do you need to deal with multiple users and concurrent updates? (Well; you did say it's static.)
  2. Do you need flexibility in order to easily query the data from a variety of angles?
  3. Do you have multiple users, and could gain from making use of an existing security model?

Basically, the question is more of which would be easier to develop. The performance difference between the two is not worth wasting dev time.

吃→可爱长大的 2024-08-26 12:22:54


Depends on what your information is and what your access patterns and scale are. Two of the biggest benefits of a relational database are:

  1. Caching. Unless you're very clever, you can't write a cache as good as that of a DB server

  2. Optimizer.

However, for certain specialized applications, neither of these two benefits manifests itself compared to a files+folders data store - therefore the answer is a resounding "depends".

As for files/folders, the tricks are:

  • Cache the contents of frequently requested files
  • Have small directories (files in deeply nested small directories are much faster to access than in a flatter structure, due to the time it takes to read the contents of a big directory).
  • There are other, more advanced optimizations (slice across disks, placement on different places in a disk or different partition, etc..) - but if you have need of THAT level, you are better off with a database in the first place.
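
The "small directories" tip can be sketched as hash-based sharding (a Python illustration; the sha1 choice, two-level depth, and names are assumptions, not something from the answer):

```python
import hashlib
from pathlib import Path

def shard_path(root, name, depth=2, width=2):
    # Spread files across small nested directories keyed by a hash prefix,
    # so no single directory ever holds a huge number of entries.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return Path(root).joinpath(*shards, name)

path = shard_path("store", "user_12345.json")
print(path.name)  # user_12345.json, with two hash-derived directories above it
```

The mapping is deterministic, so lookups need no index: re-hashing the name always yields the same path.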

初与友歌 2024-08-26 12:22:54


From my little bit of experience, server-based databases (even those served on the local machine) tend to have very slow throughput compared to local filesystems. However, this depends on some things, one of which is asymptotic complexity. Comparing scanning a big list of files against using a database with an index to look up an item, the database wins.

My little bit of experience is with PostgreSQL. I had a table with three million rows, and I went to update a mere 8,000 records. It took 8 seconds.

As for the quote "Premature optimization is the root of all evil.", I would take that with a grain of salt. If you write your application using a database, then find it to be slow, it might take a tremendous amount of time to switch to a filesystem-based approach or something else (e.g. SQLite). I would say your best bet is to create a very simple prototype of your workload, and test it with both approaches. I believe it is important to know which is faster in this case.

望笑 2024-08-26 12:22:54


As others have pointed out: it depends!

If you really need to find out which is going to be more performant for your purposes, you may want to generate some sample data to store in each format and then run some benchmarks. The Benchmark.pm module comes with Perl, and makes it fairly simple to do a side-by-side comparison with something like this:

use Benchmark qw(:all) ;

my $count = 1000;  # Some large-ish number of trials is recommended.

cmpthese($count, {
    'File System' => sub { ...your filesystem code... },
    'Database'    => sub { ...your database code... }
});

You can type perldoc Benchmark to get more complete documentation.

帥小哥 2024-08-26 12:22:54

It is very useful to use files instead of a DB for images, if the site structure is suitable. Create folders representing your matching data and place the images inside. For example, you have an article site and you store your articles in the DB. You don't have to store image paths in the DB: name folders with your primary keys (1, 2, 3, ...) and put the images inside. E-books, music files, videos: this approach can be used for all media files. The same logic works for XML files, as long as you won't be searching inside them.
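
A minimal sketch of that primary-key folder convention (Python; the media/articles root and file names are invented for illustration):

```python
from pathlib import Path

MEDIA_ROOT = Path("media/articles")  # assumed root, not from the answer

def image_path(article_id, filename):
    # The folder name is just the row's primary key, so the DB never
    # needs to store a path column.
    return MEDIA_ROOT / str(article_id) / filename

print(image_path(42, "cover.jpg").as_posix())  # media/articles/42/cover.jpg
```

Because the path is derived from the key, it never goes stale and needs no extra query.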

≈。彩虹 2024-08-26 12:22:54


A database certainly can be faster. Quoting the SQLite test:

SQLite reads and writes small blobs (for example, thumbnail images) 35% faster than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

Furthermore, a single SQLite database holding 10-kilobyte blobs uses about 20% less disk space than storing the blobs in individual files.

The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database. The size reduction arises from the fact that individual files are padded out to the next multiple of the filesystem block size, whereas the blobs are packed more tightly into an SQLite database.

The measurements in this article were made during the week of 2017-06-05 using a version of SQLite in between 3.19.2 and 3.20.0. You may expect future versions of SQLite to perform even better.
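
A rough sketch of the layout that benchmark measures (Python's built-in sqlite3 module, with an invented table and sizes; this is not the SQLite project's actual harness): many small blobs packed into a single database file that is opened once.

```python
import os
import sqlite3

blob = os.urandom(10 * 1024)  # a 10 KB stand-in for a thumbnail

db = sqlite3.connect(":memory:")  # one open() for the whole store
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB)")
db.executemany("INSERT INTO images (data) VALUES (?)", [(blob,)] * 100)

# The file-per-blob alternative would pay an open()/close() pair per image here.
(count,) = db.execute("SELECT count(*) FROM images").fetchone()
print(count)  # 100
```

The blobs also avoid per-file padding to the filesystem block size, which is where the quoted ~20% space saving comes from.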

东京女 2024-08-26 12:22:54


Like others said, a DB is a tool and it creates some overhead, but if your data is static and read-only, reading it directly from files will be faster.
Here are some tests that I've done:
I had one .csv file per date.
In the database I indexed a 'date' column in order to find the same records. Each day has 30K-50K records/rows and 100 columns of different types of data (90% floats).

DB Info:
PostgreSQL 11.5, 16GB of RAM

  Table:
    335,162,867 records
    Table size: 110GB
    Index size: 7GB
    Total size: 117GB
  Files:
    Number of files: 8033
    Total Files size: 158GB
    Number of records/lines per file/date: 30K - 50K

Reading data for a random date (1986-2019) from a file was consistently
4-5 times faster than reading data for the same date from PostgreSQL.
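
A toy sketch of that file-per-date layout (Python; the directory, column names, and two-row file are invented stand-ins for the real 30K-50K-row days):

```python
import csv
import tempfile
from pathlib import Path

data_dir = Path(tempfile.mkdtemp())
(data_dir / "2019-06-05.csv").write_text("price,volume\n1.5,100\n1.6,200\n")

def read_day(directory, date_str):
    # One open() and one sequential scan of a single file: no server
    # round trip, no index traversal.
    with open(directory / f"{date_str}.csv", newline="") as f:
        return list(csv.DictReader(f))

rows = read_day(data_dir, "2019-06-05")
print(len(rows))  # 2
```

The date is encoded in the file name, so "find all rows for a date" is a single directory lookup rather than an index scan over 335 million rows.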

×眷恋的温暖 2024-08-26 12:22:54


It depends on the profile of the data and what logic you are going to be using to access it. If you simply need to save and fetch named nodes then a filesystem-based database may be faster and more efficient. (You could also have a look at Berkeley DB for that purpose.) If you need to do index-based searches, and especially if you need to join different sets of data based on keys, then an SQL database is your best bet.

I would just go with whatever solution seems the most natural for your application.

挽容 2024-08-26 12:22:54


As others have said, it depends: on the size and nature of the data and the operations you're planning to run on it.

Particularly for a CGI script, you're going to incur a performance hit for connecting to a database server on every page view. However if you create a naive file-based approach, you could easily create worse performance problems ;-)

As well as a Berkeley DB File solution you could also consider using SQLite. This creates a SQL interface to a database stored in a local file. You can access it with DBI and SQL but there's no server, configuration or network protocol. This could allow easier migration if a database server is necessary in the future (example: if you decide to have multiple front-end servers, but need to share state).

Without knowing any details, I'd suggest using an SQLite/DBI solution, then reviewing the performance. This will give you flexibility, a reasonably simple start-up, and decent performance.
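
The answer's suggestion is Perl DBI; the same "SQL against a plain local file, no server" idea looks like this in Python's built-in sqlite3 module (table and data invented for illustration):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "app.db")
con = sqlite3.connect(db_path)  # no server process, no network, no config

con.execute("CREATE TABLE books (author TEXT, title TEXT)")
con.execute("INSERT INTO books VALUES (?, ?)", ("Salinger", "Franny and Zooey"))
con.commit()

titles = [t for (t,) in con.execute(
    "SELECT title FROM books WHERE author = ?", ("Salinger",))]
print(titles)  # ['Franny and Zooey']
```

Migrating later to a server-based database mostly means changing the connection line, which is the flexibility the answer is pointing at.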

小ぇ时光︴ 2024-08-26 12:22:54


To quickly access files, depending on what you are doing, an mmap can be very handy. I just wrote about this in the Effective Perl blog as Memory-map files instead of slurping them.

However, I expect that a database server would be much faster. It's difficult to say what would be faster for you when we have no idea what you are doing, what sort of data you need to access, and so on.
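
A toy version of the memory-mapping idea (Python's mmap rather than the Perl from the blog post; file contents invented): the OS pages data in on demand instead of copying the whole file into your process up front.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "wb") as f:
    f.write(b"needle" + b"x" * 4096)

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Search and slice without slurping the file into memory first.
        print(m.find(b"needle"))  # 0
        print(m[:6])              # b'needle'
```

For read-heavy access to large static files, this avoids both the copy and the allocation that a full read() would incur.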

甜`诱少女 2024-08-26 12:22:54


I'm going to give you the same answer everyone else gave you:
it depends.

In a simple scenario with a single server that returns data (read-only), yes, a file system will be great and easy to manage.

But when you have more than one server, you'll have to manage a distributed file system like glusterfs, ceph, etc.

A database is a tool that manages all of that for you: the distributed file system, compression, reads/writes, locks, etc.

Hope that's helpful.

药祭#氼 2024-08-26 12:22:54


I chose a file/folder system for my office-managing site (1M workers throughout the USA and Brazil), which was my first project (as a job).

Actually, it was the most irresponsible, but luckily the most suitable, decision I have ever taken.

To deal with this file system I also made a set of PHP functions to simplify things. The things I considered when making those functions were:

  • Creating table files (like CSV)
  • Creating folders containing those files
  • Repeating a function for each row in a selected file
  • Changing/deleting selected column/row data in a selected file iteratively
  • Saving binary data as separate files

That's all.

But when using it I had to think a lot about optimization. For example, when it comes to someone's "last seen"...

When some user X wants to see the last-seen of some other user Y, I make a folder called dataRequestingCluster under X which contains everything X wants the last-seen for (maybe X's friend list, if X is viewing his friends page), and add that cluster to Y's last-seen table (which contains all of Y's last-seen requesters as separate cluster files, and is updated every time Y interacts with the server).

Then I added a mechanism to remove dataRequestClusters when unused for 3 minutes (when X leaves his friends page there is no need to keep updating last-seen for him).

I had to follow this approach because updating last-seen in the ALL_USERS file is a complete joke in terms of time.

Most importantly, this method is way faster than using a database system like MySQL... but you have to use your brain and spend about 30% more time; the happiness and completeness obtained this way are awesome.

This is just one case: "last seen".

Many times I used different algorithms to solve each problem, and it was always more efficient than an SQL/DB system.

Conclusion: file systems make things complex, but they are the only way to improve quality in a shocking way... if you are not good at thinking, have little time, or are not interested in what you do, you should probably go with an SQL/DB system.

After 6 months my server's data storage reached 20GB (without media files, just user info), but no individual file ever exceeded 200KB. It was amazingly successful, with minimal processing time.
