Database suitable for storing 20 GB with Delphi and FireMonkey

Posted on 2024-12-29 13:33:37, 687 words, 1 view, 0 comments

I have no experience in database development, so I need your suggestions for choosing a database that can be used with FireMonkey.

I need to store HTML files (without media for now, though they may include media later); their total size is around 20 GB of uncompressed text. The main requirement is the fastest possible text search in the database, and it must be possible to implement human-style searching (like Google). Compression would also be welcome, since 20 GB is a lot to store, but it is not required if it makes searching slow.

What kind of databases are appropriate for my concern?
Thanks a lot for your suggestions!

Edited

Requirements:

  1. Price: Free
  2. Location: local or remote
  3. Operating system support: Windows
  4. System requirements: a database with a large footprint
    (hopefully in exchange for better performance)
  5. Performance: fast text searching
  6. Concurrent users: 20
  7. Full text indexing and searching: human (Google-like) fast
    text searching is required
  8. Manageability: doesn't matter much

I know of an online legal database that can search for words across 100 GB of information in milliseconds. I need the same performance, and Google-like searching is required.

Comments (7)

剑心龙吟 2025-01-05 13:33:37

Delphi's database access layer is separate from FireMonkey; it is the same one used by the VCL (although FM, AFAIK, relies only on LiveBindings to access data, but that's not an issue in your case).

Today 20 GB is really not much data. Almost any database will handle it without much effort if properly configured. Which engine to choose depends on:

  • Price: how much are you going to spend for it?
  • Location: do you need a local database (same machine) or a remote one (LAN or WAN)?
  • Operating system support: which OS should it run on?
  • System requirements: do you need a database with a small footprint, or can you use one with a larger footprint (hopefully in exchange for better performance)?
  • Performance: what performance is required?
  • Concurrent users: how many users will connect to the database concurrently?
  • Full text indexing and searching: not all databases offer it out of the box
  • Manageability: some databases may require more management than others.

There is no "one database fits all" yet.

姐不稀罕 2025-01-05 13:33:37

I'm no DBA, so I can't say directly, and honestly I'm not sure any one person could give a direct answer to this question, as it's one of those "it just depends" scenarios.

http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

That's a good starting point for comparing features and platform compatibility. I think the major thing to consider here is what hardware will be running it and how you can best use that to accomplish the task at hand.

If you have a server farm, be sure your DB supports distribution and some sort of load balancing (most do to some degree, from what I understand).

To speed up searching, I think you're going to want to keep the data uncompressed, unless you code up a custom algorithm that searches the compressed version somehow. With such an algorithm, searching the compressed data might actually be faster: if you can compare your plain-text search terms against an index built for the compressed files, you are only looking for the keys that match within the index, and then checking the compressed data for the ones that were found. Without tons of custom code, I haven't heard of any DB that supports this idea of searching compressed text (though I could easily be wrong on this point).
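A minimal sketch of that idea in Python, assuming a small in-memory store: the documents are kept zlib-compressed, while a separate uncompressed word index answers queries, so only the matching documents ever need to be decompressed. The class `CompressedStore` and its methods are hypothetical names made up for illustration; a real 20 GB data set would need an on-disk index.

```python
import re
import zlib
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CompressedStore:
    """Hypothetical store: compressed documents plus an uncompressed word index."""

    def __init__(self):
        self.blobs = {}                # doc_id -> zlib-compressed bytes
        self.index = defaultdict(set)  # word -> set of doc_ids containing it

    def add(self, doc_id, text):
        # Index the plain text first, then keep only the compressed form.
        for word in tokenize(text):
            self.index[word].add(doc_id)
        self.blobs[doc_id] = zlib.compress(text.encode("utf-8"))

    def search(self, query):
        """Return the ids of documents containing every word of the query."""
        words = tokenize(query)
        if not words:
            return []
        # Only the index is consulted here; nothing is decompressed.
        matches = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(matches)

    def fetch(self, doc_id):
        """Decompress a single document, e.g. to display a search hit."""
        return zlib.decompress(self.blobs[doc_id]).decode("utf-8")
```

The trade-off is that the index itself takes space, so the total saving is smaller than the compression ratio of the documents alone.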

If the entire data set needs to be decompressed before doing the search, it will very likely be much slower (memory is relatively cheap compared to CPU time). It looks like FireMonkey has a limited selection of DBs to use, so that will help narrow your choices down as well.

Based on your edited question, I would suggest writing (or finding) a parser or regular expression to extract all the important elements from the HTML that you would like to be searchable. Then store those in a database along with a reference to where they were found in the HTML. In terms of Google-like searching, if you mean how it can correct misspellings and use synonyms, you will probably need some custom code to do dictionary lookups for spelling and thesaurus lookups for synonyms. I believe full-text searching in any modern DB will handle the need to query with LIKE or similar statements in the WHERE clause.
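The extraction step could look roughly like the following Python sketch, which uses the standard library's `html.parser` to pull out visible text together with the enclosing tag and the parser's line number as a simple "where it was found" reference. `TextExtractor` and `extract_text` are hypothetical names, and the choice to skip `script` and `style` content is an assumption about what should not be searchable.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects (tag, text, line) tuples for the visible text in an HTML page."""

    SKIP = {"script", "style"}                           # content not worth indexing
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.chunks = []   # extracted (tag, text, line) tuples
        self._stack = []   # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or any(t in self.SKIP for t in self._stack):
            return
        tag = self._stack[-1] if self._stack else ""
        line = self.getpos()[0]        # parser's current line number
        self.chunks.append((tag, text, line))


def extract_text(html):
    """Return the list of (tag, text, line) tuples found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each tuple could then become one row in the database, with the file name added as a further column, so a search hit can be traced back to its place in the original page.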

Looks like ldsandon's answer covers most of this anyhow. TLDR; if not thanks for reading.

怂人 2025-01-05 13:33:37

I would recommend PostgreSQL for this task. It has good performance and built-in full-text search capability for Google-like searching. And it's free and open source.

Unfortunately Delphi doesn't come with Postgres data access components out of the box. You can connect by ODBC, or you can purchase components available from, for example, Devart, DA-Soft or microOLAP.
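For a rough idea of what PostgreSQL's built-in full-text search looks like, here is a sketch in Python. The SQL uses PostgreSQL's `to_tsvector`, `plainto_tsquery`, `ts_rank`, and a GIN index; the table `pages`, its columns, the `'english'` configuration, and the helper `search` are assumptions made for illustration. The `%s` placeholders assume a DB-API driver such as psycopg, and the same SQL could be sent from Delphi through whichever data access component is used.

```python
# Schema sketch: one row per HTML file, with a GIN index over the text vector.
# Table and column names are hypothetical.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS pages (
    id   serial PRIMARY KEY,
    path text NOT NULL,
    body text NOT NULL
);
CREATE INDEX IF NOT EXISTS pages_fts
    ON pages USING gin (to_tsvector('english', body));
"""

# Ranked search: plainto_tsquery turns free text typed by a user into a query.
SEARCH_SQL = """
SELECT path,
       ts_rank(to_tsvector('english', body), query) AS rank
FROM pages, plainto_tsquery('english', %s) AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY rank DESC
LIMIT %s;
"""


def search(conn, text, limit=10):
    """Run the ranked search on an open DB-API connection and return the rows."""
    cur = conn.cursor()
    try:
        cur.execute(SEARCH_SQL, (text, limit))
        return cur.fetchall()
    finally:
        cur.close()
```

The `WHERE` expression matches the indexed expression exactly, which is what lets PostgreSQL use the GIN index instead of scanning every row.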

美胚控场 2025-01-05 13:33:37

Have you considered NoSQL databases? The Wikipedia article explains how they differ from SQL databases and also mentions that they are well suited as document stores.

http://en.wikipedia.org/wiki/NoSQL

The article lists around twelve implementations in the document store category; many are open source (Jackrabbit, CouchDB, MongoDB).

This question on Stack Overflow contains some pointers to Delphi clients:

Delphi and NoSQL

I would also consider caching on the application server, to speed up search. And of course a text indexing solution like Apache Lucene.
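As an illustration of the caching idea, here is a small Python sketch that wraps any search function in an in-memory LRU cache, so repeated queries skip the database. `make_cached_search` is a hypothetical helper, and it assumes the underlying data changes rarely; otherwise cached results go stale and the cache would need to be cleared on updates.

```python
from functools import lru_cache


def make_cached_search(search_fn, maxsize=1024):
    """Wrap search_fn so that repeated queries are served from memory."""

    @lru_cache(maxsize=maxsize)
    def cached(normalized_query):
        # Store results as a tuple so cached values cannot be mutated by callers.
        return tuple(search_fn(normalized_query))

    def search(query):
        # Normalize case and whitespace so equivalent queries share one entry.
        return cached(" ".join(query.lower().split()))

    search.cache_clear = cached.cache_clear  # call after the data changes
    return search
```

With 20 concurrent users, popular queries would then hit the database only once until the cache is cleared or the entry is evicted.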

习惯成性 2025-01-05 13:33:37

I would go with Microsoft SQL Server Express Edition. I think 2008 R2 is the latest stable version, but there is also Denali (2011). It matches all the criteria you have.

You can use ADO to work with it.

空城仅有旧梦在 2025-01-05 13:33:37

Try the Advantage Database Server.

It's easy to manage and configure.
It offers both dBase-like and SQL data management languages.
It has fast indexed full-text search capabilities.
Plus, there is unparalleled support from the developers themselves.

The local server (stand-alone version, as opposed to the network based server) is free.

devzone.advantagedatabase.com

终止放荡 2025-01-05 13:33:37

According to its documentation, there is a Firebird version with full-text search (http://www.red-soft.biz/en/document_21); it uses Apache Lucene, a popular search engine.
