When to use an embedded database

Posted 2024-09-06 19:46:01

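I am writing an application that parses a large file, generates a large amount of data, and uses it for some complex visualizations. Since all of this data cannot be kept in memory, I did some research and started considering an embedded database as a temporary container for it.

My question is: is this the conventional way to solve this kind of problem? And is an embedded database (besides structuring the data) meant to manage it by keeping only a subset in memory (like a cache) while the rest stays on disk? Thanks.

Edit: To clarify: I am writing a desktop application. Its input is a file about 100 MB in size. After reading the file, the application generates a large number of graphs for visualization. Because the graphs may have a very large number of nodes, they may not fit in memory. Should I save them to an embedded database, which would take care of keeping only the relevant data in memory (can embedded databases do that?), or should I write my own sophisticated module to do it?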

日久见人心 2024-09-13 19:46:01

Tough question - but I'll share my experience and let you decide if it helps.

If you need to retain the output from processing the source file, and you use it to produce multiple views of the derived data, then you might consider using an embedded database. The reasons to use an embedded database (IMHO):

To take advantage of RDBMS features (ACID, relationships, foreign keys, constraints, triggers, aggregation...)
To make it easier to export the data in a flexible manner
To give external clients access to your processed data (in a known format)
To allow more flexible transformation of the data when preparing for viewing

Factors which you should consider when making the decision:

What are the target platforms (Windows, Linux, Android, iPhone, PDA)?
What is the technology base (Java, .NET, C, C++, ...)?
What resource constraints are expected or need to be designed for? (RAM, CPU, HD space)
What operational behaviours do you need to take into account (connected to network, disconnected)?

On a typical modern desktop there is enough spare capacity to handle most operations. On eeePCs, PDAs, and other portable devices, maybe not. On embedded devices, very likely not. The language you use may have built-in features to help with memory management - maybe you can take advantage of those. The connectivity aspect (stateful / stateless / etc.) may affect how much you really need to keep in memory at any given point.

If you are dealing with really big files, then you might consider a streaming process approach so you only have in memory a small portion of the overall data at a time - but that doesn't really mean you should (or shouldn't) use an embedded database. Straight text or binary files could work just as well (record based, column based, line based... whatever).
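To make the streaming idea concrete, here is a minimal Python sketch. The "key,value" line format and the names `stream_records` and `running_totals` are assumptions made up for illustration, not anything from the question. The point is only that a generator hands over one record at a time, so memory use depends on the size of the summary, not the size of the input.

```python
import io
from typing import Dict, Iterable, Iterator, Tuple


def stream_records(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield one parsed (key, value) record at a time.

    The "key,value" layout is a hypothetical format for this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(",", 1)
        yield key, float(value)


def running_totals(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Fold the stream into per-key sums; only the sums stay in memory."""
    totals: Dict[str, float] = {}
    for key, value in records:
        totals[key] = totals.get(key, 0.0) + value
    return totals


# An in-memory stand-in for a large file; an open file object works the same way.
sample = io.StringIO("a,1.5\nb,2\n\na,0.5\n")
totals = running_totals(stream_records(sample))
print(totals)
```

With a real file you would pass the object returned by `open(...)` instead of the `StringIO`; iterating over it reads line by line.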

Some databases will allow you more effective ways to interact with the data once it is stored - it depends on the engine. I find that if you have a lot of aggregation required in your base files (by which I mean the files you generate initially from the original source) then an RDBMS engine can be very helpful to simplify your logic. Other options include building your base transform and then adding additional steps to process that into other temporary stores for each specific view, which are then in turn processed for rendering to the target (report?) format.
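As one possible illustration of the aggregation point, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is just an example engine, and the `edges` table, its columns, and the helper names are hypothetical. The data lives in a file on disk, `PRAGMA cache_size` bounds how much of it SQLite keeps in its page cache, and the `GROUP BY` replaces hand-written aggregation logic.

```python
import os
import sqlite3
import tempfile


def build_store(path, rows):
    """Create an on-disk scratch database and load (src, dst, weight) rows."""
    conn = sqlite3.connect(path)
    # A negative cache_size is a limit in KiB, so this caps the page cache
    # at roughly 2 MB; everything else stays in the file on disk.
    conn.execute("PRAGMA cache_size = -2000")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edges ("
        "src TEXT NOT NULL, dst TEXT NOT NULL, weight REAL NOT NULL)"
    )
    with conn:  # commit the whole batch as one transaction
        conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", rows)
    return conn


def out_degree(conn):
    """Let the engine do the aggregation: edge count and total weight per node."""
    query = (
        "SELECT src, COUNT(*), SUM(weight) "
        "FROM edges GROUP BY src ORDER BY src"
    )
    return conn.execute(query).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "scratch.db")
conn = build_store(db_path, [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 0.5)])
summary = out_degree(conn)
conn.close()
print(summary)
```

`executemany` accepts any iterable, so the rows could come straight from a streaming parser instead of a list.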

Just a stream-of-consciousness response - hope that helps a little.

Edit:

Per your further clarification, I'm not sure an embedded database is the direction you want to take. You either need to make some sort of simplifying assumptions for rendering your graphs or investigate methods like segmentation (render sections of the graph and then cache the output before rendering the next section).
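One way to read the segmentation suggestion is sketched below in Python: render a section only when it is requested, and keep a bounded number of rendered sections around. The `SectionCache` class and `fake_render` function are hypothetical names, the renderer is a placeholder for real drawing code, and the least-recently-used eviction policy is an assumption of this sketch, not something the answer specifies.

```python
from collections import OrderedDict


class SectionCache:
    """Keep at most max_sections rendered sections, evicting the least recently used."""

    def __init__(self, render, max_sections=2):
        self._render = render
        self._max = max_sections
        self._cache = OrderedDict()
        self.render_calls = 0  # counts how often real rendering happened

    def get(self, section_id):
        if section_id in self._cache:
            self._cache.move_to_end(section_id)  # mark as most recently used
            return self._cache[section_id]
        self.render_calls += 1
        output = self._render(section_id)
        self._cache[section_id] = output
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # drop the least recently used section
        return output


def fake_render(section_id):
    # Placeholder: a real renderer would lay out and draw this part of the graph.
    return f"<section {section_id}>"


cache = SectionCache(fake_render, max_sections=2)
outputs = [cache.get(i) for i in (0, 1, 0, 2, 1)]
print(outputs, cache.render_calls)
```

In this run the second request for section 0 is served from the cache, while section 1 has been evicted by the time it is requested again and is rendered a second time.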
