如何在数据库通信中有效地处理 BLOB 和数值数据?

发布于 2024-07-06 20:04:26 字数 430 浏览 3 评论 0原文

SQL 数据库似乎是大多数软件的基石。 然而,它似乎针对文本数据进行了优化。 事实上,在执行任何涉及数字数据(特别是整数)的查询时,数字在应用程序和数据库之间转换为文本,然后返回本机格式似乎效率很低。 同样的低效率似乎也适用于 BLOB 数据。 我的理解是,即使使用像 Linq to SQL 这样的东西,这种两种方式的转换也是在后台发生的。

有没有通用的方法可以使用 SQL 绕过这种开销? 是否有某些数据库管理系统比其他系统(即使用非标准扩展/API)更有效地处理这个问题?

澄清。 在下面的 select 语句中,IN 之后的数字列表可以更容易地作为原始 int 数组传递,但似乎没有办法实现该优化级别。

SELECT foo FROM bar WHERE baz IN (23, 34, 45, 9854004, ...)

SQL databases seem to be the cornerstone of most software. However, it seems optimized for textual data. In fact when doing any queries involving numerical data, integers specifically, it seems inefficient that the numbers are getting converted to text and then back to native formats both ways between the application and the database. This same inefficiency seems to apply to BLOB data as well. My understanding is that even with something like Linq to SQL, this two way conversion is occuring in the background.

Are there general ways to bypass this overhead with SQL? Are there certain database management systems that handle this more efficiently than others (ie, with non-standard extensions/API's)?

Clarification. In the following select statement, the list of numbers after IN could be more easily passed as a raw array of int, but there seems to be no way of achieving that optimization level.

SELECT foo FROM bar WHERE baz IN (23, 34, 45, 9854004, ...)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

揽清风入怀 2024-07-13 20:04:26

别以为。 措施。

格式转换不太可能成为数据库工作的可衡量成本,除非您滥用数据库作为算术引擎。

LOB 的 IO 成本(尤其是具有字符转换的 CLOBS)可能会变得很大; 一旦您知道可能有效的最简单方法实际上会对性能产生显着影响,这里的补救措施就是尽量减少复制 LOB 数据的次数。 使用任何允许您在其创建或使用点与数据库之间直接传输数据的 SQL 参数绑定样式 — 通常这是将 LOB 绑定到流或 I/O 通道。

但是,除非您有办法衡量影响,并且测量结果表明这是您的瓶颈,否则不要这样做。

Don't suppose. Measure.

Format conversion is not likely to be a measurable cost for database work, unless you are misusing the database as an arithmetic engine.

The IO cost for LOBs, especially for CLOBS with character conversion, can become significant; the remedy here, once you know that the simplest thing that might work actually has a noticeable performance impact, is to minimize the number of times you copy the LOB data. Use whatever SQL parameter binding style allows you to transfer the data directly between its point of creation or use, and the database -- often this is binding the LOB to a stream or I/O channel.

But don't do this until you have a way to measure the impact, and have measurements showing that this is your bottleneck.

挖鼻大婶 2024-07-13 20:04:26

数据库中的数字数据不存储为文本。 我想这取决于数据库,但它肯定不是必须的,也不是。

BLOB 完全按照你设置的方式存储——根据定义,数据库无法解释这些信息——我想如果它发现有用的话它可以压缩。 BLOB 不会转换为文本。

以下是 Oracle 存储数字的方式:

http://下载.oracle.com/docs/cd/B28359_01/server.111/b28318/datatype.htm#i16209

内部数字格式

Oracle 数据库以可变长度格式存储数值数据。 每个值都以科学计数法存储,其中 1 个字节用于存储指数,最多 20 个字节用于存储尾数。 结果值的精度限制为 38 位。 Oracle 数据库不存储前导零和尾随零。 例如,数字412的存储格式类似于4.12 x 102,其中1个字节用于存储指数(2),2个字节用于存储尾数的三位有效数字(4,1,2)。 负数在其长度中包含符号。

MySQL信息在这里:

http://dev.mysql.com/ doc/refman/5.0/en/numeric-types.html

查看表格 - TINYINT 以 1 个字节表示(范围 -128 - 127),如果存储为文本则不可能。

编辑:经过澄清——我会说使用你的语言中的 API,看起来像这样(伪代码),

stmt = conn.Prepare("SELECT * FROM TABLE where x in (?, ?, ?)");
stmt.SetInt(0, x);
stmt.SetInt(1, y);
stmt.SetInt(2, z);

我不相信底层协议使用文本来传输参数。

Numerical data in a database is not stored as text. I guess it depends on the database, but it certainly doesn't have to be and isn't.

BLOBs are stored exactly how you set them -- by definition, the DB has no way to interpret the information -- I guess it could compress if it found that to be useful. BLOBs are not translated into text.

Here's how Oracle stores numbers:

http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/datatype.htm#i16209

Internal Numeric Format

Oracle Database stores numeric data in variable-length format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle Database does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 102, with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.

MySQL info here:

http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html

Look at the table -- a TINYINT is represented in 1 byte (range -128 - 127), not possible if stored as text.

EDIT: With the clarification -- I would say use the API in your language that looks something like this (pseudocode)

stmt = conn.Prepare("SELECT * FROM TABLE where x in (?, ?, ?)");
stmt.SetInt(0, x);
stmt.SetInt(1, y);
stmt.SetInt(2, z);

I don't believe that the underlying protocols use text for the transport of parameters.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文