ResultSet:通过索引检索列值与通过标签检索

发布于 2024-07-07 17:22:53 字数 439 浏览 8 评论 0原文

使用 JDBC 时,我经常遇到这样的结构,就像

ResultSet rs = ps.executeQuery();
while (rs.next()) {
    int id = rs.getInt(1);
    // Some other actions
}

我问自己(以及代码作者)为什么不使用标签来检索列值:

int id = rs.getInt("CUSTOMER_ID");

我听到的最好的解释是关于性能的。 但实际上,它会使处理速度变得非常快吗? 我不相信这一点,尽管我从未进行过测量。 尽管通过标签检索会慢一些,但在我看来,它提供了更好的可读性和灵活性。
那么有人可以给我很好的解释,避免通过列索引而不是列标签检索列值吗? 两种方法的优缺点是什么(也许对于某些 DBMS 而言)?

When using JDBC, I often come across constructs like

ResultSet rs = ps.executeQuery();
while (rs.next()) {
    int id = rs.getInt(1);
    // Some other actions
}

I asked myself (and authors of code too) why not to use labels for retrieving column values:

int id = rs.getInt("CUSTOMER_ID");

The best explanation I've heard is something concerning performance. But actually, does it make processing extremely fast? I don't believe so, though I have never performed measurements. Even if retrieving by label would be a bit slower, nevertheless, it provide better readability and flexibility, in my opinion.
So could someone give me good explanation of avoiding to retrieve column values by column index instead of column label? What are pros and cons of both approaches (maybe, concerning certain DBMS)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(14

作妖 2024-07-14 17:22:53

警告:我会在这里夸夸其谈,因为这让我发疯。

99%* 的时间里,这是一种荒谬的微观优化,人们有一些模糊的想法让事情变得“更好”。 这完全忽略了这样一个事实:除非您始终处于数百万个 SQL 结果的极其紧张和繁忙的循环中(希望这种情况很少见),否则您永远不会注意到它。 对于每个不这样做的人来说,开发人员维护、更新和修复列索引中的错误所花费的时间成本远远大于性能极其糟糕的应用程序的硬件增量成本。

不要在维护它的人的代码中编写这样的优化代码。 然后观察、测量、分析和优化。 再观察、再测量、再分析、再优化。

优化几乎是开发的最后一步,而不是第一步。

* 图为虚构。

Warning: I'm going to get bombastic here, because this drives me crazy.

99%* of the time, it's a ridiculous micro-optimization that people have some vague idea makes things 'better'. This completely ignores the fact that, unless you're in an extremely tight and busy loop over millions of SQL results all the time, which is hopefully rare, you'll never notice it. For everyone who's not doing that, the developer time cost of maintaing, updating, and fixing bugs in the column indexing are far greater than the incremental cost of hardware for your infinitesimally-worse-performing application.

Don't code optimizations like this in. Code for the person maintaining it. Then observe, measure, analyse, and optimize. Observe again, measure again, analyse again, and optimize again.

Optimization is pretty much the last step in development, not the first.

* Figure is made up.

你的背包 2024-07-14 17:22:53

您应该默认使用字符串标签。

优点:

  • 列顺序的独立性
  • 更好的可读性/可维护性

缺点:

  • 您无法控制列名称(通过存储过程访问)

您更喜欢哪个?

整数?

int i = 1;  
customerId = resultSet.getInt(i++);  
customerName = resultSet.getString(i++);  
customerAddress = resultSet.getString(i++);

还是字符串?

customerId = resultSet.getInt("customer_id");  
customerName = resultSet.getString("customer_name");  
customerAddress = resultSet.getString("customer_address");

如果在位置 1 插入新列怎么办? 您更喜欢哪个代码? 或者,如果列的顺序发生变化,您需要更改哪个代码版本?

这就是为什么您应该默认使用字符串标签。

You should use string labels by default.

Pros:

  • Independence of column order
  • Better readability/maintainability

Cons:

  • You have no control over the column names (access via stored procedures)

Which would you prefer?

ints?

int i = 1;  
customerId = resultSet.getInt(i++);  
customerName = resultSet.getString(i++);  
customerAddress = resultSet.getString(i++);

or Strings?

customerId = resultSet.getInt("customer_id");  
customerName = resultSet.getString("customer_name");  
customerAddress = resultSet.getString("customer_address");

And what if there is a new column inserted at position 1? Which code would you prefer? Or if the order of the columns is changed, which code version would you need to change at all?

That's why you should use string labels by default.

潦草背影 2024-07-14 17:22:53

尽管如此,答案已被接受,这里有一些我尚未看到的附加信息和个人经验。

如果可能的话,一般使用列名(最好是常量而不是文字)。 这更清晰,更容易维护,并且未来的更改不太可能破坏代码。

然而,列索引是有用途的。 在某些情况下,这些速度更快,但不足以覆盖上述名称*的原因。 在开发处理 ResultSet 的工具和通用方法时,这些非常有价值。 最后,可能需要索引,因为列没有名称(例如未命名的聚合)或者存在重复的名称,因此没有简单的方法来引用两者。

*请注意,我已经编写了一些 JDBC 驱动程序,并查看了一些开源驱动程序,这些驱动程序在内部使用列索引来引用结果列。 在我处理过的所有情况下,内部驱动程序首先将列名映射到索引。 因此,您可以很容易地看到,在所有这些情况下,列名总是会花费更长的时间。 但这可能不适用于所有驾驶员。

The answer has been accepted, none-the-less, here is some additional information and personal experience that I have not seen put forward yet.

Use column names (constants and not literals is preferred) in general and if possible. This is both clearer, is easier to maintain, and future changes are less likely to break the code.

There is, however, a use for column indexes. In some cases these are faster, but not sufficiently that this should override the above reasons for names*. These are very valuable when developing tools and general methods dealing with ResultSets. Finally, an index may be required because the column does not have a name (such as an unnamed aggregate) or there are duplicate names so there is no easy way to reference both.

*Note that I have written some JDBC drivers and looked inside some open sources one and internally these use column indexes to reference the result columns. In all cases I have worked with, the internal driver first maps a column name to an index. Thus, you can easily see that the column name, in all those cases, would always take longer. This may not be true for all drivers though.

紫罗兰の梦幻 2024-07-14 17:22:53

来自java文档:

ResultSet 接口提供 getter 方法(getBoolean、getLong 等)用于从当前行检索列值。 可以使用列的索引号或列的名称来检索值。 一般来说,使用列索引会更高效。 列从 1 开始编号。为了获得最大的可移植性,每行中的结果集列应按从左到右的顺序读取,并且每列只能读取一次。

当然,每个方法(命名或索引)都有它的位置。 我同意命名列应该是默认值。 但是,在需要大量循环的情况下,以及在同一代码(或类)部分中定义和维护 SELECT 语句的情况下,索引应该没问题 - 建议列出正在选择的列,而不仅仅是列出“SELECT * FROM...”,因为任何表更改都会破坏代码。

From the java documentation:

The ResultSet interface provides getter methods (getBoolean, getLong, and so on) for retrieving column values from the current row. Values can be retrieved using either the index number of the column or the name of the column. In general, using the column index will be more efficient. Columns are numbered from 1. For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once.

Of course each method (named or indexed) has its place. I agree that named columns should be the default. However, in cases where a huge number of loops are required, and where the SELECT statement is defined and maintained in the same section of code (or class), indexes should be ok - it is advisable to list the columns being selected, not just "SELECT * FROM...", since any table change will break the code.

尸血腥色 2024-07-14 17:22:53

我在 Oracle 数据库上针对这个具体主题做了一些性能分析。 在我们的代码中,我们有一个包含大量列和大量行的结果集。 在执行方法 oracle.jdbc.driver.ScrollableResultSet.findColumn(String name) 的 20 秒 (!) 请求中,大约需要 4 秒。

显然整体设计有问题,但是使用索引而不是列名可能会缩短 4 秒的时间。

I did some performance profiling on this exact subject on an Oracle database. In our code we have a ResultSet with numerous colums and a huge number of rows. Of the 20 seconds (!) the request takes to execute method oracle.jdbc.driver.ScrollableResultSet.findColumn(String name) takes about 4 seconds.

Obviously there's something wrong with the overall design, but using indexes instead of the column names would probably take this 4 seconds away.

风吹过旳痕迹 2024-07-14 17:22:53

您可以两全其美! 使用索引的速度以及使用列名的可维护性和安全性。

首先 - 除非您循环遍历结果集,否则只需使用列名称。

  1. 定义一组整型变量,每个变量对应您将访问的每一列。 变量的名称可以包括列的名称:例如 iLast_Name。

  2. 在结果集循环之前,迭代列元数据并将每个整数变量的值设置为相应列名称的列索引。 如果“Last_Name”列的索引为 3,则将“iLast_Name”的值设置为 3。

  3. 在结果集循环中,使用 GET/SET 方法中的整型变量名称。 变量名称是开发人员/维护人员关于正在访问的实际列名称的视觉线索,但值是列索引,将提供最佳性能。

注意:初始映射(即列名到索引映射)仅在循环之前完成一次,而不是针对循环中的每个记录和列。

You can have the best of both! The speed of using indexes with the maintainability and security of using column names.

First - unless you are looping thru a result set just use column names.

  1. Define a set of integer variables, one for each column you will access. The names of the variables can include the name of the column: e.g. iLast_Name.

  2. Before the result set loop iterate thru the column metadata and set the value of each integer variable to the column index of the corresponding column name. If the index of the 'Last_Name' column is 3 then set the value of 'iLast_Name' to 3.

  3. In the result set loop use the integer variable names in the GET/SET methods. The variable name is a visual clue to the developer/maintainer as to the actual column name being accessed but the value is the column index and will give the best performance.

NOTE: the initial mapping (i.e. column name to index mapping) is only done once before the loop rather than for every record and column in the loop.

夏末 2024-07-14 17:22:53

当然,使用列名可以提高可读性并使维护变得容易。 但使用列名也有不利的一面。 如您所知,SQL 允许多个列名同名,但不能保证您在 resultSet 的 getter 方法中键入的列名实际上指向您要访问的列名。 理论上,使用索引号代替列名是首选,但这会降低可读性。

Sure, using column names increases readability and makes maintenance easy. But using column names has a flipside. As you know, SQL allows multiple column names with same name, there's no guarantee that the column name you typed in the getter method of resultSet actually points to the column name you intend to access. In theory, using index numbers instead of column names is preferred, but it reduces the readability.

葬花如无物 2024-07-14 17:22:53

我认为使用标签不会对性能产生太大影响。 但不使用 String 还有另一个原因。 或者 int ,就此而言。

考虑使用常量。 使用 int 常量使代码更具可读性,但也不太可能出现错误。

除了更具可读性之外,该常量还可以防止您在标签名称中出现拼写错误 - 如果您这样做,编译器将抛出错误。 任何有价值的 IDE 都会选择它。 如果您使用Stringint,则情况并非如此。

I don't think using the labels impacts performance by much. But there is another reason not to use Strings. Or ints, for that matter.

Consider using constants. Using an int constant makes the code more readably, but also less likely to have errors.

Besides being more readable, the constant also prevents you from making typo's in the label names - the compiler will throw an error if you do. And any IDE worth anything will pick it up. This is not the case if you use Strings or ints.

摇划花蜜的午后 2024-07-14 17:22:53

JDBC 驱动程序负责索引查找的列。 因此,如果每次驱动程序进行查找(通常在哈希映射中)时按列名称提取值,以检查列名称的相应索引。

The JDBC driver takes care for the column to index look-up. So if you extract values by column name each time the driver makes a look-up (usually in hash map) to check the corresponding index for the column name.

柳絮泡泡 2024-07-14 17:22:53

我同意之前的答案,即性能并不能迫使我们选择任何一种方法。 最好考虑以下几点:

  • 代码可读性:对于每个阅读代码的开发人员来说,标签比索引更有意义。
  • 维护:考虑 SQL 查询及其维护方式。 在修复/改进/重构 SQL 查询后,您的情况更有可能发生什么:更改提取列的顺序或更改结果列名称。 在我看来,更改提取的列的顺序(作为在结果集中添加/删除新列的结果)发生的可能性更大。
  • 封装:无论您选择哪种方式,都尝试隔离在同一组件中运行 SQL 查询和解析结果集的代码,并仅让该组件了解列名及其到索引的映射(如果您决定使用它们) )。

I agree with previous answers that performance is not something that can force us to select either of the approaches. It would be good to consider the following things instead:

  • Code readability: for every developer reading your code labels have much more sense than indexes.
  • Maintenance: think of the SQL query and the way it is maintained. What is more likely to happen in your case after fixing/improving/refactoring SQL query: changing the order of the columns extracted or changing result column names. It seems for me that changing the order of the columns extracted (as the results of adding/deleting new columns in result set) has greater probability to happen.
  • Encapsulation: in spite of the way you choose try to isolate the code where you run SQL query and parse result set in the same component and make only this component aware about the column names and their mapping to the indexes (if you decided to use them).
奢华的一滴泪 2024-07-14 17:22:53

使用索引是一种优化尝试。

由此节省的时间被开发人员花费额外的精力来查找必要的数据以检查他们的代码在更改后是否能正常工作而浪费了。

我认为使用数字而不是文本是我们的本能。

Using the index is an attempt at optimization.

The time saved by this is wasted by the extra effort it takes the developer to look up the necessary data to check if their code will work properly after the changes.

I think it's our built-in instinct to use numbers instead of text.

梦途 2024-07-14 17:22:53

除了在 Map 中查找标签之外,它还会导致额外的 String 创建。 虽然它会发生在堆栈上,但仍然会带来成本。

这一切都取决于个人选择,到目前为止我只使用了索引:-)

Besides the look up in Map for labels it also leads to an extra String creation. Though it will happens on stack but still it caries a cost with it.

It all depends on the individual choice and till date I have used only indexes :-)

与之呼应 2024-07-14 17:22:53

正如其他发帖者所指出的那样,我会坚持使用列名称,除非您有非常充分的理由不这样做。 与查询优化等相比,对性能的影响可以忽略不计。 在这种情况下,维护比小的优化重要得多。

As it is pointed out by other posters, I would stick to column names unless you have a really powerful reason not to do so. The impact in performance is negligible compared to, for example, query optimization. In this case, maintenance is much more important than an small optmization.

笨死的猪 2024-07-14 17:22:53

其他答案主要关注性能,首先要进行正确性讨论。 这是一个简单的情况,其中列标签不起作用,但列索引起作用:

CREATE TABLE author (
  id BIGINT PRIMARY KEY,
  first_name TEXT, ...
);

CREATE TABLE book (
  id BIGINT PRIMARY KEY,
  author_id BIGINT REFERENCES author,
  title TEXT, ...
);

然后查询:

SELECT *
FROM author
JOIN book ON author.id = book.author_id

对我来说这似乎是日常情况。 代理键、连接,很常见。 在这种情况下,ID 是什么? 您无法通过标签来读取它们。 当然,您可以为列添加别名,或者设计架构以避免任何命名冲突,但谁总是这样做呢?

这取决于这对您来说是否是一个问题,但对于任何重要的 JDBC 使用,您要么创建一个围绕 JDBC 的包装器,要么使用任何通过 JDBC 抽象的现成 API,这不会有这个不再有歧义了。

我已经详细阐述了为什么 jOOQ 在幕后使用列索引 在这个类似问题的答案中。 这个答案还说明了为什么仅出于性能原因,ORM 或 JDBC 包装器(如 jOOQ 或 Hibernate)更适合使用索引而不是标签。

Other answers focused a lot on performance, when there's a correctness discussion to be had, first. Here's a simple case where column labels won't work but column indexes do:

CREATE TABLE author (
  id BIGINT PRIMARY KEY,
  first_name TEXT, ...
);

CREATE TABLE book (
  id BIGINT PRIMARY KEY,
  author_id BIGINT REFERENCES author,
  title TEXT, ...
);

Then query:

SELECT *
FROM author
JOIN book ON author.id = book.author_id

Seems like an every day case to me. Surrogate keys, joins, quite common. What's ID in this case? You can't read them both with labels. Of course, you could alias your columns, or design your schema to avoid any naming conflicts, but who does that all the time?

It depends on whether this is a problem for you, but for any non-trivial JDBC usage, you'll either create a wrapper around JDBC or use any off-the-shelf API that abstracts over JDBC anyway, which won't have this ambiguity anymore.

I've elaborated a bit more on why jOOQ uses column indexes behind the scenes in this answer to a similar question. That answer also shows why ORMs or JDBC wrappers like jOOQ or Hibernate better work with indexes than labels for mere performance reasons.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文