Writing a result set to a file with sorted output

Posted 2024-07-06 12:25:11

I want to write the unordered ("random") output of my result set, about 1.5 million rows, to a file in sorted order. I know I can sort in the query itself (an ORDER BY clause), but that is "expensive".
Is there an algorithm for writing result set rows to a file so that the content ends up sorted, and would I gain any performance that way?
I'm using Java 1.6, and the query has multiple joins.


Comments (4)

烙印 2024-07-13 12:25:11

Define an index for the sort criteria in your table; then you can use the ORDER BY clause without problems and write the file as the rows come from the result set.

If your query has multiple joins, create the proper indexes for the joins and for the sort criteria. You can sort the data in your program, but you'd be wasting time. That time is far better spent learning how to properly tune and use your database than reinventing sorting algorithms already present in the database engine.

Grab your database's profiler and check the query's execution plan.
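A minimal sketch of the "write the file as it comes from the result set" step, assuming the query already carries an ORDER BY that the database can satisfy cheaply. The class name `OrderedDump`, the method `writeRows`, and the tab-separated output format are illustrative assumptions, not something taken from the question.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.SQLException;

final class OrderedDump {
    // Writes every row of an already-ordered ResultSet as one tab-separated line.
    // Returns the number of rows written. (Hypothetical helper for illustration.)
    static long writeRows(ResultSet rs, int columnCount, Writer out)
            throws SQLException, IOException {
        long rows = 0;
        while (rs.next()) {
            for (int col = 1; col <= columnCount; col++) { // JDBC columns are 1-based
                if (col > 1) {
                    out.write('\t');
                }
                String value = rs.getString(col);
                out.write(value == null ? "" : value);     // SQL NULL becomes an empty field
            }
            out.write('\n');
            rows++;
        }
        out.flush();
        return rows;
    }
}
```

Passing a `BufferedWriter` around a `FileWriter` keeps the file I/O cheap. Rows are written as they arrive, so the program itself never holds all 1.5 million rows, although some JDBC drivers buffer the whole result set by default unless a fetch size is configured.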

吻安 2024-07-13 12:25:11

In my experience, sorting on the database side is usually as fast or faster, certainly if the column you sort on is indexed.

挽袖吟 2024-07-13 12:25:11

If you're reading from a database, getting sorted output shouldn't be so 'expensive' if you have appropriate indexes.

But, sometimes with complex queries it's very hard for the SQL optimiser to apply indexes. In that case, the DB simply accumulates the results in a temporary table and sorts it for you, transparently.

It's very unlikely that you could match the level of optimisations put into your DB engine; but if your problem arises because you're doing some postprocessing of the data that negates any sorting done by the DB, then you have no alternative other than sorting it yourself.

Again, the easiest would be to use the DB: simply write to a temporary table with an appropriate index and dump from there.

If you're certain that the data will always fit in RAM, you can sort it in memory. It's the only case in which you might be able to beat the DB engine, simply because you know you won't need disk access.
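For the in-memory case, a rough sketch could look like the following. It assumes each row has already been read into a `String[]` and that a plain lexicographic comparison on one column is the desired order; `InMemorySort` and `sortAndWrite` are made-up names for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class InMemorySort {
    // Sorts rows by one column (0-based index) and writes them tab-separated.
    // Hypothetical helper: assumes every row has at least sortColumn + 1 non-null fields.
    static void sortAndWrite(List<String[]> rows, final int sortColumn, Writer out)
            throws IOException {
        List<String[]> sorted = new ArrayList<String[]>(rows); // leave the caller's list untouched
        Collections.sort(sorted, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) {
                return a[sortColumn].compareTo(b[sortColumn]); // lexicographic order
            }
        });
        for (String[] row : sorted) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    out.write('\t');
                }
                out.write(row[i]);
            }
            out.write('\n');
        }
        out.flush();
    }
}
```

`Collections.sort` is stable, so rows with equal keys keep the order in which they were read. Note that numeric columns compared as strings sort "10" before "9"; a numeric key would need its own comparator.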

But that's a lot of 'ifs'. Better to stay with your DB.

心的憧憬 2024-07-13 12:25:11

If you need the data sorted, someone has to do it: either you or the database. It's certainly less effort to add ORDER BY to the query, but there's no reason you can't sort the data in memory on your side. The easiest way is to put the data into a sorted collection (TreeSet, TreeMap) using a Comparator that orders on the column you need, then write out the sorted data.
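One way this might look, as a sketch rather than a definitive implementation: a `TreeMap` keyed on the sort column, with a list per key. The list matters because a `TreeSet` (or a `TreeMap` with one value per key) treats entries that compare equal as duplicates and keeps only one of them. `SortedBuckets`, `add`, and `linesInOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class SortedBuckets {
    // Sort key -> all output lines that share that key, in arrival order.
    private final TreeMap<String, List<String>> buckets;

    SortedBuckets(Comparator<String> keyOrder) {
        buckets = new TreeMap<String, List<String>>(keyOrder);
    }

    // Files one output line under its sort key.
    void add(String key, String line) {
        List<String> bucket = buckets.get(key);
        if (bucket == null) {
            bucket = new ArrayList<String>();
            buckets.put(key, bucket);
        }
        bucket.add(line);
    }

    // Returns every line, ordered by key according to the comparator.
    List<String> linesInOrder() {
        List<String> result = new ArrayList<String>();
        for (List<String> bucket : buckets.values()) {
            result.addAll(bucket);
        }
        return result;
    }
}
```

Each row read from the result set would be passed to `add` with its sort column as the key, and `linesInOrder` then yields the lines ready to be written to the file. Everything stays in memory, so this only applies when the full result fits in the heap.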
