查询大量数据进行网页展示

发布于 2024-11-30 06:42:50 字数 1827 浏览 1 评论 0原文

我是 Oracle 新手(了解一些基本 SQL),但任务不知何故分配给了我。

我的任务是优化查询。它连接一些大表(查询比较耗时)并将结果返回到网页(可以使用分页)。

我的问题是我应该朝什么方向看?如果能提供链接就更好了。

以下是一些详细信息:

采用的优化方法

index on dayNum, timeNum, entityID columns  

行数:

  • time_seq: 100,000
  • Indicator_set: 50,000,000
  • tity amount: 1,000

查询:

select * 
  from time_seq seq,  
       (  
        select entityID, 
               dayNum,
               timeNum,
               sum(ind1) sum_ind1, 
               avg(ind2) avg_ind2, 
               max(ind3) max_ind3  
          from indicator_set  
         group by entityID, dayNum, timeNum 
         -- the dayNum timeNum are grouped to the time sequence standard 
         -- in time_seq table  
        ) sum  
where seq.entityID=sum.entityID  
  and seq.dayNum=sum.dayNum  
  and seq.timeNum=sum.timeNum  

表 time_seq

entityID, dayNum, timeNum
--------------------------------
object1, 20110818, 220000  
object1, 20110818, 223000  
object1, 20100818, 230000  
object1, 20110819, 220000  
object1, 20110819, 223000  
object1, 20100819, 230000 
object2, 20110818, 220000  
object2, 20110818, 223000   
object2, 20100818, 230000  
object2, 20110819, 220000  
object2, 20110819, 223000  
object2, 20100819, 230000  

表格indicator_set

entityID, dayNum, timeNUm, ind1, ind2, ind3
--------------------------------------------
object1, 20110818, 220000, 23,34,23
object1, 20110818, 220500, 23,54,543
object1, 20110818, 220530, 23,54,543
object1, 20110818, 220610, 23,54,543
object1, 20110818, 222900, 23,54,543
...

PS:抱歉,我没有提供表格和查询的确切格式/详细信息。就是太复杂了。

干杯~

I am new to Oracle (know some basic SQL), but the task is somehow assigned to me.

My task is to optimize a query. It joins some large tables (query is time consuming) and returns the result to a webpage (paging can be used).

My question is what direction should I look into? It would be better if links can be provided.

Here is some details:

Optimized methods taken

index on dayNum, timeNum, entityID columns  

Number of rows:

  • time_seq: 100,000
  • indicator_set: 50,000,000
  • tity amount: 1,000

Query:

select * 
  from time_seq seq,  
       (  
        select entityID, 
               dayNum,
               timeNum,
               sum(ind1) sum_ind1, 
               avg(ind2) avg_ind2, 
               max(ind3) max_ind3  
          from indicator_set  
         group by entityID, dayNum, timeNum 
         -- the dayNum timeNum are grouped to the time sequence standard 
         -- in time_seq table  
        ) sum  
where seq.entityID=sum.entityID  
  and seq.dayNum=sum.dayNum  
  and seq.timeNum=sum.timeNum  

Table time_seq

entityID, dayNum, timeNum
--------------------------------
object1, 20110818, 220000  
object1, 20110818, 223000  
object1, 20100818, 230000  
object1, 20110819, 220000  
object1, 20110819, 223000  
object1, 20100819, 230000 
object2, 20110818, 220000  
object2, 20110818, 223000   
object2, 20100818, 230000  
object2, 20110819, 220000  
object2, 20110819, 223000  
object2, 20100819, 230000  

Table indicator_set

entityID, dayNum, timeNUm, ind1, ind2, ind3
--------------------------------------------
object1, 20110818, 220000, 23,34,23
object1, 20110818, 220500, 23,54,543
object1, 20110818, 220530, 23,54,543
object1, 20110818, 220610, 23,54,543
object1, 20110818, 222900, 23,54,543
...

PS: sorry I didn't provide exact format/detail of the table and query. Just too complicated.

cheers~

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

疧_╮線 2024-12-07 06:42:50

您正在如何处理此查询的结果?假设查询返回大约 100,000 行(我假设联接不会消除 TIME_SEQ 中的大量行),则该查询没有多大意义。您通过网页向用户返回 100,000 个无序行 - 人类不可能翻阅 100,000 行,并且以无序方式呈现这些行(或在中间层)。

为了填充网页而尝试动态聚合 5000 万行数据几乎没有意义。这本质上是一个缓慢的操作——如果用户期望在一两秒内得到响应,那么您就没有时间读取和聚合 5000 万行。您可以在 INDICATOR_SET 上创建一个物化视图,用于预聚合数据。这将使插入和更新速度变慢,但会加快查询速度。

当然,即使您将 5000 万行预先聚合为 100,000 行,您仍然尝试连接两个 100,000 行表并向客户端发送 100,000 个无序行,这不太合理。您可以添加一个 ORDER BY 子句,以便行按合理的顺序排列,但这通常会增加查询时间,因为您引入了额外的排序。根据您想要排序的内容,可能有一些方法可以优化它。然而,我仍然会回到一个根本问题,即向人类客户端返回 100,000 行从根本上来说不是正确的方法。

What are you doing with the results of this query? Assuming that the query returns on the order of 100,000 rows (I assume that the join does not eliminate a substantial number of the rows in TIME_SEQ), the query doesn't make a lot of sense. You're returning 100,000 unordered rows to a user via a web page-- there is no way that a human would ever page through 100,000 rows and there is no way that it makes sense to present those rows unordered (or to order them in the middle tier).

It would very rarely make sense to try to aggregate 50 million rows of data on the fly in order to populate a web page. That's going to be an inherently slow operation-- if users are expecting a response in a second or two, you don't have time to read and aggregate 50 million rows. You could potentially create a materialized view on INDICATOR_SET that could be used to pre-aggregate the data. That will make inserts and updates slower but should speed up your query.

Of course, even if you pre-aggregate the 50 million rows down to just 100,000 rows, you're still trying to join two 100,000 row tables and send 100,000 unordered rows to the client which is unlikely to be reasonable. You could potentially add an ORDER BY clause so that the rows come in a reasonable order but that is generally going to add time to the query since you're introducing an extra sort. There may be ways to optimize that depending on what you intend to sort by. However, I'd still come back to the fundamental problem that returning 100,000 rows to a human client is fundamentally not the right approach.

双手揣兜 2024-12-07 06:42:50

在子查询中链接indicator_set 和time_seq 表是否会限制在大型indicator_set 表中分组的行数?

SELECT *
  FROM time_seq seq,
       (  SELECT ind.entityid,
                 ind.dayNum,
                 ind.timeNum,
                 SUM( ind.ind1 ),
                 AVG( ind.ind2 ),
                 MAX( ind.ind3 )
            FROM indicator_set ind,
                 time_seq ts
           WHERE ind.entity_id = ts.entityid
             AND ind.daynum    = ts.daynum
             AND ind.timenum   = ts.timenum
        GROUP BY ind.entityid,
                 ind.daynum,
                 ind.timenum) SUM
 WHERE seq.entityid = SUM.entityid
   AND seq.daynum   = SUM.daynum
   AND seq.timenum  = SUM.timenum

不过,外部查询仍将确保只从 time_seq 表中返回记录。

这有帮助吗?

编辑:顺便说一下,我会使用比 SUM 更好的表别名,因为它是一个 Oracle 函数。

Would linking the indicator_set and time_seq tables within the subquery limit the amount of rows to be grouped in the large indicator_set table?

SELECT *
  FROM time_seq seq,
       (  SELECT ind.entityid,
                 ind.dayNum,
                 ind.timeNum,
                 SUM( ind.ind1 ),
                 AVG( ind.ind2 ),
                 MAX( ind.ind3 )
            FROM indicator_set ind,
                 time_seq ts
           WHERE ind.entity_id = ts.entityid
             AND ind.daynum    = ts.daynum
             AND ind.timenum   = ts.timenum
        GROUP BY ind.entityid,
                 ind.daynum,
                 ind.timenum) SUM
 WHERE seq.entityid = SUM.entityid
   AND seq.daynum   = SUM.daynum
   AND seq.timenum  = SUM.timenum

The outher query would still ensure that only the records from the time_seq table would be returned from the query though.

Is this helpful?

EDIT: By the way, i'd use a better table alias than SUM as it is an oracle function.

诠释孤独 2024-12-07 06:42:50

您可以根据查询创建物化视图:

create materialized view mv as
select entityID, 
               dayNum,
               timeNum,
               sum(ind1) sum_ind1, 
               avg(ind2) avg_ind2, 
               max(ind3) max_ind3  
          from indicator_set  
         group by entityID, dayNum, timeNum;

该表会更小。您也可以在此物化视图中创建索引。您的选择将是这样的:

select * 
  from time_seq seq,  
      mv sum  
where seq.entityID=sum.entityID  
  and seq.dayNum=sum.dayNum  
  and seq.timeNum=sum.timeNum;

您的物化视图可以在快速模式、正常模式等下刷新...如果您想要更多信息,请看一下:
http://download.oracle.com/docs/cd /B13789_01/server.101/b10759/statements_6002.htm

http://orafaq.com/wiki/Oracle_Materialized_Views

You can create a materialized view based on your query:

create materialized view mv as
select entityID, 
               dayNum,
               timeNum,
               sum(ind1) sum_ind1, 
               avg(ind2) avg_ind2, 
               max(ind3) max_ind3  
          from indicator_set  
         group by entityID, dayNum, timeNum;

This table would be smaller. You can create your indexes also in this materialized view. Your select would be something like this:

select * 
  from time_seq seq,  
      mv sum  
where seq.entityID=sum.entityID  
  and seq.dayNum=sum.dayNum  
  and seq.timeNum=sum.timeNum;

Your materialized view could be refresehed in fast mode, normal, etc... If you want some more intormation, take a look:
http://download.oracle.com/docs/cd/B13789_01/server.101/b10759/statements_6002.htm

http://orafaq.com/wiki/Oracle_Materialized_Views

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文