Can I automate the tuple part in a PostgreSQL crosstab?

Posted 2025-01-17 15:20:22


I'm trying to get a tall table (with just 3 columns indicating variable, timestamp and value) into a wide format where the timestamp is the index, the columns are the variable names, and the values are the values of the new table.

In python/pandas this would be something along the lines of

import pandas as pd
df = pd.read_csv("./mydata.csv") # assume timestamp, varname & value columns
df.pivot(index="timestamp", columns="varname", values="value")
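
To make the target shape concrete, here is a tiny made-up example (the varnames temp and pressure are only placeholders). The tall table

timestamp    varname   value
2024-01-01   temp      1.0
2024-01-01   pressure  7.5
2024-01-02   temp      1.2

should become

timestamp    temp  pressure
2024-01-01   1.0   7.5
2024-01-02   1.2   NULL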

For PostgreSQL there exists crosstab; so far I have:

SELECT * FROM crosstab(
  $$
  SELECT
    "timestamp",
    "varname",
    "value"
  FROM mydata
  ORDER BY "timestamp" ASC, "varname" ASC
  $$
) AS ct(
  "timestamp" timestamp,
  "varname1" numeric,
  ...
  "varnameN" numeric
);
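
(For reference, crosstab here is the one from the tablefunc extension; I'm assuming it has already been enabled once per database with something like

CREATE EXTENSION IF NOT EXISTS tablefunc;

so the only missing piece is the column definition list.)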

The problem is that I can potentially have dozens to hundreds of variable names. The types are always numeric, but the number of variable names is not stable (we could need more variables or realize that others are not necessary).

Is there a way to automate the "ct" part so that some other query (e.g. select distinct "varname" from mydata) produces it instead of me having to type in every single variable name present?
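
To illustrate what I mean by "automate", a rough sketch of the kind of helper I have in mind (using the standard string_agg and format functions; ct_column_list is just a made-up output name) would generate the column definition list as text from the distinct varnames:

SELECT '"timestamp" timestamp, '
       || string_agg(format('%I numeric', varname), ', ' ORDER BY varname)
       AS ct_column_list
FROM (SELECT DISTINCT varname FROM mydata) AS v;

But that only produces the text; it still has to be spliced into the AS ct(...) clause somehow, which is exactly the part I don't know how to automate.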

PS: The PostgreSQL version is 12.9 at home and 14.0 in production. The number of rows in the original table is around 2 million; however, I'm going to filter by timestamp and varname, so potentially only a few hundred thousand rows. After filtering I get ~50 unique varnames, but that will increase in a few weeks.
