Can I automate the tuple part in a PostgreSQL crosstab?

Posted 2025-01-17 15:20:22


I'm trying to get a tall table (with just 3 columns indicating variable, timestamp and value) into a wide format where the timestamp is the index, the columns are the variable names, and the values are the values of the new table.

In python/pandas this would be something along the lines of

import pandas as pd
df = pd.read_csv("./mydata.csv") # assume timestamp, varname & value columns
df.pivot(index="timestamp", columns="varname", values="value")
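
To make the target shape concrete, here is a tiny made-up example (the varnames temp and pressure are only placeholders). The tall table

timestamp    varname   value
2024-01-01   temp      1.0
2024-01-01   pressure  7.5
2024-01-02   temp      1.2

should become

timestamp    temp  pressure
2024-01-01   1.0   7.5
2024-01-02   1.2   NULL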

For PostgreSQL there exists crosstab; so far I have:

SELECT * FROM crosstab(
  $$
  SELECT
    "timestamp",
    "varname",
    "value"
  FROM mydata
  ORDER BY "timestamp" ASC, "varname" ASC
  $$
) AS ct(
  "timestamp" timestamp,
  "varname1" numeric,
  ...
  "varnameN" numeric
);
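
(For reference, crosstab here is the one from the tablefunc extension; I'm assuming it has already been enabled once per database with something like

CREATE EXTENSION IF NOT EXISTS tablefunc;

so the only missing piece is the column definition list.)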

The problem is that I can potentially have dozens to hundreds of variable names. The types are always numeric, but the number of variable names is not stable (we could need more variables or realize that others are not necessary).

Is there a way to automate the "ct" part so that some other query (e.g. select distinct "varname" from mydata) produces it instead of me having to type in every single variable name present?
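
To illustrate what I mean by "automate", a rough sketch of the kind of helper I have in mind (using the standard string_agg and format functions; ct_column_list is just a made-up output name) would generate the column definition list as text from the distinct varnames:

SELECT '"timestamp" timestamp, '
       || string_agg(format('%I numeric', varname), ', ' ORDER BY varname)
       AS ct_column_list
FROM (SELECT DISTINCT varname FROM mydata) AS v;

But that only produces the text; it still has to be spliced into the AS ct(...) clause somehow, which is exactly the part I don't know how to automate.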

PS: The PostgreSQL version is 12.9 at home and 14.0 in production. The number of rows in the original table is around 2 million; however, I'm going to filter by timestamp and varname, so potentially only a few hundred thousand rows. After filtering I get ~50 unique varnames, but that will increase in a few weeks.
