如何为关联表生成随机数据？

发布于 2025-01-26 06:07:16 字数 1380 浏览 3 评论 0原文

我正在使用三张表的电子购物商店工作：products，atributes和product_atributes。我已经填充了表products和atributes：

insert into atribute(name)
select random_atribute()
from generate_series(1,10000) as seq(i);

insert into product(price)
select floor(1 + random() * 100)::int
from generate_series(1,20000) as seq(i);

我想生成随机数据 table product_atributes：是1个产品的1个属性，但也只有1个产品的所有属性。结果应该看起来像：

product_atributes
id  product_id     atribute_id
1      5               12
2      5               76
3      5               10
4      5                7
5      1               45
6      1               109
...

Table product_atributes是这样创建的：

create table product_atributes(
  id serial primary key,
  atribute_id integer references atribute(id),
  product_id integer references product(id)
);

我尝试了STH。这样，但它行不通：

with data as (
    select  s.i,
            random_atribute_id() as atribute_id,
            s.id as product_id
   
    from (generate_series(1, 1000) as seq(i)
             cross join lateral (select seq.i, * from product order by random() limit 1) as s)
)

insert into product_atributes(atribute_id, product_id)
select atribute_id, product_id from data;

我该怎么做？

原文

I am working on an e-shop with three tables: products, atributes and product_atributes. I have already filled tables products and atributes:

insert into atribute(name)
select random_atribute()
from generate_series(1,10000) as seq(i);

insert into product(price)
select floor(1 + random() * 100)::int
from generate_series(1,20000) as seq(i);

I want to generate random data for the table product_atributes: There can be 1 attribute for 1 product but also all attributes for just 1 product. The result should look like:

product_atributes
id  product_id     atribute_id
1      5               12
2      5               76
3      5               10
4      5                7
5      1               45
6      1               109
...

Table product_atributes is created like this:

create table product_atributes(
  id serial primary key,
  atribute_id integer references atribute(id),
  product_id integer references product(id)
);

I tried sth. like this, but it's not working:

with data as (
    select  s.i,
            random_atribute_id() as atribute_id,
            s.id as product_id
   
    from (generate_series(1, 1000) as seq(i)
             cross join lateral (select seq.i, * from product order by random() limit 1) as s)
)

insert into product_atributes(atribute_id, product_id)
select atribute_id, product_id from data;

How can I do this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风吹短裙飘 2025-02-02 06:07:16

简单地

插入所有可能的组合，均匀的机会为1％：

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM   product p
JOIN   atribute a ON random() >= 0.99;

更柔和＆amp;更快的速度

可提供对最小和最大属性数量的更多控制。

对于 对于您的特定设置使用 gap-less 属性ID（从1到10000）：

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM  (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS  JOIN LATERAL (
   SELECT DISTINCT 1 + trunc(random() * 10000) AS atribut_id
   FROM   generate_series (0, p.ct) g
   ) a;

这会生成0至10个完全随机的关联。

2 + trunc（Random（） * 6）:: int as Ct将生成3至8个关联。（请注意0基于0的generate_series（）！）

在子查询中生成属性（ct）的数量，以避免对tandom（Random）进行单一评估（）导致所有产品的数字相同。

我们可以相当多地优化无差距ID的性能。而不是为给定的ID范围生成随机数，而不是在所有产品中浏览所有10000个属性。更快。
将不同的扔掉以消除（不太可能）重复。我们的10000中的10个几乎没有关系，但我们不能允许重复。（因此，在极少数情况下，可以减少关联的可能性。）

我们甚至可以处理一些差距。选择随机行有很多微妙之处。参阅：

请

您还可以使用数据改装CTE 。 1：n关系的简单示例：

我如何生成遵守我架构的主要键 - 前键约束的伪造数据？

您的情况相同的原则，JUT两个ctes ...

Simple

To insert every possible combination with an even chance of 1 percent:

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM   product p
JOIN   atribute a ON random() >= 0.99;

More sophisitcated & faster

This offers more control over minimum and maximum number of attributes.

And it's a lot faster for your particular setup with gap-less attribute IDs (from 1 to 10000):

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM  (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS  JOIN LATERAL (
   SELECT DISTINCT 1 + trunc(random() * 10000) AS atribut_id
   FROM   generate_series (0, p.ct) g
   ) a;

This generates 0 to 10 completely random associations per product.
2 + trunc(random() * 6)::int AS ct would generate 3 to 8 associations. (Note the 0-based generate_series()!)

Generate the number of attributes (ct) in a subquery to avoid single evaluation of random() resulting in the same number for all products.

We can optimize performance for gap-less IDs quite a bit. Instead of walking through all 10000 attribute for every product, just generate random numbers for the given range of IDs. Much faster.
Throw in DISTINCT to eliminate (unlikely) duplicates. Hardly matters for 10 our of 10000, but we can't allow duplicates. (So it's possible to get fewer associations in rare cases.)

We could even work with a few gaps. There are quite a few subtleties to selecting random rows. See: