隐藏重复的列单元格

发布于 2025-02-01 21:49:12 字数 1348 浏览 5 评论 0 原文

首先,在这里根本无法解决我的问题。我尝试了 this 也是如此。

我想生成随机的唯一假数据(名字和地址)

我使用以下SQL请求:

CREATE OR REPLACE VIEW bah AS
SELECT DISTINCT A.VAL AS VORNAME,
    B.VAL AS ADRESSE
FROM ANON.FIRST_NAME A,
    ANON.ADDRESS B
    GROUP BY
    A.val,b.val
    HAVING COUNT(*) = 1;

结果显然看起来像这样(地址也重复了下一个名字): 错误”

如何使名字和地址独特?

请注意,这是CSV格式:

预期结果:

"Abbas","Dippelstr. 736, 23892 Hainichen",
"Gilda","Noackallee 6/2, 24711 Malchin",
"Guenter","Fredy-Junk-Gasse 3, 90438 Schmölln",
"Hans-Ulrich","Karl-Peter-Kambs-Gasse 996, 15869 Sulzbach-Rosenberg"

示例数据:

Column1 (contains oids (which is irrelevant) and vals(relevant! hence listed below):
"Abbas", "Ante","Anthony"... 
Column2: (see above - oids and vals) 
"Benthinstraße 31, 35994 Kleve", "Cordula-Bachmann-Ring 4/8, 06292 Neustrelitz", "Danny-Fischer-Weg 8/9, 28346 Rastatt", "Eckbauergasse 157, 10570 Rudolstadt"

First of all, this here doesn't solve my problem at all. I tried this too.

I want generate random unique fake data (first name and address)

I used the following SQL request:

CREATE OR REPLACE VIEW bah AS
SELECT DISTINCT A.VAL AS VORNAME,
    B.VAL AS ADRESSE
FROM ANON.FIRST_NAME A,
    ANON.ADDRESS B
    GROUP BY
    A.val,b.val
    HAVING COUNT(*) = 1;

The result apparently looked like this (the addresses repeat themselves for the next name too):
what is wrong

How do I make the first names AND addresses unique?

Please note this is in CSV format:

Expected result:

"Abbas","Dippelstr. 736, 23892 Hainichen",
"Gilda","Noackallee 6/2, 24711 Malchin",
"Guenter","Fredy-Junk-Gasse 3, 90438 Schmölln",
"Hans-Ulrich","Karl-Peter-Kambs-Gasse 996, 15869 Sulzbach-Rosenberg"

Sample Data:

Column1 (contains oids (which is irrelevant) and vals(relevant! hence listed below):
"Abbas", "Ante","Anthony"... 
Column2: (see above - oids and vals) 
"Benthinstraße 31, 35994 Kleve", "Cordula-Bachmann-Ring 4/8, 06292 Neustrelitz", "Danny-Fischer-Weg 8/9, 28346 Rastatt", "Eckbauergasse 157, 10570 Rudolstadt"

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

吖咩 2025-02-08 21:49:12

交叉加入是错误的方法。从 comma()分开的多个是交叉加入的。请参阅:

anon.first_name中的10个名称? /code>和10个地址 anon.address 表格> 100行,每个名称和地址乘以交叉加入表的基数。正是你不想要的。所有与重复的混淆都是由此造成的。在之后应用不同的 交叉连接效率极低。

要仅使用每个名称和地址一次,请将一个随机(或任意)编号附加到每一侧,然后加入。

假设(由于缺乏声明)每个源表都有不同的条目。 (否则,您需要带有不同的的子查询才能首先崩溃 - 窗口函数,例如 row_number() distient 之前应用在相同的查询级别上。)

SELECT *
FROM  (
   SELECT row_number() OVER () AS arbitrary_nr, val AS vorname
   FROM   anon.first_name
   ) a
JOIN (
   SELECT row_number() OVER () AS arbitrary_nr, val AS adresse
   FROM   anon.address
   ) b USING (arbitrary_nr)

组合是任意,而不是随机。请参阅:

结果具有较小表的基数。跳过较大桌子的多余行被跳过。

为了允许一些重复,您只需在一个(或两个)侧引入一些重复的数字( nuterary_nr )具有相同的任意号码。为您的示例:

有时可能有2个人住在同一地址

SELECT *
FROM  (
   SELECT row_number() OVER (ORDER BY val) AS arbitrary_nr, val AS vorname
   FROM   anon.first_name
   UNION ALL
   SELECT row_number() OVER (ORDER BY val) - 1, val
   FROM   anon.first_name
   WHERE  random() > .9
   ) a
JOIN (
   SELECT row_number() OVER () AS arbitrary_nr, val AS adresse
   FROM   anon.address
   ) b USING (arbitrary_nr)

这第二次增加了所有名字的约10%。 (因此有些名称获得了两个地址。)这次,名称按字母顺序排列。随机示例以0( -1 )开头,数字只能增加速度较慢,因此相同的名称永远无法获得相同的 nutyary_nr ,并且永远不会合并相同的地址两次同名。

只要至少一侧获得任意(或随机)数字,结果仍然是任意(或随机)。

有许多的方式。很大程度上取决于您的确切输入以及结果的确切要求。

A CROSS JOIN is the wrong approach to begin with. Multiple FROM items separated by comma (,) are cross-joined. See:

10 names in anon.first_name and 10 addresses in anon.address form a Cartesian product of 100 rows, with each name and address multiplied by the cardinality of the cross-joined table. Exactly what you don't want. All the confusion with duplicates was created by that. Applying DISTINCT after the cross join is hugely inefficient.

To use each name and address only once, attach a random (or arbitrary) number to each side and join on that.

Assuming (for lack of declaration) that each source table has distinct entries. (Else you need subqueries with DISTINCT to collapse dupes first - window functions like row_number() are applied before DISTINCT on the same query level.)

SELECT *
FROM  (
   SELECT row_number() OVER () AS arbitrary_nr, val AS vorname
   FROM   anon.first_name
   ) a
JOIN (
   SELECT row_number() OVER () AS arbitrary_nr, val AS adresse
   FROM   anon.address
   ) b USING (arbitrary_nr)

Combinations are arbitrary, rather than random. See:

The result has the cardinality of the smaller table. Excess rows from the bigger table are skipped.

To allow some duplicates, you just introduce some duplicate numbers (arbitrary_nr) on one (or both) side(s), either with the same row multiple times, or different rows with the same arbitrary number. For your example:

There could be like 2 persons living at the same address sometimes

SELECT *
FROM  (
   SELECT row_number() OVER (ORDER BY val) AS arbitrary_nr, val AS vorname
   FROM   anon.first_name
   UNION ALL
   SELECT row_number() OVER (ORDER BY val) - 1, val
   FROM   anon.first_name
   WHERE  random() > .9
   ) a
JOIN (
   SELECT row_number() OVER () AS arbitrary_nr, val AS adresse
   FROM   anon.address
   ) b USING (arbitrary_nr)

This adds ~ 10 % of all names a second time. (So some names get two addresses.) This time, names are ordered alphabetically. The random sample starts with 0 (- 1) and numbers can can only increase slower, so the same name can never get the same arbitrary_nr, and the same address is never combined with the same name twice.

The result is still arbitrary (or random) as long as at least one side gets arbitrary (or random) numbers.

There are many ways. Much depends on your exact input, and exact requirements for the result.

别闹i 2025-02-08 21:49:12

您正在使用在此处选择“不同的错误”。

为了达到预期的结果,您需要像这样修改查询:

select distinct on (ADRESSE) * from bah;

这样您的结果行将是唯一的。

You are using select distinct wrong here.

To achieve the expected result you need to modify your query like so:

select distinct on (ADRESSE) * from bah;

That way your result rows will be unique.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文