隐藏重复的列单元格
首先,在这里根本无法解决我的问题。我尝试了 this 也是如此。
我想生成随机的唯一假数据(名字和地址)
我使用以下SQL请求:
CREATE OR REPLACE VIEW bah AS
SELECT DISTINCT A.VAL AS VORNAME,
B.VAL AS ADRESSE
FROM ANON.FIRST_NAME A,
ANON.ADDRESS B
GROUP BY
A.val,b.val
HAVING COUNT(*) = 1;
如何使名字和地址独特?
请注意,这是CSV格式:
预期结果:
"Abbas","Dippelstr. 736, 23892 Hainichen",
"Gilda","Noackallee 6/2, 24711 Malchin",
"Guenter","Fredy-Junk-Gasse 3, 90438 Schmölln",
"Hans-Ulrich","Karl-Peter-Kambs-Gasse 996, 15869 Sulzbach-Rosenberg"
示例数据:
Column1 (contains oids (which is irrelevant) and vals(relevant! hence listed below):
"Abbas", "Ante","Anthony"...
Column2: (see above - oids and vals)
"Benthinstraße 31, 35994 Kleve", "Cordula-Bachmann-Ring 4/8, 06292 Neustrelitz", "Danny-Fischer-Weg 8/9, 28346 Rastatt", "Eckbauergasse 157, 10570 Rudolstadt"
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
交叉加入
是错误的方法。从 comma(,
)分开的多个是交叉加入的。请参阅:
anon.first_name中的10个名称? /code>和10个地址
anon.address
表格> 100行,每个名称和地址乘以交叉加入表的基数。正是你不想要的。所有与重复的混淆都是由此造成的。在之后应用不同的
交叉连接效率极低。要仅使用每个名称和地址一次,请将一个随机(或任意)编号附加到每一侧,然后加入。
假设(由于缺乏声明)每个源表都有不同的条目。 (否则,您需要带有
不同的
的子查询才能首先崩溃 - 窗口函数,例如row_number()
在distient
之前应用在相同的查询级别上。)组合是任意,而不是随机。请参阅:
结果具有较小表的基数。跳过较大桌子的多余行被跳过。
为了允许一些重复,您只需在一个(或两个)侧引入一些重复的数字(
nuterary_nr
)具有相同的任意号码。为您的示例:这第二次增加了所有名字的约10%。 (因此有些名称获得了两个地址。)这次,名称按字母顺序排列。随机示例以0(
-1
)开头,数字只能增加速度较慢,因此相同的名称永远无法获得相同的nutyary_nr
,并且永远不会合并相同的地址两次同名。只要至少一侧获得任意(或随机)数字,结果仍然是任意(或随机)。
有许多的方式。很大程度上取决于您的确切输入以及结果的确切要求。
A
CROSS JOIN
is the wrong approach to begin with. MultipleFROM
items separated by comma (,
) are cross-joined. See:10 names in
anon.first_name
and 10 addresses inanon.address
form a Cartesian product of 100 rows, with each name and address multiplied by the cardinality of the cross-joined table. Exactly what you don't want. All the confusion with duplicates was created by that. ApplyingDISTINCT
after the cross join is hugely inefficient.To use each name and address only once, attach a random (or arbitrary) number to each side and join on that.
Assuming (for lack of declaration) that each source table has distinct entries. (Else you need subqueries with
DISTINCT
to collapse dupes first - window functions likerow_number()
are applied beforeDISTINCT
on the same query level.)Combinations are arbitrary, rather than random. See:
The result has the cardinality of the smaller table. Excess rows from the bigger table are skipped.
To allow some duplicates, you just introduce some duplicate numbers (
arbitrary_nr
) on one (or both) side(s), either with the same row multiple times, or different rows with the same arbitrary number. For your example:This adds ~ 10 % of all names a second time. (So some names get two addresses.) This time, names are ordered alphabetically. The random sample starts with 0 (
- 1
) and numbers can can only increase slower, so the same name can never get the samearbitrary_nr
, and the same address is never combined with the same name twice.The result is still arbitrary (or random) as long as at least one side gets arbitrary (or random) numbers.
There are many ways. Much depends on your exact input, and exact requirements for the result.
您正在使用
在此处选择“不同的
错误”。为了达到预期的结果,您需要像这样修改查询:
这样您的结果行将是唯一的。
You are using
select distinct
wrong here.To achieve the expected result you need to modify your query like so:
That way your result rows will be unique.