在 Vertica 中,您是否应该将字符串分解到它们自己的逻辑表中?
假设你的逻辑表是:
CREATE TABLE employee(
name VARCHAR,
university VARCHAR
);
现在你只有几所大学。因此,您可以提取出大学名称:
CREATE TABLE employee(
name VARCHAR,
university integer references university(university)
);
CREATE TABLE university(
university identity,
name varchar
);
您有这样的查询:
SELECT employee
FROM employee as e1
WHERE EXISTS
(SELECT employee
FROM employee as e2
WHERE e1.name = e2.name AND e1.university <> e2.university)
我想知道的是:第二个逻辑模式(名称被“提取出”)是否可以加快速度?也许是因为那里,e1.university <> e2.university
是整数而不是字符串的比较。
Suppose your logical table is:
CREATE TABLE employee(
name VARCHAR,
university VARCHAR
);
Now you have only a few universities. Therefore, you could factor out the university name:
CREATE TABLE employee(
name VARCHAR,
university integer references university(university)
);
CREATE TABLE university(
university identity,
name varchar
);
You have queries of the sort:
SELECT employee
FROM employee as e1
WHERE EXISTS
(SELECT employee
FROM employee as e2
WHERE e1.name = e2.name AND e1.university <> e2.university)
What I'm wondering about is: does the second logical schema, where the name is "factored out", speed up things? Perhaps because there, e1.university <> e2.university
is a comparison of integers rather than of strings.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我知道这是一个老问题了——但答案通常是否定的。产品执行的编码通常会充分缩小列,使其比必须发生的联接更快。
I know this is an old question -- but the answer is usually no. The encoding that the product does will generally shrink the column enough that it is faster than the join that would have to occur.