我使用 SQL Server。
有人给了我一些大桌子,没有任何限制,没有钥匙,什么也没有。
我知道有些列具有独特的值。对于给定的表是否有一种聪明的方法来查找具有唯一值的列?
现在,我通过计算是否有与表中的行一样多的 DISTINCT 值来手动为每一列执行此操作。
SELECT COUNT(DISTINCT col) FROM table
可能可以让光标在所有列上循环,但想听听是否有人知道更智能或内置的功能。
I use SQL Server.
I've been handed some large tables with no constrains on them, no keys no nothing.
I know some of the columns have unique values. Is there a smart way for a given table to find the cols that have unique values?
Right now I do it manually for each column by counting if there is as many DISTINCT values as there are rows in the table.
SELECT COUNT(DISTINCT col) FROM table
Could probably make a cursor to loop over all the columns but want to hear if someone knows a smarter or built-in function.
发布评论
评论(6)
这是一种基本上类似于 @JNK 的方法,但它不是打印计数,而是为每一列返回一个现成的答案,告诉您该列是否仅包含唯一值:
它只是比较
COUNT(DISTINCT column)< /code> 每列都带有
COUNT(*)
。结果将是一个只有一行的表,其中每列都将包含值UNIQUE
(对于那些没有重复项的列),如果存在重复项则为空字符串。但上述解决方案仅适用于那些没有 NULL 的列。应该注意的是,当您想在列上创建唯一约束/索引时,SQL Server 不会忽略 NULL。如果一列仅包含一个 NULL 并且所有其他值都是唯一的,您仍然可以在该列上创建唯一约束(但是不能将其设为主键,这需要值的唯一性和不存在 NULL)。
因此,您可能需要对内容进行更彻底的分析,可以使用以下脚本进行分析:
此解决方案通过检查三个值来考虑 NULL:
COUNT(DISTINCT column)
、COUNT(列)
和COUNT(*)
。它显示的结果与前一个解决方案类似,但对列的可能诊断更加多样化:UNIQUE
表示没有重复值且没有 NULL(可以是 PK 或具有唯一约束) /index);UNIQUE WITH SINGLE NULL
– 可以猜到,没有重复项,但有一个 NULL(不能是 PK,但可以有唯一的约束/索引);UNIQUE with NULL
– 没有重复项,两个或多个 NULL(如果您使用的是 SQL Server 2008,则可以仅对非 NULL 值使用条件唯一索引);空字符串 – 存在重复项,也可能为 NULL。
Here's an approach that is basically similar to @JNK's but instead of printing the counts it returns a ready answer for every column that tells you whether a column consists of unique values only or not:
It simply compares
COUNT(DISTINCT column)
withCOUNT(*)
for every column. The result will be a table with a single row, where every column will contain the valueUNIQUE
for those columns that do not have duplicates, and empty string if duplicates are present.But the above solution will work correctly only for those columns that do not have NULLs. It should be noted that SQL Server does not ignore NULLs when you want to create a unique constraint/index on a column. If a column contains just one NULL and all other values are unique, you can still create a unique constraint on the column (you cannot make it a primary key, though, which requires both uniquness of values and absence of NULLs).
Therefore you might need a more thorough analysis of the contents, which you could get with the following script:
This solution takes NULLs into account by checking three values:
COUNT(DISTINCT column)
,COUNT(column)
andCOUNT(*)
. It displays the results similarly to the former solution, but the possible diagnoses for the columns are more diverse:UNIQUE
means no duplicate values and no NULLs (can either be a PK or have a unique constraint/index);UNIQUE WITH SINGLE NULL
– as can be guessed, no duplicates, but there's one NULL (cannot be a PK, but can have a unique constraint/index);UNIQUE with NULLs
– no duplicates, two or more NULLs (in case you are on SQL Server 2008, you could have a conditional unique index for non-NULL values only);empty string – there are duplicates, possibly NULLs too.
我认为这可能是最干净的方法。只需使用动态 sql 和单个 select 语句来创建一个查询,该查询将为您提供总行数和每个字段的不同值的计数。
在顶部填写数据库名称和表名。数据库名称部分非常重要,因为
OBJECT_NAME
仅适用于当前数据库上下文。Here is I think probably the cleanest way. Just use dynamic sql and a single select statement to create a query that gives you a total row count and a count of distinct values for each field.
Fill in the DB name and tablename at the top. The DB name part is really important since
OBJECT_NAME
only works in the current database context.如果您使用的是 2008,则可以使用 SSIS 中的数据分析任务来返回每个表的候选键。
此博客条目逐步介绍了整个过程,非常简单:
http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx
If you are using 2008, you can use the Data Profiling Task in SSIS to return Candidate Keys for each table.
This blog entry steps through the process, it's fairly simple:
http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx
我的代码做了几句话:
读取所有表和列
创建一个临时表来保存具有重复键的表/列
对于每个表/列,它运行一个查询。如果发现至少一个值的 count(*)>1
它会向临时表中插入内容
从系统表中选择与发现重复的表/列不匹配的列和值
A few words what my code does:
Read's all tables and columns
Creates a temp table to hold table/columns with duplicate keys
For each table/column it runs a query. If it finds a count(*)>1 for at least one value
it makes an insert into the temp table
Select's column and values from the system tables that do not match table/columns that are found to have duplicates
简单的一行代码怎么样:
如果创建了索引,那么您的column_name只有唯一值。如果您的column_name 中有重复内容,您将收到一条错误消息。
What about simple one line of code:
If the index is created then your column_name has only unique values. If there are dupes in your column_name, you will get an error message.
调整了@Andriy M的解决方案,这对我有帮助。
我们的表列名称包含以数字值开头的字母数字字符,因此某些表的查询失败。因此,在@Andriy M 的解决方案中在列名称周围添加了方括号。
修改后的查询如下所示,(实现了 @TT 使用 QUOTENAME 而不是方括号的建议)
Tweaked @Andriy M's solution which helped me.
Our table column names had got alphanumeric characters that started with numeric value and so the query failed for certain tables. So added square brackets around the column names to @Andriy M's solution.
The modified query goes as below, (Implemented @TT's suggestion of using QUOTENAME instead of square bracket)