I have a (rather dirty) data source (Excel) that looks like this:
ID  | Name                  | Subject   | Grade
123 | Smith, Joe R.         | MATH      | 2.0
123 | Smith, Joe Rodriguez  | FRENCH    | 3.0
234 | Doe, Mary Jane D.     | BIOLOGY   | 2.5
234 | Doe, Mary Jane Dawson | CHEMISTRY | 2.5
234 | Doe, Mary Jane        | FRENCH    | 3.5
My application's output should look like this:
Smith, Joe R.
123
MATH | 2.0
FRENCH | 3.0
So basically I want to run a query (just for the ID/Person parent 'container'), something like:
SELECT DISTINCT ID, Name FROM MyTable
or
SELECT ID, Name FROM MyTable GROUP BY ID
Of course, neither of these does what I want: the first still returns a row for every distinct ID/Name pair, and the second is invalid because Name is neither grouped nor aggregated.
I would like to 'combine' rows with the same ID and ignore the other records that have the same ID but a different Name (we know they're the same person, since ID is our identifier and the differing names are clearly just typos/dirty data).
Can this be done with a single SELECT query?
Comments (4)
If you don't really care which value shows up in the name field, use MAX() or MIN().
Here's a working example to play with: https://data.stackexchange.com/stackoverflow/q/116699/
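The query from this comment did not survive on the page, so the following is only a minimal sketch of the kind of query it describes. It runs through Python's built-in sqlite3 module purely so that it is self-contained; MyTable and its columns come from the question, while the in-memory SQLite database is an assumption standing in for the Excel source.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the Excel data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# One row per ID; MIN(Name) keeps whichever spelling sorts first.
people = conn.execute(
    "SELECT ID, MIN(Name) AS Name FROM MyTable GROUP BY ID ORDER BY ID"
).fetchall()

for person_id, name in people:
    print(person_id, name)
```

Swapping MIN for MAX keeps the spelling that sorts last instead. Which spelling wins depends on the database's collation, so this only suits cases where any one of the variants is acceptable.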
You can take the MIN or MAX value of Name for each ID, which will give you a single row per ID.
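The sample result this comment pointed to was lost. As a rough illustration of what grouping by ID produces, the sketch below computes both aggregates side by side on the question's data. The aliases MinName and MaxName are hypothetical, and SQLite (via Python's sqlite3) is an assumption used only to make the example runnable.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the Excel data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# MinName and MaxName are hypothetical aliases, shown together for comparison.
rows = conn.execute(
    "SELECT ID, MIN(Name) AS MinName, MAX(Name) AS MaxName "
    "FROM MyTable GROUP BY ID ORDER BY ID"
).fetchall()

for row in rows:
    print(row)
```

Under SQLite's default comparison, the shortest spelling comes back as the minimum for both people here, but that is a property of this sample data rather than a rule.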
If you don't care which name you keep, you can use a MAX() or MIN() aggregate to pick just one name.
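The original query is missing from this comment as well. One way the aggregate could feed the output the question asks for is sketched below: the per-ID name is picked in a subquery and joined back to the subject rows, then printed in the parent/child layout. The aliases t and p, the use of SQLite's rowid to keep the source row order, and the Python grouping step are all assumptions for illustration, not something the thread specifies.

```python
import sqlite3
from itertools import groupby

# Assumption: an in-memory SQLite table stands in for the Excel data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# The subquery p picks one name per ID; joining it back attaches that single
# name to every subject row. rowid is SQLite-specific and preserves insert order.
query = """
SELECT p.Name, t.ID, t.Subject, t.Grade
FROM MyTable AS t
JOIN (SELECT ID, MIN(Name) AS Name FROM MyTable GROUP BY ID) AS p
  ON p.ID = t.ID
ORDER BY t.ID, t.rowid
"""
rows = conn.execute(query).fetchall()

# Build the layout from the question: name, ID, then one line per subject.
lines = []
for (name, person_id), group in groupby(rows, key=lambda r: (r[0], r[1])):
    lines.append(name)
    lines.append(str(person_id))
    for _, _, subject, grade in group:
        lines.append(f"{subject} | {grade}")

print("\n".join(lines))
```

Doing the join in SQL keeps it to a single SELECT, as the question asks. Running the parent query and a per-ID child query separately would work too, at the cost of more round trips.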