如何选择填充最多的记录?
我很不幸不得不处理包含特定记录重复项的数据库,我正在寻找一种快速方法来表示“获取填充最多的记录并更新重复项以匹配它”。
从那里我可以选择不同的记录并获取一组有用的记录。
有什么想法吗?
主要是名称和地址(如果有帮助的话)...
好吧,这里提出了很多问题,所以我会添加更多内容:
首先,我想提取最“填充”而不是最“流行”的行,这意味着具有最多值的行不为空。
一旦我有了集合(这很容易,因为在我的情况下 id 匹配),我就可以填充其他行中的缺失值。
我不想破坏数据,我只想根据准确的匹配(例如通过id)更新数据。
我目前的问题是找出一组行中填充最多的字段,自从发布这个问题以来,我找到了一种不同的方法来解决我更大的问题,即发送到远程服务器的内容,但是我'我仍然有兴趣知道这个问题的解决方案是什么。
示例数据可能看起来像这样...
id name addr1 addr2 ect
1 fred 1 the street Some town ...
1 fred null null null
给定一个充满这样的匹配对的表,我想找到这些对,然后抓取其中包含信息的对,并将这些值插入到另一行中存在空值的位置。
I have the unfortunate luck of having to deal with a db that contains duplicates of particular records, I am looking for a quick way to say "get the most populated record and update the duplicates to match it".
From there I can select distinct records and get a useful set of records.
Any ideas?
It's mainly names and addresses if that helps...
Ok lots of questions asked here so i'll add little bit more:
Firstly I want to pull the most "populated" not most "popular", this means the row with the most values that are not null.
Once I have the set (which is easy because in my case the id's match) I can then populate the missing values in the other rows.
I don't want to destroy data and i only intend to update data based on an accurate match (eg by id).
My problem at the moment is figuring out which of a set of rows has the most populated fields, having said that since posting this question I have found a different way to solve my bigger problem which is what to send to a remote server however I'm still interested to know what the solution to this might be.
Sample data might look something like this ...
id name addr1 addr2 ect
1 fred 1 the street Some town ...
1 fred null null null
Given a table full of matching pairs like this I want to find the pairs then grab the one with the info in it and insert those values where there is a null in the other row.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
请记住,您可能会破坏此处的数据。仅仅因为一行填充的列较少并不意味着它在填充的列中不太准确。
我假设重复项是由名为“名称”的列确定的。您需要根据重复项的定义进行调整。另外,由于您没有给出任何关于如何处理“人口最多”关系的规则,所以我只是选择了 id 最低的行。
编辑:我刚刚重读了你的问题,你提到,“从那里我可以选择不同的记录并获得一组有用的记录。”如果这就是您真正想要的,那么不必费心更新其他行,只需首先选择您想要的行并保持其他所有内容不变:
Keep in mind that you will be potentially destroying data here. Just because a row has fewer columns filled doesn't mean that it's less accurate in the columns that are filled.
I've assumed that duplicates are determined by a column called "name". You'll need to adjust based on your definition of duplicates. Also, since you didn't give any rules on how to deal with ties for "most populated" I just chose the row with the lowest id.
EDIT: I just reread your question and you mention, "From there I can select distinct records and get a useful set of records." If that's what you really want, then don't bother updating the other rows, just select the ones that you want in the first place and leave everything else intact: