与另一个表格中找到匹配的人(MS SQL Server)
我有两个表:
“ person”表
ID FirstName LastName
----------- ---------- ----------
1 Janez Novak
2 Matija Špacapan
3 Francka Joras
“用户列表”
ID FullName
----- --------------------
1 Andrej Novak
2 Novak Peter Janez
3 Jana Novak
4 Andrej Kosir
5 Jan Balon
6 Francka Joras
7 France Joras
结果 ,查询必须从两个表中返回这些ID,即表Person中的firstName和lastname来自表“ table用户列表中”。名称和姓氏必须完全相同。 表用户列表中的fullname可以包括中间名 - 应该“忽略”。
比赛:Janez Novak = Janez Novak或Novak Janez或Janez Peter Novak
不是比赛:Janez Novak<> Janeza Novak或Jjanez Novak
想要结果:
ID FirstName LastName ID WholeName
---- ---------- --------- ---- -------------------
1 Janez Novak 2 Novak Peter Janez
3 Francka Joras 6 Francka Joras
这是我的查询:
SELECT
A.ID
,A.FirstName
,A.LastName
,B.ID
,B.WholeName
FROM
dbo.UserList B
cross join dbo.Person A
WHERE
(
CHARINDEX('"'+A.FirstName+'"', '"'+Replace(B.WholeName,' ','"')+'"') > 0
AND CHARINDEX('"'+A.LastName+'"', '"'+Replace(B.WholeName,' ','"')+'"') > 0
)
当桌子中没有很多记录时,查询可以正常运行。
但是我的桌子有:“人” - > 400K和“用户列表” - > 14K记录。
我找到解决方案的方法是否可以,还是还有其他更有效的方法可以做到这一点? 谢谢。
br
I have two tables:
table "Person"
ID FirstName LastName
----------- ---------- ----------
1 Janez Novak
2 Matija Špacapan
3 Francka Joras
Table "UserList"
ID FullName
----- --------------------
1 Andrej Novak
2 Novak Peter Janez
3 Jana Novak
4 Andrej Kosir
5 Jan Balon
6 Francka Joras
7 France Joras
As a result, the query must return those IDs from both tables, that FirstName and Lastname from table Person exist in table UserList. The name and Lastname must be precisely the same.
FullName in table UserList can include the middle name - which should be "ignored".
Match: Janez Novak = Janez Novak OR Novak Janez OR Janez Peter Novak
Not a match: Janez Novak <> Janeza Novak OR Jjanez Novak
Wanted results:
ID FirstName LastName ID WholeName
---- ---------- --------- ---- -------------------
1 Janez Novak 2 Novak Peter Janez
3 Francka Joras 6 Francka Joras
This is my query:
SELECT
A.ID
,A.FirstName
,A.LastName
,B.ID
,B.WholeName
FROM
dbo.UserList B
cross join dbo.Person A
WHERE
(
CHARINDEX('"'+A.FirstName+'"', '"'+Replace(B.WholeName,' ','"')+'"') > 0
AND CHARINDEX('"'+A.LastName+'"', '"'+Replace(B.WholeName,' ','"')+'"') > 0
)
The query works OK when there are not many records in the tables.
But my tables have: "Person" -> 400k and "UserList" -> 14k records.
Is my approach to finding a solution OK, or is there any other more efficient way to do that?
Thank you.
BR
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
创建表
示例数据
查询(匹配名称)
输出
输出新样本数据(用于通配符测试)

跑步示例 sql fiddle> sql fiddle
上面的示例链接链接是mysql&amp;该代码在SQL Server上正常工作
Create tables
Sample data
Query (matching names)
Output
EDIT: output with new sample data (for wildcard testing)

Running example SQL Fiddle
Above example link is of MySQL & the code is working fine on SQL server
您的架构被打破了:P
有各种启发性进行匹配,但我希望您能找到反例破坏您尝试的任何方法。例如,这四个人呢:彼得·史密斯,皮特·史密斯,彼得·史密森和皮特·史密斯?
这是一个
%喜欢%
方法,我希望它会很慢。这是基于空间字符是定界符的假设的字符串操纵方法。
可能也很慢。也许您可以通过添加
substring(@fullname,0,CharIndex('',@fullname))
和作为计算列并索引它们。
Your schema is broken :p
There are various heuristis for doing the matching, but I expect you'll be able to find counterexamples to break whatever you try. For example what about the four people: Peter Smith, Pete Smith, Peter Smithson, and Pete Smithson?
Here's a
%LIKE%
approach, which I'd expect to be slow.Here's a string manipulation approach based on the assumption that the space character is the delimiter.
Probably also quite slow. Maybe you could speed it up by adding
SUBSTRING(@FullName, 0, CHARINDEX(' ', @Fullname))
andSUBSTRING(@FullName, LEN(@FullName) - CHARINDEX(' ', REVERSE(@Fullname))+1, CHARINDEX(' ', REVERSE(@Fullname)))
as computed columns and indexing them.
您可以尝试尝试的一种方法是将全名分为行,然后进行比较,仅选择名字和姓氏匹配的方法:
<
a href =“ https://i.sstatic.net/bhijo.png” =“ nofollow noreferrer”>
One method you could try is to split the full names into rows and then compare, selecting only those where both first and last name match:
Output: