用于分析文本的 TSQL 查询
我有一张表,其中包含订单号、取消日期和原因。 原因字段是 varchar(255) 字段,它是由许多不同的销售代表编写的,并且很难按原因类别进行分组,我需要生成一个报告来对取消原因进行分类。用TSQL分析原因最好的方法是什么?
销售代表输入的原因示例
cust already has this order going out
cust can not hold for item Called to cancel order
cust doesn't want to pay for shipping
wife ordered same item from different vendor, sent email
cst made a duplicate order, sent email
cst can't hold
Cust doesn't want to go through verification process so is cancelling order
doesn't ant to hold for Bo
doesn't want
Cust called to cancel the order He can no longer get the product he wants
cnt hld
will not comply with export req
cant' hold
Custs request
Cust will not hold for BO
per. cust. request.
顺便说一句,我有 SQL Server 2005。
I have a table that has ordernumber, cancelled date and reason.
Reason field is varchar(255) field and it was written by many different sales rep and really hard to group by the reason category I need to generate a report to categorize cancelation reasons. What is the best way to analyse the reasons with TSQL?
Sample of reasons entered by sales rep
cust already has this order going out
cust can not hold for item Called to cancel order
cust doesn't want to pay for shipping
wife ordered same item from different vendor, sent email
cst made a duplicate order, sent email
cst can't hold
Cust doesn't want to go through verification process so is cancelling order
doesn't ant to hold for Bo
doesn't want
Cust called to cancel the order He can no longer get the product he wants
cnt hld
will not comply with export req
cant' hold
Custs request
Cust will not hold for BO
per. cust. request.
BTW I have SQL Server 2005.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
你的问题的一部分是这些不是真正的原因代码。对我来说,听起来像是你的架构有问题。如果没有预定义的原因代码可供参考,并且您允许为每个原因自由输入文本,那么除了拉回不同的原因之外,实际上没有办法直接执行此操作,这可能不会很有用。
只是一个想法,您是否可以在表中添加另一列,即使它处于临时或测试环境中,然后让业务用户能够分配代码(例如,1 表示错误发货,2 表示重复订单,3 表示错误)项目等)到每个订单取消。然后对此进行分析。
我想这就是他们对你的期望,但我不知道我是否看到了更好的方法。如果您有权威/知识,您总是可以自己进行分析,但如果您有大量取消,这可能会很痛苦。
编辑-我现在看到您已经用正则表达式标记了它...可以设置指定的关键字来提取条目,但是必须内置一些容差,并且之后仍然需要对不包含条目的项目进行手动分析由于拼写错误等原因,不属于任何指定的类别。/编辑
part of your problem is that this these aren't truly reason codes. sounds like an issue with your schema to me. if there aren't predefined reason codes to reference and you're allowing free text entry for each reason, then there's really no way to do this directly, outside of pulling distinct reasons back, which is probably not going to be very useful.
just an idea, can you add another column to the table, even if it's in a temp or test environment and then give the business users the ability to assign a code (e.g. 1 for mis-ships, 2 for duplicate orders, 3 for wrong item etc.) to each order cancellation. then perform the analysis on that.
i assume that's what they're expecting from you, but i don't know that i see any better way. you could always perform the analysis yourself if you have the authority/knowledge but this might be painful if you have a ton of cancellations.
edit- i see now that you've tagged this with regex... it would be possible to setup specified keywords to pull out the entries, but there'd have to be some tolerance built in and still manual analysis afterwards for items which don't fall into any specified category due to misspellings etc. /edit
+1@jmatthews,您确实需要选择原因代码,然后可能允许自由形式输入完整的原因。
如果这不是一个选项,您可以查看文本聚类。不过,不要指望这会很快或很容易,它仍然是一个开放的研究主题,并且与人工智能和机器学习相关。
+1 to @jmatthews, you really need to have reason codes that are selected and then possibly allow free-form entry for the full reason.
If this isn't an option you can look into text clustering. Don't expect that to be fast or easy though, it's still an open research topic and is related to both AI and machine learning.
查看 SSIS 中的术语查找,这里有一篇文章可供阅读。
Look at Term Lookup in SSIS, here is an article to read.