Automated legal question answering system
I'm trying to implement a web application that will let users define rules and ask questions to see if statements are legal or illegal according to a set of rules. The domains I have in mind would be rules for small communities or clubs.
For example, say a possible rule set contains the rules:
Only cars with valid registration tags may park anywhere indefinitely.
Cars without valid registration tags may only park in a visitor spot for up to 3 days.
And then someone asks "Can I park my Honda here?"
The system would attempt to answer by first following a question and answer tree resembling:
"Is a Honda a car?"
    =>Yes
        "Does it have a valid registration tag?"
            =>Yes
                "Yes"
            =>No
                "Are you parking in a visitor spot?"
                    =>Yes
                        "Have you parked in that spot for more than 3 days?"
                            =>Yes
                                "No"
                            =>No
                                "Yes"
                    =>No
                        "No"
                    =>Define "visitor spot"?
                        "A visitor spot is a parking spot. A parking spot is a rectangular area of asphalt with a width of 8 feet and a length of 15 feet, with a variation of 1 foot. It has either another parking spot or a curb adjacent to it. It has the word "Visitor" painted on it. It resembles <img>."
                    =>Define "parking"?
                        "Parking is the act of placing a vehicle within the spatial area of a parking spot. The state of a parked image resembles <img>."
            =>Define "valid registration tag"?
                "A valid registration tag resembles <img>"
    =>No
        "No"
    =>Define "car"?
        "A car is a 4 wheeled vehicle weighing less than 3 tons."
The user selects an answer at each node, and the system asks the next question according to that answer until a leaf node is reached, representing a "final" answer.
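To make the traversal concrete, here is a minimal Python sketch of the parking tree and a walk over it. The class names (Question, Verdict) and the walk function are my own illustration, not part of any existing library, and the "Define" branches are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Union


@dataclass
class Question:
    """Internal node: a question plus one child per possible answer."""
    text: str
    branches: Dict[str, "Node"] = field(default_factory=dict)


@dataclass
class Verdict:
    """Leaf node: the final answer to the user's original question."""
    text: str


Node = Union[Question, Verdict]

# The parking tree from the example, without the "Define" branches.
PARKING_TREE = Question("Is a Honda a car?", {
    "Yes": Question("Does it have a valid registration tag?", {
        "Yes": Verdict("Yes"),
        "No": Question("Are you parking in a visitor spot?", {
            "Yes": Question(
                "Have you parked in that spot for more than 3 days?",
                {"Yes": Verdict("No"), "No": Verdict("Yes")},
            ),
            "No": Verdict("No"),
        }),
    }),
    "No": Verdict("No"),
})


def walk(node: Node, answers: Iterable[str]) -> str:
    """Follow one answer per question until a leaf is reached."""
    replies = iter(answers)
    while isinstance(node, Question):
        node = node.branches[next(replies)]
    return node.text
```

For instance, walk(PARKING_TREE, ["Yes", "No", "Yes", "No"]) follows the branch for an untagged car that has been in a visitor spot for less than 3 days and ends at the "Yes" leaf.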
At each node, the user may ask the system to explain or define terms used in the question. Explanations would be a series of statements containing terms, which themselves could be further explained or defined.
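One possible way to model these nested explanations is a glossary in which each definition is scanned for other defined terms. The sketch below is only an assumption about how that could look: the GLOSSARY contents are shortened versions of the definitions above, the define function is hypothetical, and the substring matching is deliberately naive.

```python
# Shortened versions of the example definitions, keyed by term.
GLOSSARY = {
    "car": "A car is a 4 wheeled vehicle weighing less than 3 tons.",
    "visitor spot": "A visitor spot is a parking spot.",
    "parking spot": "A parking spot is a rectangular area of asphalt.",
}


def define(term):
    """Return a term's explanation and the other defined terms it mentions.

    The second value tells the UI which words in the explanation can
    themselves be expanded. Matching is a plain substring test, which is
    good enough for a sketch but would need real tokenisation in practice.
    """
    explanation = GLOSSARY[term]
    lowered = explanation.lower()
    nested = sorted(t for t in GLOSSARY if t != term and t in lowered)
    return explanation, nested
```

Calling define("visitor spot") returns its explanation together with ["parking spot"], so the user could drill down one more level from there.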
After enough experience is gained, the system could automatically skip certain nodes, such as the first "Is a Honda a car?" when it learns that in the context of "parking" a "Honda" will always imply a "car".
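Skipping could be as simple as descending through every question whose answer is already stored. In this sketch the tree is written as nested (question, branches) tuples with plain strings as leaves, and the known dictionary is assumed to be filled in by whatever learning mechanism the system ends up with; both are illustrative choices of mine.

```python
# Question nodes are (question, branches) tuples; leaves are plain strings.
TREE = ("Is a Honda a car?", {
    "Yes": ("Does it have a valid registration tag?", {
        "Yes": "Yes",
        "No": ("Are you parking in a visitor spot?", {
            "Yes": ("Have you parked in that spot for more than 3 days?",
                    {"Yes": "No", "No": "Yes"}),
            "No": "No",
        }),
    }),
    "No": "No",
})


def next_open_question(node, known):
    """Descend through every question whose answer is already known.

    Returns the first unanswered question, or the final answer string if
    the known answers are enough to reach a leaf.
    """
    while isinstance(node, tuple):
        question, branches = node
        if question not in known:
            return question
        node = branches[known[question]]
    return node
```

With known = {"Is a Honda a car?": "Yes"}, the first question shown to the user is the one about the registration tag.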
Although not shown in this tree, some trees may have "Undefined" leaf nodes, representing cases where the rules didn't provide enough coverage to fully create the tree, requiring the question to be redirected to a human expert for clarification or correction of the rules.
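A sketch of how an "Undefined" leaf might surface: if a tree has no branch for the answer given, the walk stops and records the gap for a human expert. The partial tree below is a hypothetical one built from the second rule alone, so it says nothing about cars that do have a valid tag; the resolve function and the escalations list are illustrative names.

```python
UNDEFINED = "Undefined"

# Hypothetical tree derived from the second rule only. There is no branch
# for a car that has a valid registration tag.
PARTIAL_TREE = ("Does it have a valid registration tag?", {
    "No": ("Are you parking in a visitor spot?", {
        "Yes": ("Have you parked in that spot for more than 3 days?",
                {"Yes": "No", "No": "Yes"}),
        "No": "No",
    }),
})


def resolve(node, answers, escalations):
    """Walk the tree; return UNDEFINED when the rules do not cover an answer.

    Each uncovered (question, answer) pair is appended to `escalations`
    so it can be passed on to a human expert.
    """
    replies = iter(answers)
    while isinstance(node, tuple):
        question, branches = node
        answer = next(replies)
        if answer not in branches:
            escalations.append((question, answer))
            return UNDEFINED
        node = branches[answer]
    return node
```

The recorded pairs double as a to-do list for whoever maintains the rule set.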
The goal would be to define the rules in a database, and then dynamically generate these Q&A trees as needed.
Although the rules and questions shown here are represented as natural language, the initial system would use symbolic logic instead, as doing NLP in addition to this logical parsing would immensely complicate the initial system. The rules may be initially drafted as natural language, but they'd be converted by hand to discrete rules before being entered into the system. The questions would be displayed as simple natural language statements, and the answers would be multiple choice.
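As a rough illustration of what hand-converted discrete rules might look like, the sketch below stores each rule as a set of required fact values and asks only the questions needed to test them, in order. Everything here is an assumption made for the example: the fact names, the Rule type, the decide function, and in particular the default of "No" when no rule matches, which encodes a reading of "Only cars ... may park" as forbidding everything else.

```python
from typing import Callable, Dict, NamedTuple


class Rule(NamedTuple):
    conditions: Dict[str, bool]  # fact name -> value the rule requires
    verdict: str


# Multiple-choice (yes/no) question shown to the user for each fact.
QUESTIONS = {
    "is_car": "Is it a car?",
    "valid_tag": "Does it have a valid registration tag?",
    "visitor_spot": "Are you parking in a visitor spot?",
    "over_3_days": "Have you parked in that spot for more than 3 days?",
}

# The two parking rules, converted by hand.
RULES = [
    Rule({"is_car": True, "valid_tag": True}, "Yes"),
    Rule({"is_car": True, "valid_tag": False,
          "visitor_spot": True, "over_3_days": False}, "Yes"),
]


def decide(ask: Callable[[str], bool], rules=RULES, default="No") -> str:
    """Try each rule in turn, asking only for facts not yet known."""
    facts: Dict[str, bool] = {}
    for rule in rules:
        for name, required in rule.conditions.items():
            if name not in facts:
                facts[name] = ask(QUESTIONS[name])
            if facts[name] != required:
                break  # this rule cannot apply; try the next one
        else:
            return rule.verdict  # every condition matched
    return default
```

Because answers are cached in facts, the sequence of questions a user sees matches a path through the tree above, without the tree ever being stored explicitly.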
Does this seem like a practical project? Is there any prior art? I haven't read about anything like this so far, but I'm not sure what search keywords adequately describe this system.
What tools should I use? I'm not sure if I should use decision trees or some sort of expert system for matching questions to rules and narrowing down the scope of the question.
Comments (2)
You would need "decision tree" software to generate the rules based upon your target. If you are going open source I would suggest using the "R" package with its rpart extension. I also suggest you use a text analytics package to begin to classify your documents. "R" also has the tm extension which can help with this.
These are some of the open source options. There are also many good options for commercial software.
-Ralph Winters
IBM just spent 3 years and hundreds of people programming a super-computer that could play and win at Jeopardy.
In my opinion, you're not proposing a practical project.
What IBM did and what you're trying on a smaller scale is called semantic processing.