How to display all the pronouns in a sentence, and their person, using ANTLR
Edit: based on WayneH's grammar
This is what my grammar file contains:
grammar pfinder;
options {
language = Java;
}
sentence
: ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
;
words
: WORDS {System.out.println($text);};
pronoun returns [String value]
: sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
| ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
| sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
| pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
| psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
| pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};
sfirst returns [String value] : ('i' | 'me' | 'my' | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] : ('he' | 'she' | 'it' | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] : ('we' | 'us' | 'our' | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] : ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};
WORDS : LETTER*;// {$channel=HIDDEN;};
SPACE : (' ')?;
fragment LETTER : ('a'..'z' | 'A'..'Z');
And here is what I have in my Java test class:
import java.util.Scanner;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import java.util.List;
public class test2 {
public static void main(String[] args) throws RecognitionException {
String s;
Scanner input = new Scanner(System.in);
System.out.println("Enter a sentence: ");
s=input.nextLine().toLowerCase();
ANTLRStringStream in = new ANTLRStringStream(s);
pfinderLexer lexer = new pfinderLexer(in);
TokenStream tokenStream = new CommonTokenStream(lexer);
pfinderParser parser = new pfinderParser(tokenStream);
parser.pronoun();
}
}
What do I need to put in the test file so that it displays all the pronouns in the sentence along with their respective values (s1, s2, ...)?
In case you are trying to do some sort of high-level analysis of spoken/written language, you might consider using some sort of natural language processing tool. For example, TagHelper Tools will tell you which elements are pronouns (and verbs, and nouns, and adverbs, and other esoteric grammatical constructs). (THT is the only tool of that sort that I'm familiar with, so don't take that as a particular endorsement of awesomeness).
Fragments don't create tokens, and placing them in parser rules will not give the desired results.
On my test box, this produced (I think!) the desired result:
In ANTLRWorks, the sample input "i kicked you" returned the tree structure:
program -> [i, you]
I feel compelled to point out that ANTLR is overkill for stripping the pronouns out of a sentence; consider using a regular expression instead. Also note that this grammar is case-sensitive, that expanding WORD to consume everything except your dictionary of PRONOUNs (punctuation, etc.) may be a bit tedious, and that the input will need to be sanitized.
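To make the regular-expression suggestion concrete, here is a minimal Java sketch. It is not the answerer's code: the class name `PronounRegex` and the method `findPronouns` are hypothetical, and the pronoun lists and s1..p3 labels are copied from the question's grammar. Word boundaries (`\b`) keep `it` from matching inside `kit`, and `Pattern.CASE_INSENSITIVE` sidesteps the case-sensitivity issue without lowercasing the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PronounRegex {
    // Pronoun -> person/number label, taken from the question's grammar (s1..p3).
    private static final Map<String, String> PERSON = new LinkedHashMap<>();
    static {
        for (String w : new String[] {"i", "me", "my", "mine"}) PERSON.put(w, "s1");
        for (String w : new String[] {"you", "your", "yours", "yourself"}) PERSON.put(w, "s2");
        for (String w : new String[] {"he", "she", "it", "his", "hers", "its",
                "him", "her", "himself", "herself"}) PERSON.put(w, "s3");
        for (String w : new String[] {"we", "us", "our", "ours"}) PERSON.put(w, "p1");
        PERSON.put("yourselves", "p2");
        for (String w : new String[] {"they", "them", "their", "theirs", "themselves"}) PERSON.put(w, "p3");
    }

    // One alternation of all pronouns; \b on both sides forces whole-word matches,
    // so "it" cannot match inside "kit" and "her" cannot match inside "here".
    private static final Pattern PRONOUN = Pattern.compile(
            "\\b(" + String.join("|", PERSON.keySet()) + ")\\b", Pattern.CASE_INSENSITIVE);

    /** Returns entries like "i(s1)" in the order the pronouns appear in the sentence. */
    public static List<String> findPronouns(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = PRONOUN.matcher(sentence);
        while (m.find()) {
            String word = m.group(1).toLowerCase(Locale.ROOT);
            found.add(word + "(" + PERSON.get(word) + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPronouns("I kicked you and them."));
    }
}
```

Running `main` prints `[i(s1), you(s2), them(p3)]`, which is the kind of output the question asks for, with no lexer or parser involved.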
--- Edit: In response to the OP's follow-up:
I have altered the original grammar to make parsing easier. The new grammar is:
I'll explain the changes:
The following is written in C# (I don't know Java) but I wrote it with the intent that you'll be able to read and understand it.
Calling ReadTokens("i kicked you and them") returns array ["i", "you", "them"]
I think you need to learn more about lexer rules in ANTLR. Lexer rules start with an uppercase letter and generate the tokens in the stream the parser will look at. Lexer fragment rules do not generate a token for the stream themselves, but they help other lexer rules generate tokens; look at the lexer rules WORDS and LETTER (LETTER is not a token, but it does help WORDS create one).
Now, when a text literal is put into a parser rule (whose name starts with a lowercase letter), that literal also becomes a valid token that the lexer will recognize and pass on (at least in ANTLR; I have not used other similar tools, so I can't speak for them).
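As a rough analogy only (this is hand-written Java, not what ANTLR generates), the sketch below shows the fragment idea: `isLetter` plays the part of `fragment LETTER` and is used while building a token, but only `WORDS` and `SPACE` tokens ever come out. It assumes `WORDS : LETTER+;` rather than the question's `LETTER*`, since a lexer rule that can match the empty string causes trouble. `FragmentDemo`, `tokenize` and the `CHAR:` token label are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentDemo {
    // Plays the role of `fragment LETTER : ('a'..'z' | 'A'..'Z');`
    // It is only a helper: it never produces a token on its own.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Emits "WORDS:<text>" and "SPACE" tokens, mimicking `WORDS : LETTER+;` and `SPACE : ' ';`. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isLetter(c)) {
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++; // LETTER is matched repeatedly, but only WORDS becomes a token
                }
                tokens.add("WORDS:" + input.substring(start, i));
            } else if (c == ' ') {
                tokens.add("SPACE");
                i++;
            } else {
                tokens.add("CHAR:" + c); // e.g. '.' or '?', which the grammar uses as literals
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("i kicked you."));
    }
}
```

For `"i kicked you."` this yields `WORDS:i, SPACE, WORDS:kicked, SPACE, WORDS:you, CHAR:.`; no `LETTER` token appears anywhere in the output.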
The next thing I noticed is that your 's' and 'pronoun' rules appear to do the same thing. I commented out the 's' rule and put everything into the 'pronoun' rule.
And the last thing is to learn how to put actions into the grammar; you already have some in the 's' rule that set the return value. I made the pronoun rule return a string value so that, if you wanted the actions in your 'sentence' rule, you could easily accomplish what your "-i pronoun" comment/answer asks for.
Since I do not know exactly what results you want, I played with your grammar: I made some slight modifications, reorganized it (moving what I took to be parser rules to the top and keeping all lexer rules at the bottom), and added some actions that I think will show you what you need. There could be several different ways to accomplish this, and I don't think my solution is perfect for every result you might want, but here is a grammar I was able to get working in ANTLRWorks:
I think the end result is that this grammar will show you how to accomplish what you are trying to do, though it will need modification whatever your final goal is.
Good luck.
I think you only have to change one line in your test class:
parser.pronoun();
to:
parser.sentence();
You might want to change a few other things in the grammar as well:
SPACE : ' ';
sentence: (words | pronoun) (SPACE (words | pronoun))* ('.' | '?'); // then you might want to put a rule between sentence and words/pronoun.
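To show what the suggested `sentence` rule accepts, and how a `pronoun` rule that returns a value fits in, here is a hand-written recursive-descent sketch in Java that mirrors that rule. It is an illustration under assumptions, not ANTLR's generated parser: `SentenceSketch`, `parse` and `pronoun` are hypothetical names, the pronoun table is taken from the question's grammar, and, like the question's test class, it expects lowercased input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SentenceSketch {
    private static final Map<String, String> PERSON = new HashMap<>();
    static {
        for (String w : "i me my mine".split(" ")) PERSON.put(w, "s1");
        for (String w : "you your yours yourself".split(" ")) PERSON.put(w, "s2");
        for (String w : "he she it his hers its him her himself herself".split(" ")) PERSON.put(w, "s3");
        for (String w : "we us our ours".split(" ")) PERSON.put(w, "p1");
        PERSON.put("yourselves", "p2");
        for (String w : "they them their theirs themselves".split(" ")) PERSON.put(w, "p3");
    }

    private final String input;
    private int pos = 0;
    private final List<String> pronouns = new ArrayList<>();

    private SentenceSketch(String input) { this.input = input; }

    /** Parses one sentence and returns the pronouns found, e.g. "you(s2)". */
    public static List<String> parse(String input) {
        SentenceSketch p = new SentenceSketch(input);
        p.sentence();
        return p.pronouns;
    }

    // sentence : (words | pronoun) (SPACE (words | pronoun))* ('.' | '?') ;
    private void sentence() {
        wordOrPronoun();
        while (pos < input.length() && input.charAt(pos) == ' ') {
            pos++; // SPACE : ' ' ;
            wordOrPronoun();
        }
        if (pos >= input.length() || (input.charAt(pos) != '.' && input.charAt(pos) != '?')) {
            throw new IllegalArgumentException("expected '.' or '?' at position " + pos);
        }
        pos++;
        if (pos != input.length()) {
            throw new IllegalArgumentException("unexpected input after position " + pos);
        }
    }

    // (words | pronoun): read one word, then decide which alternative it was.
    private void wordOrPronoun() {
        int start = pos;
        while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("expected a word at position " + start);
        }
        String word = input.substring(start, pos);
        String value = pronoun(word);
        if (value != null) {
            pronouns.add(word + "(" + value + ")"); // like the pronoun rule's println action
        }
    }

    // pronoun returns [String value]; null means "not a pronoun", i.e. the words alternative.
    private static String pronoun(String word) {
        return PERSON.get(word);
    }

    public static void main(String[] args) {
        System.out.println(parse("i kicked you and them."));
    }
}
```

`parse("i kicked you and them.")` returns `[i(s1), you(s2), them(p3)]`. Input without a final `.` or `?` makes this sketch throw an `IllegalArgumentException`, where a generated parser would instead report a syntax error.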