Tokenizing source code in Java
For a systems software development course, I'm working on a complete assembler for an instructor-invented assembly language. Currently I'm working on the tokenizer. While doing some searching, I came across the Java StringTokenizer class, but I see that it has been essentially deprecated. It seems far easier to use, however, than the String.split method with regular expressions.
Is there some reason that I should avoid using it? Is there perhaps something else within the typical Java libraries that would suit this task well that I am not aware of?
EDIT: Giving more detail.
The reason I consider String.split complicated is that my knowledge of regular expressions amounts to knowing that they exist. While learning them would help my general knowledge as a software developer, I'm not sure I want to invest the time right now, especially if an easier alternative is available.
In terms of my usage of the tokenizer: it will go through a text file containing assembly code and break it into tokens, passing the text and token type to a parser. Delimiters include white space (spaces, tabs, newlines), the comment-start character '|' (which can occur on its own line, or after other text), and the comma to separate operands in an instruction.
I would write that more mathematically, but my knowledge of formal languages is a bit rusty.
EDIT 2: Asking question more clearly
I have seen the documentation on the StringTokenizer class. It would have suited my purposes well, but its use is discouraged. Other than String.split, is there something within the standard Java libraries that would be helpful?
Comments (5)
I believe that the java.util.Scanner class has replaced StringTokenizer. Scanner lets you handle tokens one at a time, whereas String.split() splits the entire string at once (which could be large if you're parsing a source code file). Using Scanner, you can examine each token, decide what action to take, then discard that token.
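As a rough illustration of the Scanner approach, here is a minimal sketch for the delimiters described in the question (whitespace, commas, and '|' starting a comment). The class name, method name, and the sample instruction are made up for the example, and cutting the comment off with indexOf before scanning is just one possible way to handle it, not something the question prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    // Hypothetical helper: returns the tokens of one source line,
    // ignoring everything from the first '|' onward (the comment).
    static List<String> tokenizeLine(String line) {
        int comment = line.indexOf('|');
        String code = comment >= 0 ? line.substring(0, comment) : line;

        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(code)) {
            // Treat any run of whitespace and/or commas as one delimiter.
            scanner.useDelimiter("[\\s,]+");
            while (scanner.hasNext()) {
                // Each token is available here on its own, so a real
                // tokenizer could classify it before moving on.
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The instruction below is an invented example, not real syntax
        // from the course's assembly language.
        System.out.println(tokenizeLine("ADD r1, r2 | add the registers"));
    }
}
```

Note that this throws the commas away. If the parser needs to see them as tokens of their own, the delimiter pattern would have to change.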
If what you're building is an assembler, I would use JavaCC for building the parser/compiler.
From the documentation:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
The following example illustrates how the String.split method can be used to break up a string into its basic tokens:
prints the following output:
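The code and its output seem to have been lost from the quotation above. As far as I recall, the example in the StringTokenizer Javadoc is along these lines; the wrapper class and the helper method name are my own additions so that it compiles and runs by itself.

```java
class SplitExample {
    // Hypothetical wrapper around the call the Javadoc example uses.
    static String[] splitOnWhitespace(String text) {
        // "\\s" matches a single whitespace character.
        return text.split("\\s");
    }

    public static void main(String[] args) {
        String[] result = splitOnWhitespace("this is a test");
        for (int x = 0; x < result.length; x++) {
            System.out.println(result[x]);
        }
        // Prints:
        // this
        // is
        // a
        // test
    }
}
```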
Don't fear the regex. Get yourself a regex editor, such as the following Eclipse plugin, http://brosinski.com/regex/update, and you'll be able to test your expressions without compiling, even before writing your program.
If you need more reference, here are some very useful sites:
Although I think the suggestion above of using JavaCC sounds like the right approach.
Another option would be ANTLR.
Here's a post comparing the experience of ANTLR vs JavaCC.
Something is deprecated when there is a better alternative, or when its methods are dangerous in some situations. So the answer is: yes, you can use it, but there is a better way to achieve what you need.
By the way, what is complicated about split?
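For what it's worth, here is a minimal sketch of what split might look like for the delimiters in the question. The class and method names and the sample instruction are invented for illustration, and it assumes the comma and the whitespace never need to reach the parser as tokens.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SplitTokens {
    // Hypothetical helper: tokens of one line, with the '|' comment removed.
    static List<String> tokenizeLine(String line) {
        int comment = line.indexOf('|');
        // trim() matters: split keeps a leading empty string when the
        // input starts with a delimiter such as indentation.
        String code = (comment >= 0 ? line.substring(0, comment) : line).trim();
        if (code.isEmpty()) {
            // "".split(...) would return one empty string, not zero tokens.
            return Collections.emptyList();
        }
        // One or more whitespace characters or commas separate tokens.
        // A line that begins with a comma would still yield an empty
        // first token; this sketch does not handle that case.
        return Arrays.asList(code.split("[\\s,]+"));
    }

    public static void main(String[] args) {
        // Invented sample instruction, not real syntax from the course.
        System.out.println(tokenizeLine("  MOV r1,r2\t| copy r2 into r1"));
    }
}
```

The only regex here is the character class `[\\s,]+`. The fiddly parts are the two edge cases in the comments, which are arguably what makes split feel more awkward than StringTokenizer for this job.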