What does a single sentence consist of, and what should the parts be called?

Posted on 2024-08-19 06:37:19

I'm designing the architecture of a text parser. Example sentence: Content here, content here.

The whole sentence is a... sentence, that's obvious. The, quick, etc. are words; , and . are punctuation marks. But what are words and punctuation marks called collectively? Are they just symbols? I simply don't know how to name what a single sentence consists of in the most reasonable abstract way (because one could equally say it consists of letters, vowels, etc.).

Thanks for any help :)

Comments (3)

初心未许 2024-08-26 06:37:19

What you're doing is technically lexical analysis ("lexing"), which takes a sequence of input symbols and generates a series of tokens or lexemes. So words, punctuation and white-space are all tokens.
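As a rough illustration of that idea, here is a minimal Python sketch of a lexer for the example sentence. The `Token` type, the kind names `WORD`, `PUNCT` and `SPACE`, and the regular expression are all assumptions made for this sketch, not part of the question.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # "WORD", "PUNCT" or "SPACE" (names chosen for this sketch)
    text: str  # the exact characters matched in the input


# One named group per token kind; every character falls into exactly one of them.
TOKEN_RE = re.compile(r"(?P<WORD>\w+)|(?P<SPACE>\s+)|(?P<PUNCT>[^\w\s])")


def tokenize(text: str) -> List[Token]:
    """Turn a string of input symbols (characters) into a flat list of tokens."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


tokens = tokenize("Content here, content here.")
for tok in tokens:
    print(tok.kind, repr(tok.text))
```

Keeping white-space as its own token kind means the original text can be rebuilt by joining the token texts; a parser that doesn't care about spacing can simply filter those tokens out.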

In (E)BNF terms, lexemes or tokens are synonymous with "terminal symbols". If you think of the set of parsing rules as a tree, the terminal symbols are the leaves of the tree.

So what's the atom of your input? Is it a word or a sentence? If it's words (and white-space) then a sentence is more akin to a parsing rule. In fact the term "sentence" can itself be misleading. It's not uncommon to refer to the entire input sequence as a sentence.
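To make "a sentence is more akin to a parsing rule" concrete, here is a small Python sketch in which words and punctuation marks are the terminals and `sentence` is a rule over them. The grammar in the comment is a toy one invented for this sketch (it only handles commas and three terminators), and the names `lex` and `parse_sentence` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical grammar for this sketch (EBNF):
#   sentence   = word , { [ "," ] , word } , terminator ;
#   terminator = "." | "!" | "?" ;
TERMINATORS = {".", "!", "?"}


def lex(text: str) -> List[str]:
    """Terminal symbols: words and single punctuation marks (white-space dropped)."""
    return re.findall(r"\w+|[^\w\s]", text)


def is_word(tok: str) -> bool:
    return re.fullmatch(r"\w+", tok) is not None


def parse_sentence(tokens: List[str], pos: int = 0) -> Tuple[List[str], int]:
    """Apply the `sentence` rule at `pos`; return the matched tokens and the next position."""
    start = pos
    if pos >= len(tokens) or not is_word(tokens[pos]):
        raise ValueError(f"expected a word at token {pos}")
    pos += 1
    while pos < len(tokens) and tokens[pos] not in TERMINATORS:
        if tokens[pos] == ",":
            pos += 1  # optional comma before the next word
        if pos >= len(tokens) or not is_word(tokens[pos]):
            raise ValueError(f"expected a word at token {pos}")
        pos += 1
    if pos >= len(tokens):
        raise ValueError("sentence is missing its terminator")
    return tokens[start:pos + 1], pos + 1


sentence, next_pos = parse_sentence(lex("Content here, content here."))
print(sentence, next_pos)
```

Because `parse_sentence` returns the position after the terminator, it can be called repeatedly to split a longer input into sentences, which is one way to keep "sentence" distinct from "the entire input sequence".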

A semi-common term for a sequence of non-white-space characters is a "textrun".

梦开始←不甜 2024-08-26 06:37:19

A common term comprising the two sub-categories "words" and "punctuation", often used when talking about parsing, is "tokens".

污味仙女 2024-08-26 06:37:19

Depending on what stage of your lexical analysis of input text you are looking at, these would be either "lexemes" or "tokens."
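One common way to draw that line: a lexeme is the raw run of characters cut out of the input, and a token is that lexeme with a category attached. The Python sketch below separates the two stages; the function names `scan` and `evaluate` and the kind names are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def scan(text: str) -> List[str]:
    """Stage 1: cut the input into lexemes (raw substrings, white-space skipped)."""
    return re.findall(r"\w+|[^\w\s]", text)


def evaluate(lexeme: str) -> Tuple[str, str]:
    """Stage 2: attach a category to a lexeme, producing a (kind, lexeme) token."""
    kind = "WORD" if re.fullmatch(r"\w+", lexeme) else "PUNCT"
    return (kind, lexeme)


lexemes = scan("Content here, content here.")
tokens = [evaluate(lx) for lx in lexemes]
print(lexemes)
print(tokens)
```

In a parser's class design this suggests a single `Token` type with a kind field, rather than unrelated `Word` and `Punctuation` types.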
