当前位置：文江博客话题详情

什么是有限状态传感器？

发布于 2024-10-15 09:48:24 字数 122 浏览 8 评论 0原文

有人可以告诉我什么是有限状态传感器吗？

我已阅读维基百科文章，但什么也不明白。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

格子衫的從容 2024-10-22 09:48:24

有限状态转换器（FST）是一种有限状态自动机（FSA、FA），它产生输出并读取输入，这意味着它对于解析很有用（而“裸”FSA 只能用于识别，即模式匹配）。

FST 由有限数量的状态组成，这些状态通过标有输入/输出对的转换链接起来。 FST 从指定的开始状态开始，根据输入跳转到不同的状态，同时根据其转换表生成输出。

FST 在 NLP 和语音识别中很有用，因为它们具有良好的代数属性，最值得注意的是它们可以在组合下自由组合（形成代数），从而实现了常规关系上的关系组合（将其视为非确定性函数组合），而保持非常紧凑。 FST 可以在线性时间内将常规语言解析为字符串。

举个例子，我曾经将形态解析作为一堆 FST 来实现。我的动词主要 FST 会将常规动词“walked”变成“walk+PAST”。我还有一个动词“to be”的 FST，它将“is”变成“be+PRESENT+3rd”（第三人称），其他不规则动词也类似。使用 FST 编译器将所有 FST 合并为一个 FST，该编译器生成的 FST 比各个部分的总和小得多，并且运行速度非常快。 FST 可以通过各种接受扩展正则表达式语法的工具来构建。

回复收藏 0 原文

七七 2024-10-22 09:48:24

有限状态传感器本质上是一个在两个（或更多）磁带上工作的有限状态自动机。考虑传感器的最常见方式是将其视为一种“翻译机”。他们从其中一盘磁带上读取内容，然后在另一盘磁带上写入。例如，这是一个将 as 转换为 bs 的转换器：
弧处的
a:b 意味着在此转换中传感器从第一个磁带读取 a 并将 b 写入第二个磁带。

参考：有限状态传感器

回复收藏 0 原文

往事随风而去 2024-10-22 09:48:24

用尽可能简单的术语来说，我理解 FST 本质上是一个“东西”，它根据输入磁带从一种状态移动到另一种状态，并写入不同的输出磁带。磁带本质上是一组输入，就像字符串中的字符一样。

整个 FST 由一组状态和它们之间的链接表示。当其输入条件正确时，链接被“激活”，然后给出调整后的磁带的下一个状态。

例如，假设 FST 从处于状态 1 的磁带 abc 开始。状态 2 的链接与 a 匹配，并将其更改为 b。这将被激活，将输出磁带设置为 b，并将剩余的 bc 传递到状态 2。如您所见，每个状态仅在存在链接到输入条件正确的状态，将剩余的输入传递到下一个状态，并写入单独的输出磁带。每个 FST 在磁带上运行一次并输出到另一个磁带一次。

为了更清楚地了解它们阅读并查看本文中的图表 (原始损坏的链接）。

回复收藏 0 原文

爱情眠于流年 2024-10-22 09:48:24

直接答案：有限状态传感器只是一个有限状态自动机 - 但同时具有输入和输出字。这最好用代数术语来理解。

字母表 X 上的语言 A 是从 X 中得出的所有有限单词的集合 X* 的子集；即 A ⊆ X*。具有输入字母 X 和输出字母 Y 的翻译 B 是 B ⊆ X* × Y* 的子集。这可以推广到允许三个或更多通道或层，例如 C ⊆ X* × Y* × Z*，但奇怪的是，这个概念显然在文献中没有看到；尽管拥有多层或通道的想法在应用程序中随处可见，例如在多代理通信协议中，或在音乐和视频制作中，以及在理论语言学中。

集合 X* 被赋予一个代数，以单词串联作为乘积，以空单词（此处表示为 1）作为恒等式。它满足两个公理：

    Associativity:  (uv)w = u(vw), for all u, v, w ∈ X*; 
    Identity:       w1 = w = 1w, for all w ∈ X*;

它们共同构成了幺半群的定义属性。集合 X* 是一个自由幺半群，因为除了来自公理的关系之外，没有其他关系可以维持它；所以 X* 是从集合 X 自由生成的幺半群。

可以通过将 (1,1) 作为恒等式并将乘积定义为 (m ,n)(m',n') = (mm',nn')，其中 m, m' ∈ M 且 n, n' ∈ N。

乘积 M × N 是 M 和 N 的元素相加的结果在一起并强加关系 mn = nm，对于 m ∈ M 和 n ∈ N：通过将每个 m ∈ M 双关为 (m,1) ε M × N，并将每个 n ∈ N 为 (1,n) ε M × N ;恒等式 (m,1)(1,n) = (m,n) = (1,n)(m,1) 是关系 mn = nm 本身的双关语。除此之外，这意味着 M × N（几乎）永远不是自由幺半群。

特别是，X* × Y* 等价地被描述为自由幺半群 (X ∪ Y)*，其本身服从恒等式 xy = yx，对于 x ∈ X 和 y ∈ Y ... 假设 X 和 Y 是非-重叠，X ∩ Y = ∅。

有限状态自动机可以在任何幺半群上定义。这同样适用于经典文献中描述的所有其他自动机家族：下推自动机、图灵机等。当自动机位于乘积幺半群上时，特别是 X* × 形式之一Y*，那么它是一个传感器。

各自的族是

The direct answer: a finite state transducer is just a finite state automaton - but one that has both input and output words. This is best understood in algebraic terms.

A language, A, over an alphabet X is a subset of the set X* of all finite words drawn from X; i.e. A ⊆ X*. A translation, B, with input alphabet X and output alphabet Y is a subset of B ⊆ X* × Y*. This can be generalized to allow for three or more channels or tiers, e.g. C ⊆ X* × Y* × Z*, though curiously, the concept is apparently not seen in the literature; even though the idea of having multiple tiers or channels appears all over the place in applications, like in multi-agent communication protocols, or in music and video production, as well as in theoretical linguistics.

The set X* is endowed with an algebra, with word concatenation as the product, and the empty word, denoted here as 1, as the identity. It satisfies the two axioms:

    Associativity:  (uv)w = u(vw), for all u, v, w ∈ X*; 
    Identity:       w1 = w = 1w, for all w ∈ X*;

Together, these comprise the defining properties of a monoid. The set X* is a free monoid, because no other relations hold over it, except those which come from the axioms; so X* is the monoid freely generated from the set X.

One can define the product M × N of any two monoids, M and N, as a monoid, by taking (1,1) as the identity and defining products as (m,n)(m',n') = (mm',nn'), where m, m' ∈ M and n, n' ∈ N.

The product M × N is the result of throwing the elements of M and N together and imposing the relations mn = nm, for m ∈ M and n ∈ N: by punning each m ∈ M as (m,1) ∈ M × N, and each n ∈ N as (1,n) ∈ M × N; with the identity (m,1)(1,n) = (m,n) = (1,n)(m,1) being a punning of the relation mn = nm, itself. Among other things, that means M × N is (almost) never a free monoid.

In particular, X* × Y* is equivalently described as the free monoid (X ∪ Y)*, itself, subject to the identities xy = yx, for x ∈ X and y ∈ Y ... provided that X and Y are non-overlapping, X ∩ Y = ∅.

Finite state automata can be defined over any monoid. This applies equally well to all other families of automata described in the classical literature: push-down automata, turing machines, etc. When the automaton is over a product monoid, particularly one of the form X* × Y*, then it's a transducer.

The respective families are ????X* for languages over X and ????(X* × Y*) for translations over inputs X and outputs Y; where the power set ????S of a set S is defined by the condition Z ∈ ????S ⇔ Z ⊆ S.

The power set ????M of a monoid M inherits the operations of M by taking the singleton set {1} ∈ ????M as the identity, and the defining product AB of sets A, B ∈ ????M as AB = { ab ∈ M: a ∈ A, b ∈ B }. This has the same effect of just throwing each m ∈ M into ????M as {m} ∈ ????M, and endowing the algebra with an ordering relation A ≥ B ⇔ A ⊇ B ⇔ B ⊆ A ⇔ B ≤ A, and a "sum" operator given as unions: ∑Y ≡ ⋃Y, for Y ⊆ ????M. This is the least upper bound operator, or supremum, with respect to the ordering relation "≥": ∑Y = ⋁Y. In particular, you have the regular expression operators:

    0 ≡ ⋁∅ = ∅
    A + B ≡ ⋁{A,B} = A ∪ B
    A* ≡ ⋁_{n≥0} Aⁿ = {1} ∪ A ∪ A² ∪ A³ ⋯

Arbitrary distributivity holds, which can be expressed equivalently in either of the following ways:

     A(⋁Y)B = ⋁{ACB ∈ ????M: C ∈ Y}, for all A, B ∈ ????M and Y ⊆ ????M
    (⋁Y)(⋁Z) = ⋁{AB ∈ ????M: A ∈ Y, B ∈ Z}, for Y, Z ⊆ ????M

An algebra that extends a monoid by including an ordering relation and a least upper bound operator over arbitrary subsets that satisfies distributivity is called a quantale with unit (here, the unit is {1}). The more general variety of quantales don't have to include a unit.

Both ????X* and ????(X* × Y*) are quantales, as is ????M for arbitrary monoids. The quantale ????X* of languages over X is freely generated from the set X, while (as before) the quantale ????(X* × Y*) of translations over X and Y can be considered to be generated from (X ∪ Y)*, subject to the identities xy = yx, for x ∈ X and y ∈ Y ... provided that X and Y are non-overlapping, X ∩ Y = ∅.

More generally, for any monoid M, the quantale ????M is the result of just taking M, itself, and attaching the ordering relation and least upper bound operator on it, so it is a free extension of M. It inherits the structure of M.

The family ℛM ⊆ ????M of rational subsets of M comprise the family containing the singleton subsets {m} ∈ ℛM, for each m ∈ M; finite least upper bounds (particularly: ∅ ∈ ℛM and A ∪ B ∈ ℛM for each A, B ∈ ℛM); products (AB ∈ ℛM, for A, B ∈ ℛM) and stars (A* ∈ ℛM for A ∈ ℛM).

The family of subsets of a monoid M that are recognized by a finite state automaton are also its rational subsets ℛM. The Kleene Theorem generalizes to arbitrary monoids.

The regular expressions over X are the elements of ℛX*, while the rational transductions over input alphabet X and output alphabet Y are elements of ℛ(X* × Y*).

The elements of ℛM are generally called "rational expressions" (rather than regular expressions), while the specialization to free monoids M = X* is where the term "regular expressions" is used.

The distributivity property gives rise to the following limited forms of distributivity:

    The Zero Property: A0B = 0, for A, B ∈ ????M
    (Additive) Distributivity: A(B+C)D = ABD + ACD, for A, B, C, D ∈ ????M
    *-Continuity: AC + ABC + AB²C + AB³C + ⋯ = A B* C, for A, B, C, ∈ ????M

which applies generally to ????M, but specifically also to the subfamily ℛM.

Since ℛM is, itself, a monoid, you can also consider automata that have rational expressions from ℛM on their arcs. Allowing regular expressions on the arcs is a common convention ... and it generalizes to arbitrary monoids. In that case, the corresponding family is ℛℛM.

As an algebra, the families ℛM are instances of what are known as *-continuous Kleene algebras: they are reduced forms of quantales in which the least upper bound operator (and distributivity property) are only postulated to hold over rational subsets, rather than over arbitrary subsets. That means: if Y ∈ ℛℛM then ⋁Y = ⋃Y ∈ ℛM, and ⋁{ACB: C ∈ Y} = A(⋁Y)B, for A, B ∈ ℛM and Y ∈ ℛℛM.

Larger classes of automata/transducers and language/translation families can be considered - and they can each be generalized to monoids. The same applies to grammars.

So, the family of context-free subsets of a monoid M may be denoted ????M; which specializes to ????X*, for the context-free languages over alphabet X, and the family ????(X* × Y*), whose corresponding class of grammars and automata are, respectively, called "simple syntax directed translations" and "push-down transducers". The algebra ????M is a reduced version of quantale that has least upper bounds over its own context-free subsets, ????????M, which is distributive over the range of subsets. They're are now known to be equivalently described as what are called μ-continuous Chomsky-algebras. They are the same as a Kleene algebra, except that the *-continuity property is generalized to:

    μ-continuity: ⋁{A 0 B, A f(0) B, A f(f(0)) B, ⋯} = A (μx f(x)) B

ranging over all Kleene-algebraic functions f(x); where μx f(x) denotes the least fixed point solution to x ≥ f(x). The Kleene-star, for instance, can be defined by, A* = μx (1 + Ax) or by A* = μx (1 + xA). So, the *-continuity property is a limited version of the μ-continuity property.

Structural relations between the different algebras are now known. In an algebraic formalism that inherits the monoid structure, one may be able to define the tensor product A ⊗ B of algebras A and B as the algebra that results from throwing A and B together and subjecting the result to the relations ab = ba for a ∈ A, b ∈ B.

For monoids, the "tensor product" is just the direct product, already described above. The other classes - the *-continuous Kleene algebra, the μ-continuous Chomsky algebra and unital quantales (the ones corresponding to ℛ, ????, ????, respectively) also have their varieties of "tensor product", which might be denoted ⊗ℛ, ⊗???? and ⊗????, respectively, to make the distinction clear.

The chief property is

    ℛ(M × N) ≅ ℛM ⊗ℛ ℛN,
    ????(M × N) ≅ ????M ⊗???? ????N,
    ????(M × N) ≅ ????M ⊗???? ????N,

for arbitrary monoids, M and N. So, the *-continuous Kleene algebra of rational transductions with input alphabet X and output alphabet Y can be expressed equivalently as the tensor product of regular expression algebras:

    ℛ(X* × Y*) ≅ ℛX* ⊗ℛ ℛY*.

It's now also known that the Chomsky-Schützenberger Theorem, in which Chomsky and Schützenberger very nearly created an algebraic foundation for the upward extension of the regular expressions to "context-free expressions", is a precursor to a result - only recently established - that shows that it actually does; namely showing, for arbitrary monoids M, that

    ????M ⊆ ℛM ⊗ℛ C₂

where the C₂ is the algebra containing {b,d,p,q} subjected to the identities

    bd = 1 = pq, bq = 0 = pd, db + qp = 1.

It can also be established with ????M ⊆ ℛM ⊗ℛ C₂', where C₂' is subjected only to the identities bd = 1 = pq, bq = 0 = pd and not to db + qp = 1. So, in effect, Chomsky co-authored a new formalism for context-free expressions in the early 1960's; without realizing it at the time (he knows it now). He's like the Faraday of formal language and automata theory.

So, among other things, push-down transductions can be expressed as rational transductions on finite state transducers, provided that you also allow elements of C₂' on the arcs. A push-down transducer, itself, can be directly expressed in that way.

A similar family ????M exists for the Turing-computeable subsets of a monoid M; and the concepts of "type 0 grammar", Turing machines & transducers all generalize to arbitrary monoids. The corresponding property ????(M × N) ≅ ????M ⊗???? ????N holds. So, Turing transducers can also be expressed as finite state transducers in which elements of C₂ ⊗ℛ C₂ are permitted on the arcs. It is also believed that the following analogue to the algebraic version of the Chomsky-Schützenberger Theorem holds: ????M ⊆ ℛM ⊗ℛ C₂ ⊗ℛ C₂ - which would give rise to a further upward extension of "context-free expressions" to "Turing expressions".

回复收藏 0 原文

~没有更多了~