How do you give input to an FST (built with OpenFst)? Where does the output go?

Posted 2025-01-07 17:41:01

Before I start, note that I'm using the Linux shell (via subprocess.call() from Python), and I am using OpenFst.

I've been sifting through documentation and questions about OpenFst, but I cannot find an answer to this question: how does one actually give input to an FST that has been defined, compiled and composed with OpenFst? Where does the output go? Do I simply execute 'fstproject'? If so, how would I, say, give it a string to transduce, and print the various transductions once the final state(s) have been reached?

I apologize if this question seems obvious. I'm not very familiar with OpenFst yet.

Comments (3)

相思故 2025-01-14 17:41:01

One way is to build a machine that performs the transformation.
A very simple example is upper-casing a string.

M.wfst

0 0 a A
0 0 b B
0 0 c C
0

The accompanying symbols file contains one line for each symbol of the alphabet. Note that 0 is reserved for null (epsilon) transitions and has a special meaning in many of the operations.

M.syms

<epsilon> 0
a 1
b 2
c 3
A 4
B 5
C 6
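
Since the question drives everything from Python, the symbols file can also be generated instead of typed by hand. The following is only a minimal sketch: the helper name make_symbol_table is made up for illustration, and it assumes every symbol is a single whitespace-free token.

```python
def make_symbol_table(symbols, epsilon="<epsilon>"):
    """Return symbol-table text with one 'symbol id' pair per line.

    `make_symbol_table` is a hypothetical helper, not part of OpenFst.
    Symbols must not contain whitespace, because the file is whitespace-separated.
    """
    lines = ["{} 0".format(epsilon)]  # id 0 is reserved for epsilon
    seen = set()
    next_id = 1
    for sym in symbols:
        if sym in seen or sym == epsilon:
            continue  # skip duplicates so every symbol gets exactly one id
        seen.add(sym)
        lines.append("{} {}".format(sym, next_id))
        next_id += 1
    return "\n".join(lines) + "\n"


# Reproduces the M.syms listing above.
print(make_symbol_table("abcABC"), end="")
```

Writing the returned text to M.syms gives the same file as the hand-written one.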

Then compile the machine

fstcompile --isymbols=M.syms --osymbols=M.syms M.wfst > M.ofst

For an input string "abc", create a linear chain automaton: a left-to-right chain with one arc per character. This is an acceptor, so we only need a single column for the input symbols.

I.wfst

0 1 a
1 2 b
2 3 c
3  
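
For arbitrary input strings, the text of the linear chain can be produced programmatically. A small sketch, assuming each element of the input is one symbol from M.syms; the function name linear_chain_text is hypothetical:

```python
def linear_chain_text(symbols):
    """Return acceptor text for a left-to-right chain over `symbols`.

    Each arc line is 'src dst label'; the final line names the final state.
    """
    symbols = list(symbols)
    lines = ["{} {} {}".format(i, i + 1, sym) for i, sym in enumerate(symbols)]
    lines.append(str(len(symbols)))  # the last state is the only final state
    return "\n".join(lines) + "\n"


# Reproduces the I.wfst listing above for the input "abc".
print(linear_chain_text("abc"), end="")
```

An empty input yields just the line "0", that is, a single state that is both initial and final.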

Compile as an acceptor

fstcompile --isymbols=M.syms --acceptor I.wfst > I.ofst

Then compose the machines and print

fstcompose I.ofst M.ofst | fstprint --isymbols=M.syms --osymbols=M.syms 

This will give the output

0   1   a   A
1   2   b   B
2   3   c   C
3
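
Because the question mentions subprocess, here is one possible way to drive the same pipe from Python. It is only a sketch: pipeline_commands and run_pipeline are names invented for this example, the file names are the ones used above, and the run is skipped unless the OpenFst binaries and the files are actually present.

```python
import os
import shutil
import subprocess


def pipeline_commands(input_fst, machine_fst, syms):
    """Return the argv lists for `fstcompose ... | fstprint ...`."""
    return [
        ["fstcompose", input_fst, machine_fst],
        ["fstprint", "--isymbols=" + syms, "--osymbols=" + syms],
    ]


def run_pipeline(commands):
    """Run the commands as a pipe and return the last command's stdout as text."""
    data = None  # the first command reads its FSTs from the files on its argv
    for argv in commands:
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout  # the binary FST (or final text) feeds the next stage
    return data.decode("utf-8")


commands = pipeline_commands("I.ofst", "M.ofst", "M.syms")
files_present = all(os.path.exists(p) for p in ("I.ofst", "M.ofst", "M.syms"))
if shutil.which("fstcompose") and shutil.which("fstprint") and files_present:
    print(run_pipeline(commands))
```

So the "output" of an FST operation is simply whatever the last tool writes to stdout: a binary FST for fstcompose, or text once fstprint is at the end of the pipe.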

The output of fstcompose is a lattice of all transductions of the input string (in this case there is only one). If M.ofst is more complicated, fstshortestpath can be used to extract the n best strings with the flags --unique --nshortest=n. That output is again a transducer; you can either scrape the output of fstprint, or use C++ code and the OpenFst library to run a depth-first search that extracts the strings.
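
Scraping the fstprint text can be done with a small depth-first search in Python. The sketch below makes several assumptions: the text is in transducer format (printed without --acceptor, so arc lines have at least four fields), the machine is acyclic, and weights are ignored. The function names are hypothetical.

```python
def parse_fstprint(text):
    """Parse fstprint text into (start, arcs, finals).

    The source state of the first line is taken as the start state.
    """
    arcs = {}       # state -> list of (next_state, input_label, output_label)
    finals = set()
    start = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]
        if len(fields) <= 2:   # "state [weight]" marks a final state
            finals.add(fields[0])
        else:                  # "src dst ilabel olabel [weight]"
            src, dst, ilabel, olabel = fields[:4]
            arcs.setdefault(src, []).append((dst, ilabel, olabel))
    return start, arcs, finals


def output_strings(text, epsilon="<epsilon>", max_arcs=10000):
    """Depth-first search listing the output string of every accepting path."""
    start, arcs, finals = parse_fstprint(text)
    results = []
    if start is None:
        return results
    stack = [(start, ())]
    followed = 0
    while stack:
        state, out = stack.pop()
        if state in finals:
            results.append(" ".join(out))
        for dst, _ilabel, olabel in reversed(arcs.get(state, [])):
            followed += 1
            if followed > max_arcs:  # crude guard against cyclic machines
                raise ValueError("too many arcs followed; is the FST cyclic?")
            stack.append((dst, out if olabel == epsilon else out + (olabel,)))
    return results


printed = "0\t1\ta\tA\n1\t2\tb\tB\n2\t3\tc\tC\n3\n"
print(output_strings(printed))  # ['A B C']
```

For large lattices it is usually better to prune with fstshortestpath first and only then enumerate paths this way.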

Inserting fstproject --project_output will convert the output to an acceptor containing only the output labels.

fstcompose I.ofst M.ofst | fstproject --project_output |  fstprint --isymbols=M.syms --osymbols=M.syms 

Gives the following

0  1  A  A
1  2  B  B
2  3  C  C
3

This is an acceptor because the input and output labels are the same. The --acceptor option can be used to generate more succinct output.

fstcompose I.ofst M.ofst | fstproject --project_output | fstprint --isymbols=M.syms --acceptor

掩耳倾听 2025-01-14 17:41:01

The example from Paul Dixon is great. Since the OP uses Python, I thought I'd add a quick example of how you can "run" transducers with OpenFst's Python wrapper. It's a shame that OpenFst has no built-in way to create a "linear chain automaton", but it's simple to automate, as seen below (the code uses Python 2 print syntax and assumes pywrapfst is imported as fst):

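import pywrapfst as fst  # module name as used in the Python 3 answer below

def linear_fst(elements, automata_op, keep_isymbols=True, **kwargs):
    """Produce a linear automaton (Python 2 syntax)."""
    # The chain is always an acceptor, independent of keep_isymbols.
    compiler = fst.Compiler(isymbols=automata_op.input_symbols().copy(),
                            acceptor=True,
                            keep_isymbols=keep_isymbols,
                            **kwargs)

    # One arc per element: "src dst label".
    for i, el in enumerate(elements):
        print >> compiler, "{} {} {}".format(i, i+1, el)
    # The last state is final; len() also covers an empty input.
    print >> compiler, str(len(elements))

    return compiler.compile()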

def apply_fst(elements, automata_op, is_project=True, **kwargs):
    """Compose a linear automata generated from `elements` with `automata_op`.

    Args:
        elements (list): ordered list of edge symbols for a linear automata.
        automata_op (Fst): automata that will be applied.
        is_project (bool, optional): whether to keep only the output labels.
        kwargs:
            Additional arguments to the compiler of the linear automata .
    """
    linear_automata = linear_fst(elements, automata_op, **kwargs)
    out = fst.compose(linear_automata, automata_op)
    if is_project:
        out.project(project_output=True)
    return out

Let's define a simple Transducer that uppercases the letter "a":

f_ST = fst.SymbolTable()
f_ST.add_symbol("<eps>", 0)
f_ST.add_symbol("A", 1)
f_ST.add_symbol("a", 2)
f_ST.add_symbol("b", 3)
compiler = fst.Compiler(isymbols=f_ST, osymbols=f_ST, keep_isymbols=True, keep_osymbols=True)

print >> compiler, "0 0 a A"
print >> compiler, "0 0 b b"
print >> compiler, "0"
caps_A = compiler.compile()
caps_A

Now we can apply the transducer with:

apply_fst(list("abab"), caps_A)

Output (an image in the original post): a linear FST whose labels spell "AbAb".

To see how to use it for an acceptor, look at my other answer.
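
To make clear what apply_fst computes, here is a plain-Python model of "compose with a linear chain, then keep the output labels". It does not use OpenFst at all and is only a sketch: the dictionary layout and the name apply_transducer are assumptions made for this illustration, and epsilon arcs and weights are not handled.

```python
def apply_transducer(elements, arcs, start, finals):
    """Return every output string the transducer assigns to `elements`.

    `arcs` maps (state, input_symbol) to a list of (output_symbol, next_state).
    """
    # Each hypothesis is (current state, output symbols emitted so far).
    hyps = [(start, ())]
    for sym in elements:
        next_hyps = []
        for state, out in hyps:
            for olabel, dst in arcs.get((state, sym), []):
                next_hyps.append((dst, out + (olabel,)))
        hyps = next_hyps  # hypotheses with no matching arc are dropped
    # Only paths that end in a final state count as transductions.
    return ["".join(out) for state, out in hyps if state in finals]


# The same one-state machine as caps_A: "a" becomes "A", "b" stays "b".
caps_a_arcs = {(0, "a"): [("A", 0)], (0, "b"): [("b", 0)]}
print(apply_transducer("abab", caps_a_arcs, start=0, finals={0}))  # ['AbAb']
```

An input containing a symbol the machine has no arc for, such as "c", yields an empty list, which corresponds to composition producing an empty FST.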

小耗子 2025-01-14 17:41:01

Updating Yann Dubois's answer to Python 3:

import pywrapfst as fst

print("")
f_ST: fst.SymbolTable


def linear_fst(elements, automata_op, keep_isymbols=True, **kwargs):
    """Produce a linear automaton."""
    compiler = fst.Compiler(
        isymbols=f_ST,  # There should be some way to get this from automata_op
        acceptor=True,  # the chain is always an acceptor
        keep_isymbols=keep_isymbols,
        **kwargs
    )
    for i, el in enumerate(elements):
        print("{} {} {}".format(i, i + 1, el), end="", file=compiler)
    # The last state is final; len() also covers an empty input.
    print(str(len(elements)), end="", file=compiler)
    lf = compiler.compile()
    return lf


def apply_fst(elements, automata_op, print_la=True, is_project=False, **kwargs):
    """Compose a linear automata generated from `elements` with `automata_op`.
    Args:
        elements (list): ordered list of edge symbols for a linear automata.
        automata_op (Fst): automata that will be applied.
        print_la (bool, optional): print linear automata as text representation
        is_project (bool, optional): whether to keep only the output labels.
        kwargs: Additional arguments to the compiler of the linear automata .
    """
    linear_automata = linear_fst(elements, automata_op, **kwargs)
    if print_la:
        print("Linear Automata:\n", linear_automata)
    out = fst.compose(linear_automata, automata_op)
    if is_project:
        out.project("output")
    return out


f_ST = fst.SymbolTable()
f_ST.add_symbol("<eps>", 0)
f_ST.add_symbol("A", 1)
f_ST.add_symbol("a", 2)
f_ST.add_symbol("b", 3)
compiler = fst.Compiler(
    isymbols=f_ST, osymbols=f_ST, keep_isymbols=True, keep_osymbols=True
)

print("0 0 a A", end="", file=compiler)
print("0 0 b b", end="", file=compiler)
print("0", end="", file=compiler)
caps_A = compiler.compile()
print("Uppercase Transducer with", caps_A.num_states(), "states:\n", caps_A)

caps_I = apply_fst(list("abab"), caps_A)
print("Output:\n", caps_I)

This prints:

Uppercase Transducer with 1 states:
 0  0   a   A
0   0   b   b
0

Linear Automata:
 0  1   a   2
1   2   b   3
2   3   a   2
3   4   b   3
4

Output:
 0  1   a   A
1   2   b   b
2   3   a   A
3   4   b   b
4