Before I start, note that I'm working from the Linux shell (via subprocess.call() from Python), and I am using OpenFst.
I've been sifting through the documentation and questions about OpenFst, but I cannot seem to find an answer to this question: how does one actually give input to an FST that was defined, compiled and composed with OpenFst? Where does the output go? Do I simply execute 'fstproject'? If so, how would I, say, give it a string to transduce, and print the various transductions once the end state(s) have been reached?
I apologize if this question seems obvious. I'm not very familiar with OpenFst yet.
Comments (3)
One way is to create a machine that performs the transformation.
A very simple example would be to upper-case a string.
M.wfst
The accompanying symbols file contains one line for each symbol of the alphabet. Note that 0 is reserved for null (epsilon) transitions and has special meaning in many of the operations.
M.syms
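As a rough sketch of what these two files might contain for a three-letter alphabet, the Python below builds both texts. The symbol ids, the `<epsilon>` spelling, and the helper names `symbol_table_text` and `uppercase_fst_text` are assumptions made for illustration, not taken from the original listings.

```python
def symbol_table_text(letters):
    """Return symbol-table text: one 'symbol id' line per symbol, id 0 = epsilon."""
    symbols = ["<epsilon>"] + list(letters) + [c.upper() for c in letters]
    return "".join(f"{sym} {idx}\n" for idx, sym in enumerate(symbols))


def uppercase_fst_text(letters):
    """Return a one-state transducer in text format: 'src dst input output' per arc."""
    arcs = [f"0 0 {c} {c.upper()}\n" for c in letters]
    return "".join(arcs) + "0\n"  # a line holding only a state id marks it final


letters = "abc"
m_syms = symbol_table_text(letters)
m_wfst = uppercase_fst_text(letters)
print(m_syms)
print(m_wfst)
```

Writing `m_syms` to M.syms and `m_wfst` to M.wfst with `open(..., "w")` would give files in the shape the compile step expects.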
Then compile the machine
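Since the question drives the shell from Python, here is a minimal sketch of how the compile step could be wrapped with subprocess. It assumes the OpenFst binaries are on the PATH and that M.wfst and M.syms exist; `fstcompile_argv` and `compile_fst` are hypothetical helper names. The module itself only prints the command, so it runs without OpenFst installed.

```python
import subprocess


def fstcompile_argv(text_path, syms_path, acceptor=False):
    """Build the fstcompile argument list (fstcompile writes the binary FST to stdout)."""
    argv = ["fstcompile", f"--isymbols={syms_path}"]
    # An acceptor has a single label column, so it needs --acceptor instead of --osymbols.
    argv.append("--acceptor" if acceptor else f"--osymbols={syms_path}")
    argv.append(text_path)
    return argv


def compile_fst(text_path, syms_path, out_path, acceptor=False):
    """Run fstcompile and store its stdout in out_path (requires OpenFst on the PATH)."""
    with open(out_path, "wb") as out:
        subprocess.check_call(fstcompile_argv(text_path, syms_path, acceptor), stdout=out)


# Show the equivalent shell command without running anything.
print(" ".join(fstcompile_argv("M.wfst", "M.syms")), "> M.ofst")
```

Calling `compile_fst("M.wfst", "M.syms", "M.ofst")` would then produce the compiled machine; passing `acceptor=True` covers automata that have only one label column.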
For an input string "abc", create a linear chain automaton: a left-to-right chain with one arc per character. This is an acceptor, so we only need a column for the input symbols.
I.wfst
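Generating this chain by hand gets tedious, so a small helper can produce the text for any string. This is only a sketch: `linear_chain_text` is a hypothetical name, and it assumes every character of the input appears in the symbols file.

```python
def linear_chain_text(symbols):
    """Return acceptor text with one 'src dst label' arc per symbol, left to right."""
    lines = [f"{i} {i + 1} {sym}" for i, sym in enumerate(symbols)]
    lines.append(str(len(symbols)))  # the last state is the only final state
    return "\n".join(lines) + "\n"


i_wfst = linear_chain_text("abc")
print(i_wfst, end="")
```

Writing `i_wfst` to I.wfst gives a file that can be compiled as an acceptor. Characters such as spaces would need their own entries in the symbols file, since the text format splits columns on whitespace.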
Compile as an acceptor
Then compose the machines and print
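From Python, the compose-and-print step is a two-stage shell pipeline. The sketch below assumes the compiled files are named I.ofst and M.ofst, that M.syms holds the symbols, and that the OpenFst binaries are on the PATH; `compose_print_commands` and `run_pipeline` are hypothetical helpers. The pipeline only runs when the tools and files are actually present.

```python
import os
import shutil
import subprocess


def compose_print_commands(input_fst, machine_fst, syms_path):
    """Argument lists for: fstcompose input machine | fstprint with symbol tables."""
    return [
        ["fstcompose", input_fst, machine_fst],
        ["fstprint", f"--isymbols={syms_path}", f"--osymbols={syms_path}"],
    ]


def run_pipeline(commands):
    """Chain the commands with pipes (cmd1 | cmd2 | ...) and return the final stdout."""
    data = None
    for argv in commands:
        data = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True).stdout
    return data.decode()


commands = compose_print_commands("I.ofst", "M.ofst", "M.syms")
print(" | ".join(" ".join(argv) for argv in commands))

have_tools = all(shutil.which(argv[0]) for argv in commands)
have_files = all(os.path.exists(p) for p in ("I.ofst", "M.ofst", "M.syms"))
if have_tools and have_files:
    print(run_pipeline(commands), end="")
```

For the upper-casing example, each printed arc should pair a lower-case input label with its upper-case output label.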
This will give the output
The output of fstcompose is a lattice of all transductions of the input string (in this case there is only one). If M.ofst is more complicated, fstshortestpath can be used to extract the n best strings using the flags --unique --nshortest=n. This output is again a transducer; you could either scrape the output of fstprint, or use C++ code and the OpenFst library to run a depth-first search to extract the strings.
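The scraping route can also be done in Python rather than C++. The sketch below runs a depth-first search over text printed by fstprint in transducer format and collects the output strings. It assumes the lattice is acyclic, that the first printed line belongs to the start state, and that epsilon is spelled `<epsilon>`; `output_strings` is a hypothetical helper name.

```python
from collections import defaultdict


def output_strings(fstprint_text, epsilon="<epsilon>"):
    """List the output strings of an acyclic FST given fstprint transducer-format text."""
    arcs = defaultdict(list)  # state -> [(next_state, output_label), ...]
    finals = set()
    start = None
    for line in fstprint_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]  # the source state of the first line is the start state
        if len(fields) >= 4:   # arc line: src dst ilabel olabel [weight]
            arcs[fields[0]].append((fields[1], fields[3]))
        else:                  # final-state line: state [weight]
            finals.add(fields[0])

    results = []

    def dfs(state, prefix):
        if state in finals:
            results.append("".join(prefix))
        for nxt, olabel in arcs[state]:
            dfs(nxt, prefix + ([] if olabel == epsilon else [olabel]))

    if start is not None:
        dfs(start, [])
    return results


printed = "0\t1\ta\tA\n1\t2\tb\tB\n2\t3\tc\tC\n3\n"
print(output_strings(printed))
```

Weights are ignored here. Because the number of paths can grow very quickly, pruning with fstshortestpath first keeps the search small.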
Inserting fstproject --project_output will convert the output to an acceptor containing only the output labels.
Gives the following
This is an acceptor because the input and output labels are the same; the --acceptor option of fstprint can be used to generate more succinct output.
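To make the effect of projection concrete, the sketch below mimics it on printed text: it drops the input label from every arc line and keeps the output label, which is the shape of listing that projecting onto the output and printing with --acceptor is meant to produce. It works on text only and does not call OpenFst; `project_output_text` is a hypothetical helper. Depending on the OpenFst version, the projection flag may be spelled `--project_type=output` instead, so `fstproject --help` is the place to check what your build accepts.

```python
def project_output_text(fstprint_text):
    """Rewrite transducer-format lines as acceptor-format lines with output labels only."""
    out = []
    for line in fstprint_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:  # arc line: src dst ilabel olabel [weight]
            out.append("\t".join(fields[:2] + fields[3:]))  # drop the input label
        elif fields:          # final-state line: state [weight]
            out.append("\t".join(fields))
    return "".join(row + "\n" for row in out)


composed = "0\t1\ta\tA\n1\t2\tb\tB\n2\t3\tc\tC\n3\n"
projected = project_output_text(composed)
print(projected, end="")
```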
The example from Paul Dixon is great. As the OP uses Python, I thought I'd add a quick example of how you can "run" transducers with OpenFst's Python wrapper. It's a shame that you cannot create "linear chain automata" directly with OpenFst, but it's simple to automate, as seen below:
Let's define a simple transducer that uppercases the letter "a":
Now we can simply apply the transducer:
Output:
To see how to use it for an acceptor, look at my other answer.
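For readers on Python 3, the same idea could look roughly like the sketch below, assuming the `pywrapfst` extension that ships with OpenFst is installed. The helper names `linear_chain_lines`, `linear_fst` and `apply_fst`, the symbol ids, and the `<eps>` spelling are illustrative assumptions, and the wrapper's API has changed between releases, so treat the calls as a starting point rather than a definitive implementation. The demo at the bottom runs only when the module can be imported.

```python
try:
    import pywrapfst as fst  # assumption: OpenFst was built with its Python extension
except ImportError:
    fst = None


def linear_chain_lines(elements):
    """Text lines for a left-to-right acceptor with one arc per element."""
    lines = [f"{i} {i + 1} {el}" for i, el in enumerate(elements)]
    lines.append(str(len(elements)))  # final state
    return lines


def linear_fst(elements, symbols):
    """Compile the chain as an acceptor over the given symbol table."""
    compiler = fst.Compiler(isymbols=symbols, acceptor=True, keep_isymbols=True)
    for line in linear_chain_lines(elements):
        compiler.write(line + "\n")
    return compiler.compile()


def apply_fst(elements, transducer):
    """Compose the chain built from `elements` with `transducer`."""
    chain = linear_fst(elements, transducer.input_symbols().copy())
    return fst.compose(chain, transducer)


if fst is not None:
    syms = fst.SymbolTable()
    for key, sym in enumerate(["<eps>", "A", "a", "b"]):
        syms.add_symbol(sym, key)
    compiler = fst.Compiler(isymbols=syms, osymbols=syms,
                            keep_isymbols=True, keep_osymbols=True)
    compiler.write("0 0 a A\n0 0 b b\n0\n")  # upper-case "a", pass "b" through
    caps_a = compiler.compile()
    print(apply_fst("abab", caps_a))
```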
Updating Yann Dubois's answer to Python 3:
This prints: