有限状态机的速度 - OO 与过程式

发布于 2024-09-25 15:51:37 字数 251 浏览 6 评论 0原文

大家好,
我正在设计一个程序,该程序将从输入中接受一系列令牌并将它们提供给我设计的有限状态机。我设计了一个面向对象风格的测试有限状态机,具有机器本身的结构和转换等,但我正在编写的应用程序是速度非常重要的应用程序。

到目前为止,使用机器、添加新状态等,已被证明很容易,而且不是很复杂。它很容易理解,离开一个月再回来看代码也不会很迷失方向。然而,我不确定当前的面向对象方法的速度权衡是什么。对象的分配、数据的存储等是否会降低使用一堆标签和 goto 语句所获得的速度?

Hey all,
I am designing a program that will accept from input a series of tokens and feed them to a finite state machine which I have designed. I have designed a test finite state machine in object-oriented style, with structs for the machine itself and transitions, etc. but the application I am writing is one where speed is very important.

So far, using the machine, adding new states and the like, has proven to be easy and not very complicated. It is easy to comprehend and leaving for a month and coming back to the code will not be very disorienting. However, I am not sure what the speed trade off is with the current OO approach. Will the allocation of the objects, storage of data, etc. take away much of the speed that would be had by using a bunch of labels and goto statements?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

怀里藏娇 2024-10-02 15:51:37

不要从 OO 比函数式或过程式编程慢的角度考虑,而是从操作的角度考虑。调用函数、分支、获取、存储等......都需要一些时间,并且要了解每个解决方案的性能,您需要计算每个解决方案需要执行的工作量。

执行此操作的最佳方法是使用面向对象的测试解决方案并查看它是否足够快。如果没有,那就进行分析,找出哪些分支机构或商店花费最多,看看是否可以避免或简化它们。慢慢地重新构建您的解决方案,直到它满足您所追求的性能目标。在某些情况下,您可能必须采用更具功能性或程序性的风格。

最后,如果您编写了一个由数百个堆栈变量组成的函数并转到,那么您就错了。代码必须始终可读且可维护。

额外思考:您能否再花 5,000 美元购买更好的硬件,以避免 50,000 美元的开发成本优化?

Rather than thinking about it in terms of OO being slower than functional or procedural programming think about it in terms of operations. Calling a function, branching, fetching, storing etc... all take some time and to get an idea of the performance of each solution you'd need to tally up how much of each of these you'll need to do.

The best way to do this is to use your OO test solution and see if it's fast enough. If not then profile it, work out which branches or stores are costing you the most and see if you can avoid or streamline them. Slowly re-architect your solution until it meets the performance goals you're after. It may be that in some cases you'll have to adopt a more functional or procedural style.

Lastly if you do code up a single function that consists of hundreds of stack variables and goto you've done it wrong. Code must always be readable and maintainable.

Extra thought: Can you just spend another $5,000 on better hardware to avoid $50,000 in development costs optimising?

不忘初心 2024-10-02 15:51:37

我更喜欢结构化组织良好、可维护、可理解(即使离开一个月后,这也是相当好的质量)的代码,而不是“一堆标签和goto语句”。话虽这么说,我还会在代码上运行分析器来检测瓶颈在进行速度分析时。

I would prefer well structured organized, maintainable, comprehendable (even after leaving for a month, thats quite a quality) code instead of a "bunch of labels and goto statements." Being said that, I would also run a profiler on the code to detect the bottlenecks when doing a speed analysis.

你列表最软的妹 2024-10-02 15:51:37

简而言之,处理器在表查找上进行表查找的速度比分支更快。只要表易于处理并且足够小以适合缓存,您所描述的 OO 风格就会更快。 (不过,我肯定不会说这两种范式与这两种实现相关联。)

技术性更强一些:处理器具有 32-64 K 的 L1 缓存和几到几十兆字节 L2-L3。分支缓存最多可达几千字节。

Simply put, the processor is faster at doing table lookups upon table lookups than branches. What you describe as the OO style will be faster, as long as the table is tractable and small enough to be fit into cache. (I wouldn't certainly say that either paradigm is associated with either implementation, though.)

To be very moderately more technical: The processor has 32-64 K of L1 cache and a few to a couple dozen megabytes of L2-L3. The branch caches amount to a few kilobytes at most.

暮年 2024-10-02 15:51:37

状态机多久改变一次?如果它在一段时间内保持不变,我所做的就是用我喜欢的任何语言将其翻译成硬编码例程。

状态机的转移图从何而来?它来自正则表达式吗?您知道您可以将其直接转换为结构化代码,而无需首先构建弧集。事实上,一开始编写该代码可能会更容易。

然后我要么只制作应用程序的这一部分,要么动态编译并将其链接到 dll 中并动态加载它。后者只需几秒钟。

有限状态机非常简单,基本上它们的运行时间只是加载 I/O 缓冲区所需时间的一小部分。他们不应该做任何不必要的内存分配、数据结构构建、任何面向对象的胡言乱语。

请记住,表示为一组状态和弧元组的有限状态机是一个理论模型。它的存在,就像图灵机一样,是为了证明它的定理。就像图灵机一样,在代码上不一定是好的实现技术。


只是为了向您展示我的意思,请考虑十进制整数的正则表达式:

dd*

其中“d”表示数字。作为有限状态机(元组),它会是:

A d -> B
B d -> B

作为代码,它会是:

char c = getc();
if (DIGIT(c)){
  c = getc();
  while(DIGIT(c)){
    c = getc();
  }
}

看到这段代码与正则表达式的相似之处了吗?
不要将 c = getc() 视为“获取下一个字符 c”。
将其视为“接受当前字符 c”。
我希望您能看到代码实际上没有比这更快的了。

如果您真的从 arc-set 开始,您可以生成以下代码:

c = getc();

A:
if (DIGIT(c)){c = getc(); goto B;}
goto END;

B:
if (DIGIT(c)){c = getc(); goto B;}
goto END;

END:

这是意大利面条代码,但不比 arc-set 多,而且它将与结构化代码一样快。
(事实上​​,这或多或少就是编译器将结构化代码转换成的内容。)

How often does the state machine change? If it stays constant for a while, what I do is translate it into a hard-coded routine in whatever language I prefer.

Where does the transition diagram of the state machine come from? Does it come from a regular expression? You know you can translate that directly into structured code without first building an arc-set. In fact, it might just be easier to write that code to begin with.

Then I either just make that part of the app, or I dynamically compile and link it into a dll and dynamically load it. The latter takes just a couple seconds.

Finite state machines are so simple that they should basically run in a small fraction of the time it takes to load the I/O buffer. They shouldn't be doing any unnecessary memory allocation, data structure building, any of that OO hoo-haw.

Remember, a finite state machine represented as a set of states and arc tuples is a theoretical model. It exists, like Turing machines, in order to prove theorems about it. Just like a Turing machine, it is not necessarily a good implementation technique in code.


Just to show you what I mean, consider a regular expression for decimal integers:

dd*

where 'd' means digit. As a finite-state machine (tuples) it would be:

A d -> B
B d -> B

as code it would be:

char c = getc();
if (DIGIT(c)){
  c = getc();
  while(DIGIT(c)){
    c = getc();
  }
}

See the similarity of this code with the regular expression?
Don't think of c = getc() as "get the next character c".
Think of it as "accept the current character c".
I hope you can see the code can't really get much faster than this.

If you're really starting from an arc-set, you could generate this:

c = getc();

A:
if (DIGIT(c)){c = getc(); goto B;}
goto END;

B:
if (DIGIT(c)){c = getc(); goto B;}
goto END;

END:

which is spaghetti code, but not any more than the arc-set is, and it will be just as fast as the structured code.
(In fact, that's more or less what the compiler converts your structured code into.)

九公里浅绿 2024-10-02 15:51:37

我是一名应用程序性能工程师,以下是我对有限状态机性能的看法。请注意,这些只是意见,不是我测量的事实,但我认为这对某些人有用:

  • FSM 通常对硬件来说很困难。硬件总是执行以前从未执行过的代码(因为从一种状态转移到另一种状态需要不同的函数)。
  • 原则上,巨大的 switch 会比 OOP 方法更快,因为整个 switch 都在一个函数内部,所以至少存在一些空间局部性。如果您使用枚举来保存当前状态,请确保 FSM 中的相邻状态是枚举中的相邻数字,因为这会增加处理状态的函数位于指令缓存中的可能性。
  • 在 OOP 的情况下,将来自不同类的所有状态更改函数放入同一链接器部分(例如状态更改函数上的 __attribute ((section (".state-machine"))) 属性) - 如果程序很小,则不重要,重要如果程序很大

这些是一般建议。现在,根据您的具体情况,您还有一些建议:

  • 如果您有许多相同的有限机器,请考虑使用面向数据的设计方法来创建它们。这个想法是不是随机地处理机器,而是根据它们当前所处的状态链接

I work as an application performance engineer, and here is my opinion about performance of finite state machines. Please note that these are opinions, not facts that I measured, but I think it would be useful for some people:

  • FSM are generally difficult for the hardware. The hardware is always executing code it never executed before (because moving from one state to another requires a different function).
  • Huge switch will in principle be faster than OOP approach because the whole switch is inside one function, so at least there is some spatial locality. If you are using an enum to hold the current state, make sure that neighboring states in the FSM are neighboring numbers in the enum because this will increase the likelihood that the functions processing the states are in the instruction cache.
  • In case of OOP, put all state changing functions from different classes into the same linker section (e.g. __attribute ((section (".state-machine"))) attribute on state changing functions) - not important if the program is small, important if the program is large

These are general recommendations. Now depending on your specific case you have a few more recommendations:

  • If you have many same finite machines, consider using the data-oriented design approach to create them. The idea is to process machines not randomly, but according to the state they are currently in link
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文