当前位置：文江博客话题详情

用Java设计高性能状态机

发布于 2024-10-21 14:05:46 字数 833 浏览 4 评论 0原文

我正在开始编写一个 Java 库来实现高性能有限状态机。知道有很多库，但我想从头开始编写自己的库，因为几乎所有库都构造了针对一次仅处理一个进行优化的自动机。

我了解 SO 社区中涉足状态机设计的人们在实现此类高性能库时认为最重要/最好的设计原则是什么。

注意事项

生成的自动机通常不会很大。（~ 100-500 个状态）。
不过，实施应该能够扩展。
实施应该能够实现快速转换（最小化、确定性等）。
希望实现 DFA、NFA、GNFA、PDA 以及可能的树自动机。如果可能的话，希望在单一界面下。
应该在内存使用和性能之间取得良好的平衡。

目前对我来说有关设计的问题是：

应该为 State、Symbol 和 Transition 类被定义？或者应该使用“隐藏”的内部结构。就我个人而言，我认为使用类会浪费大量内存，因为相同的信息可以以更简洁的形式存储。但是，这是否能够实现更快的转型？它还有其他优点/缺点吗？
内部存储数据的最佳方式是什么？使用 HashMap 和 HashSet 等数据结构可以实现分摊常量时间查找，但会涉及一定的开销。这是最好的方法吗？将转换信息存储为原始（或非）数组似乎浪费了大量内存。特别是当图书馆需要一次处理大量自动机时。不同数据结构的优缺点是什么？

我很感激任何意见。谢谢！

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

倾城°AllureLove 2024-10-28 14:05:46

那么你希望它有多快？ brics.dk/automaton 中的代码确实声明了自己的 State 和 Transition 类，但显然，这些可以使用原语重写（哎呀，整个Transition类的状态显然很容易适合long）。

问题是，例如，如果您将 Transition 类移动为简单的原语，那么您就不必再使用缓慢的 HashMap默认 Java 集合：您可以使用 Trove 的 TLongObjectHashMap（或 TLongInt... 或 TLongLong 等）等库，这些库拥有默认的HashMap大时代（Trove库基本上提供了超级高效的映射和集合，既快又小，当您使用基元时：您不会生成无数的垃圾，也不会生成常量不必要地围绕原语，因此减少 GC 等。如果您注重性能，那么您确实想检查 Trove...并且他们即将发布的 3.0 版本比 Trove 2.0 快 20%）。

但它真的有用吗？显然该库已经足够快了。毫无疑问，通过不浪费地创建对象并使用实际上性能良好的集合，可以使速度更快，但尚不清楚这是否是可取的。

除此之外，我很确定上面的库不是线程安全的。 State 构造函数通过这样做创建一个唯一的 ID：

static int next_id;
.
.
.
id = next_id++;

并且该构造函数是从... 90 个不同的地方调用的！

在多线程场景中不创建唯一ID的教科书示例（哎呀，即使使next_id易失性也不够，例如，您需要一个 AtomicInteger ）。我不太了解这个库，但这个 ID 的东西对我来说看起来非常可疑。

Well how fast do you want it to be? The code at brics.dk/automaton does declare its own State and Transition classes altough, obviously, these could be rewritten using primitives (heck, the entire Transition class's state apparently would easily fit on a long).

Thing is, if you move, for example, the Transition class to simply a primitive, then you're not forced to use anymore the slow HashMap<Transition,...> default Java collections: you can use libraries like Trove's TLongObjectHashMap (or TLongInt... or TLongLong, whatever) which owns the default HashMap big times (the Trove libraries basically provides maps and sets that are super efficient, both fast and small, when you work with primitives: you don't generate countless garbage nor constant needless wrapping around primitives, so less GC etc. If you're into performance, then you do want to check Trove... And their 3.0 upcoming release is 20% faster than Trove 2.0).

But is it really useful? Apparently that library is already plenty of fast. There's no doubt it can be made faster by not wastefully creating objects and by using collections that do actually perform well but it's not clear that it would be desirable.

Besides that, I'm pretty sure that the library above is not thread safe. The State constructor creates a unique ID by doing this:

static int next_id;
.
.
.
id = next_id++;

and that constructor is called from... 90 different places!

Textbook example of a way to not create a unique ID in a multi-threaded scenario (heck, even making next_id volatile wouldn't be sufficient, you want, say, an AtomicInteger here). I don't know the library well enough but this ID thinggy looks very fishy to me.

回复收藏 0 原文