Using the Stanford Parser in a web service

Posted on 2024-10-05 14:18:19

I need to use the Stanford Parser in a web service. Since SentenceParser loads a large object, I will make sure it is a singleton, but is it thread-safe in that case? (No, according to http://nlp.stanford.edu/software/parser-faq.shtml.) How else could this be done efficiently? One option is to lock the object while it is in use.

Any idea how the people at Stanford are doing this for http://nlp.stanford.edu:8080/parser/?

Comments (2)

油焖大侠 2024-10-12 14:18:19

If contention is not a factor, locking (synchronization) would be one option, as you mentioned, and it might be good enough.
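A minimal sketch of the locking option, under stated assumptions: the parser is modeled as a plain `Function<String, String>` rather than the real Stanford Parser API, and `LockedParser` is a hypothetical wrapper name. Only one thread at a time gets to use the shared instance.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical wrapper: serializes access to one shared, non-thread-safe parser.
// The Function<String, String> is a stand-in for the real parser, not its API.
final class LockedParser {
    private final Function<String, String> delegate;
    private final ReentrantLock lock = new ReentrantLock();

    LockedParser(Function<String, String> delegate) {
        this.delegate = delegate;
    }

    String parse(String sentence) {
        lock.lock(); // callers queue up here while another thread is parsing
        try {
            return delegate.apply(sentence);
        } finally {
            lock.unlock(); // release even if parsing throws
        }
    }
}
```

Throughput is limited to one parse at a time, which is why this only works well when contention is low.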

If there is contention, however, I see three general options.

(1) Instantiating it every time

Just instantiate it as a local variable every time you parse. Local variables are trivially thread-safe. Instantiation is not free, of course, but it may be acceptable depending on the situation.

(2) Using thread-locals

If instantiation turns out to be costly, consider using thread-locals. Each thread keeps its own copy of the parser, and that instance is reused on the same thread. Thread-locals are not without problems, however. First, a thread-local value may not be garbage collected until it is removed (or set to null) or the holding thread goes away, so memory becomes a concern if there are many threads. Second, beware of reuse. If the parsers are stateful, you need to clean up and restore the initial state so that later uses of the thread-local instance do not suffer side effects from earlier ones.
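The thread-local option could look roughly like this. It is a sketch, not the Stanford Parser API: the parser is again modeled as a `Function<String, String>`, and `ThreadLocalParser`, `factory`, and `release` are hypothetical names. The factory runs once per thread on first use.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical holder: one parser instance per thread, created lazily.
final class ThreadLocalParser {
    private final ThreadLocal<Function<String, String>> local;

    ThreadLocalParser(Supplier<Function<String, String>> factory) {
        // withInitial calls the factory the first time a given thread calls get()
        this.local = ThreadLocal.withInitial(factory);
    }

    String parse(String sentence) {
        return local.get().apply(sentence);
    }

    // Drops the calling thread's instance so it can be garbage collected.
    // A later parse() on this thread builds a fresh instance.
    void release() {
        local.remove();
    }
}
```

With a fixed-size request thread pool, the number of parser instances is bounded by the number of pool threads. Calling `release()` addresses the garbage-collection concern above, at the cost of rebuilding the instance later.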

(3) Pooling

Pooling is in general no longer recommended, but if the objects are truly large and you need a hard limit on the number of instances, an object pool might be the best option.
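A fixed-size pool can be sketched with a `BlockingQueue`. As before, the parser is modeled as a `Function<String, String>` (an assumption, not the real API), and `ParserPool` and `idleCount` are hypothetical names. The pool creates exactly `size` instances up front, and callers block while all of them are busy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical pool: a hard cap of `size` parser instances shared by all threads.
final class ParserPool {
    private final BlockingQueue<Function<String, String>> idle;

    ParserPool(int size, Supplier<Function<String, String>> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the instantiation cost once, up front
        }
    }

    String parse(String sentence) {
        Function<String, String> parser;
        try {
            parser = idle.take(); // blocks while every instance is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a parser", e);
        }
        try {
            return parser.apply(sentence);
        } finally {
            idle.add(parser); // always fits: only `size` instances exist
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

If the parsers are stateful, the same reset concern as with thread-locals applies before an instance goes back into the pool.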

内心激荡 2024-10-12 14:18:19

I don't know how the people at Stanford have implemented their service, but I would build such a service on a messaging framework such as http://www.rabbitmq.com/. Your front-end service receives documents and uses a message queue to communicate (store documents and retrieve results) with several workers that perform the NLP parsing. When a worker finishes processing, it stores the result in a queue that the front-end service consumes. This architecture lets you add new workers dynamically under high load, which matters because NLP tagging takes some time, up to several seconds per document.
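To show the shape of this design without a broker, here is an in-process sketch in which two `BlockingQueue`s stand in for the message queues. This is an assumption made for illustration: it does not use RabbitMQ or its client library, the parser is modeled as a `Function<String, String>`, and `ParseWorkers` and `addWorker` are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical in-process stand-in for the queue-based architecture.
final class ParseWorkers {
    final BlockingQueue<String> documents = new LinkedBlockingQueue<>(); // front end -> workers
    final BlockingQueue<String> results = new LinkedBlockingQueue<>();   // workers -> front end

    // Starts one more worker; call it again to scale out under load.
    Thread addWorker(Supplier<Function<String, String>> factory) {
        Thread worker = new Thread(() -> {
            // Each worker owns its parser, so the parser need not be thread-safe.
            Function<String, String> parser = factory.get();
            try {
                while (true) {
                    results.put(parser.apply(documents.take()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupt means "stop working"
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Results can arrive in a different order than the documents were submitted, so a real service would attach a correlation ID to each message to match results to requests.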
