Memory leak / caching when using Boost with ICU u32_regex on Win32

Posted on 2024-11-27 09:27:17

When using the Boost regex class with the optional ICU support enabled (see the Boost documentation for details), I seem to get a memory leak, or rather some sort of memory caching, which I cannot reset or clean up.

Has anyone else seen this, and does anyone know a way of clearing the cache so that the Boost unit test framework does not report a memory leak?

The details of my setup are:

ICU version 4.6.0
(Built using supplied vs2010 solution in debug and release configuration)
Boost version 1.45
(built with command "bjam variant=debug,release threading=multi link=shared stage" since standard distribution does not include icu support in regex)
OS Windows 7
Compiler MSVC 10 (Visual Studio 2010 Premium)

I also tried this with Boost 1.42 and ICU 4.2.1, which I happened to have built on my system, with the same results, so I don't think the problem would be solved by moving to Boost 1.47 and ICU 4.8.1, which are the latest versions.

Compiling the following code (Test.cpp):

#define BOOST_TEST_MAIN    //Ask boost unit test framework to create a main for us
#define BOOST_ALL_DYN_LINK //Ask boost to link to dynamic library rather than purely header support where appropriate
#include <boost/test/auto_unit_test.hpp>

#include <boost/regex.hpp>
#include <boost/regex/icu.hpp> //We use icu extensions to regex to support unicode searches on utf-8
#include <unicode/uclean.h>    //We want to be able to clean up ICU cached objects

BOOST_AUTO_TEST_CASE( standard_regex ) 
{
    boost::regex re( "\\d{3}");
}

BOOST_AUTO_TEST_CASE( u32_regex ) 
{
    boost::u32regex re( boost::make_u32regex("\\d{3}"));
    u_cleanup(); //Ask the ICU library to clean up any cached memory
}

This can be compiled from the command line with:

C:\>cl test.cpp /I[BOOST HEADERS PATH] /I[ICU HEADERS] /EHsc /MDd -link /LIBPATH:[BOOST LIB PATH] [ICU LIB PATH]icuuc.lib

using the appropriate paths to the headers and libs on your machine.

Copy the appropriate Boost DLLs to the directory containing test.exe if they are not on the path (boost_regex-vc100-mt-gd-1_45.dll and boost_unit_test_framework-vc100-mt-gd-1_45.dll).

When the test.exe built by the steps above is run, I get:

Running 2 test cases...

*** No errors detected
Detected memory leaks!
Dumping objects ->
{789} normal block at 0x00410E88, 28 bytes long.
 Data: <    0N U        > 00 00 00 00 30 4E CD 55 00 00 00 00 01 00 00 00
{788} normal block at 0x00416350, 14 bytes long.
 Data: <icudt46l-coll > 69 63 75 64 74 34 36 6C 2D 63 6F 6C 6C 00
{787} normal block at 0x00415A58, 5 bytes long.
 Data: <root > 72 6F 6F 74 00
...lots of other blocks removed for clarity ...

I'm guessing that ICU is actually the culprit here, since its name appears at the start of the second block.

Running only the first test (i.e. creating a standard regex, not a u32_regex) reports no memory leaks.

Adding multiple u32_regex objects to the test does not result in more memory being leaked.

I attempted to clean up the ICU cache by calling u_cleanup(), as described in the ICU Initialization and Termination section of the ICU documentation.

However, I am not very familiar with the ICU library (I am only using it because we wanted Unicode-aware regex support), and I can't see how to get the u_cleanup() call to actually clean up the data when ICU is being loaded by the Boost regex DLL.

To reiterate, the problem appears to be:

Boost regex is in a DLL compiled with the optional ICU support (I'm pretty sure this uses a static link to ICU, but I may be wrong here).

If I link to icuuc.lib in the test program so that I can call u_cleanup(), this doesn't appear to affect the memory held by the instance of ICU loaded via the Boost regex library (it would be rather odd if it did).

I can't find any call in the regex library that lets me ask it to clean up the ICU data, which is really where we want to make the call.

Comments (2)

万水千山粽是情ミ 2024-12-04 09:27:17

I thought I might as well answer the question here, since I did solve this (with help from Boost users).

The problem is the order of teardown: if the static objects in the Boost regex DLL are not destructed before those of the unit test framework (UTF), the regex DLL is still caching some data when the leak check runs, and so the UTF reports memory leaks. Simply calling u_cleanup() isn't sufficient.

The easiest way of ensuring the order is to link with the unit test framework as a static library. Its objects are then destructed after those of any DLLs, so the cached objects are not reported as a memory leak because they have already been destructed.
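
The teardown-order effect described above can be modelled without Boost or ICU. The following is a minimal sketch in standard C++, not the actual mechanism used by the MSVC debug heap or by Boost.Test: `Cache`, `LeakReporter`, `g_live_blocks` and `blocks_reported` are hypothetical names introduced for illustration, and local objects stand in for the static objects of each module. It only shows that the same cached data looks leaked or not depending on whether the reporter is destroyed before or after the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the debug heap's count of live blocks.
static std::size_t g_live_blocks = 0;

// Hypothetical stand-in for data cached by a library until its teardown.
struct Cache {
    std::vector<std::string> entries;
    Cache() {
        entries.push_back("icudt46l-coll");
        entries.push_back("root");
        g_live_blocks += entries.size();
    }
    ~Cache() { g_live_blocks -= entries.size(); }
};

// Hypothetical stand-in for a leak check that runs when the reporter is destroyed.
struct LeakReporter {
    std::size_t* reported;
    explicit LeakReporter(std::size_t* out) : reported(out) {}
    ~LeakReporter() { *reported = g_live_blocks; }
};

// Returns how many blocks the reporter sees as still live at its teardown.
// Locals are destroyed in reverse order of construction, which is what
// selects the teardown order in each branch.
std::size_t blocks_reported(bool reporter_torn_down_first) {
    std::size_t reported = 0;
    if (reporter_torn_down_first) {
        Cache cache;                       // destroyed last
        LeakReporter reporter(&reported);  // destroyed first: cache still alive
    } else {
        LeakReporter reporter(&reported);  // destroyed last: cache already gone
        Cache cache;                       // destroyed first
    }
    return reported;
}
```

In this model `blocks_reported(true)` returns 2, because the reporter runs while the cache still holds its two entries, and `blocks_reported(false)` returns 0, because the cache has already released them.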

战皆罪 2024-12-04 09:27:17

u_cleanup is what cleans up the data; however, it can't clean up the data if any items are still open.

Can you try not calling any Boost functions, just calling u_cleanup(), and see if there are any leaks? Then try just calling u_init() followed by u_cleanup().

I'm not familiar enough with Boost to know whether the above code will clean up the regex, or whether Boost has any internal caching. The leaked objects don't look like the usual ICU data: if ICU's data were still open you would see quite a bit of data, not 14+5 bytes.
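
One thing worth checking against the "still open" point: in the question's u32_regex test case, `re` is still in scope when u_cleanup() is called, so its destructor runs only after the cleanup call. The sketch below shows that scoping rule in plain C++. `Handle`, `g_open_handles` and `handles_open_at_cleanup` are hypothetical stand-ins, not Boost or ICU APIs, and whether this accounts for any of the reported blocks is not established in this thread.

```cpp
#include <cassert>

// Hypothetical count of library objects that are currently open.
static int g_open_handles = 0;

// Hypothetical stand-in for an object (such as a regex) that is assumed to
// keep library data open for as long as it is alive.
struct Handle {
    Handle()  { ++g_open_handles; }
    ~Handle() { --g_open_handles; }
};

// Hypothetical stand-in for a cleanup call: reports how many handles were
// still open at the moment it was called.
int handles_open_at_cleanup() { return g_open_handles; }

// Same shape as the question's test case: cleanup runs while `re` is alive.
int cleanup_inside_scope() {
    Handle re;
    return handles_open_at_cleanup();
}

// Alternative shape: `re` is destroyed at the inner closing brace,
// before the cleanup call.
int cleanup_after_scope() {
    {
        Handle re;
    }
    return handles_open_at_cleanup();
}
```

Here `cleanup_inside_scope()` returns 1 and `cleanup_after_scope()` returns 0. Applied to the test case, that would mean wrapping the `boost::u32regex` in its own braces so it is destroyed before u_cleanup() runs.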
