Tracking down uninitialized static variables

Posted on 2024-12-10 18:46:14

I need to debug an ugly and huge math C library, probably generated by f2c at some point. The code abuses local static variables, and unfortunately somewhere it seems to exploit the fact that these are automatically initialized to 0. If its entry function is called twice with the same input, it gives different results. If I unload the library and reload it, it works correctly. It needs to be fast, so I would like to get rid of the load/unload.

My question is how to uncover these errors with valgrind or any other tool, without manually walking through the entire code.

I am hunting for places where a local static variable is declared, read first, and written only later. The problem is further complicated by the fact that the static variables are sometimes passed on via pointers (yep - it is so ugly).

I understand that one can argue that mistakes like this should not necessarily be detected by an automatic tool, as in some scenarios this is exactly the intended behaviour. Still, is there a way to make the auto-initialized local static variables "dirty"?

Comments (5)

安静被遗忘 2024-12-17 18:46:14

The devil is in the details, but this may work for you:

First, get Frama-C. If you are using Unix, your distribution may have a package. The package won't be the latest version, but it may be good enough, and installing it this way will save you some time.

Say your example is as below, only so much bigger that it's not obvious what is wrong:

int add(int x, int y)
{
  static int state;
  int result = x + y + state; // I tested it once and it worked.
  state++;
  return result;
}

Type a command like:

frama-c -lib-entry -main add -deps ugly.c

Options -lib-entry -main add mean "look at function add". Option -deps computes functional dependencies. You'll find these "functional dependencies" in the log:

[from] Function add:
     state FROM state; (and default:false)
     \result FROM x; y; state; (and default:false)

This lists the actual inputs the results of add depend on, and the actual outputs computed from these inputs, including static variables read from and modified. A static variable that was properly initialized before being used would normally not appear as input, unless the analyzer was unable to determine that it was always initialized before being read from.

The log shows state as a dependency of \result. If you expected the returned result to depend only on the arguments (meaning two calls with the same arguments produce the same result), this is a hint that something may be wrong with the variable state.

Another hint shown in the above lines is that the function modifies state.
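Once a variable such as state has been singled out this way, one possible repair is to hoist it to file scope and give the library an explicit reset function, so that the caller no longer needs to unload and reload. The sketch below is only an illustration under that assumption: add_reset is a hypothetical name, and whether resetting to 0 is the right fix depends on what the original code intended.

```c
/* Hoisted out of add() so that it can be reset without reloading the library. */
static int state;

/* Hypothetical helper: restores the value the loader would have provided. */
void add_reset(void)
{
  state = 0;
}

int add(int x, int y)
{
  int result = x + y + state; /* still reads the hidden state */
  state++;
  return result;
}
```

Calling add_reset() before each top-level call makes two calls with the same arguments agree again, at the cost of one extra store per reset variable.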

This may help or not. Option -lib-entry means that the analyzer does not assume that any non-const static variable has kept its value at the time the function under analysis is called, so that may be too imprecise for your code. There are ways around that, but then it is up to you whether you want to gamble the time it takes to learn these ways.

EDIT: here is a more complex example:

void initialize_1(int *p)
{
  *p = 0;
}

void initialize_2(int *p)
{
  *p; // I made a mistake here.
}

int add(int x, int y)
{
  static int state1;
  static int state2;

  initialize_1(&state1);
  initialize_2(&state2);

  // This is safe because I have initialized state1 and state2:
  int result = x + y + state1 + state2; 

  state1++;
  state2++;
  return result;
}

On this example, the same command produces the results:

[from] Function initialize_1:
         state1 FROM p
[from] Function initialize_2:
[from] Function add:
         state1 FROM \nothing
         state2 FROM state2
         \result FROM x; y; state2

What you see for initialize_2 is an empty list of dependencies, meaning the function assigns nothing. I will make this case clearer by displaying an explicit message rather than just an empty list. If you know what any of the functions initialize_1, initialize_2 or add is supposed to do, you can compare this a priori knowledge to the results of the analysis and see that something is wrong for initialize_2 and add.
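For comparison, here is a sketch of what the repaired code might look like, assuming the intent of initialize_2 was to zero its target in the same way as initialize_1 (the example does not say so explicitly). With both variables written before they are read, the result no longer depends on what earlier calls left behind; this version has not been run through the analyzer here.

```c
/* Zero the int that p points to. */
void initialize_1(int *p)
{
  *p = 0;
}

/* Assumed intent: same as initialize_1. The buggy version only read *p. */
void initialize_2(int *p)
{
  *p = 0;
}

int add(int x, int y)
{
  static int state1;
  static int state2;

  initialize_1(&state1);
  initialize_2(&state2);

  /* Both statics were written above, so this depends only on x and y. */
  int result = x + y + state1 + state2;

  state1++;
  state2++;
  return result;
}
```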

SECOND EDIT: and now my example shows something strange for initialize_1, so perhaps I should explain that. Variable state1 depends on p in the sense that p is used to write to state1, and if p had been different, then the final value of state1 would have been different. Here is a last example:

int t[10];

void initialize_index(int i)
{
  t[i] = 1;
}

int main(int argc, char **argv)
{
  initialize_index(argv[1][0]-'0');
}

With the command frama-c -deps t.c, the dependencies computed for initialize_index are:

[from] Function initialize_index:
         t[0..9] FROM i (and SELF)

This means that each of the cells depends on i (it may be modified if i is the index of that particular cell). Each cell may also keep its value (if i indicates another cell): this is indicated with the (and SELF) mention in the latest version, and was indicated with a more obscure (and default:true) in previous versions.

云之铃。 2024-12-17 18:46:14

Static code analysis tools are pretty good at finding typical programming errors like the use of uninitialized variables. Here is a list of free tools that do this for C.

Unfortunately I can't recommend any of the tools in the list. I am only familiar with two commercial products, Coverity and Klocwork. Coverity is very good (and expensive). Klocwork is so-so (but less expensive).

给不了的爱 2024-12-17 18:46:14

What I did in the end was to remove all static qualifiers from the code with '#define static'. Static locals then become ordinary automatic variables, so a read that relied on the implicit zero-initialization becomes a read of uninitialized memory, and the kind of abuse I am hunting can be uncovered by the tools.

In my actual case this was enough to locate the bug. In a more general situation, where some statics are actually doing something important, the approach would need refining by gradually re-adding 'static' wherever the code stops working.
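A minimal sketch of the experiment, reusing the add example from the Frama-C answer rather than the real library. As written, the static is still in place and the hidden dependency shows up as two different results for the same input. Building with the macro enabled (by uncommenting the line, or with the compiler option -Dstatic=) turns state into an uninitialized automatic variable, which compiler warnings such as -Wuninitialized, Valgrind's Memcheck, or clang's -fsanitize=memory may then flag. Two caveats, stated as assumptions about typical toolchains: the macro should be defined after any standard headers are included, since redefining a keyword before them is not allowed by the C standard, and Memcheck tends to report an uninitialized value only once it influences a branch or reaches a system call.

```c
/* Experiment only: uncomment (or build with -Dstatic=) to strip the qualifier.
   Place the define after any standard #include lines. */
/* #define static */

int add(int x, int y)
{
  static int state;           /* automatic and uninitialized once static is defined away */
  int result = x + y + state; /* read before any write: the suspicious pattern */
  state++;
  return result;
}
```

Note that the same macro also strips static from file-scope functions and variables, which can cause duplicate-symbol link errors in a multi-file library.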

二货你真萌 2024-12-17 18:46:14

I don't know of any library that does this for you, but I would look into using regular expressions to find them. Something like

rgrep "static\s*int" path/to/src/root | grep -v = | grep -v "("

That should return all static int variables declared without an equals sign, and the last pipe should remove anything with parentheses in it (getting rid of functions). There's a good chance that this won't work exactly as is for you, but playing around with grep may be the fastest way to track this down.

Of course, once you find one that works you can replace int with all of the other kinds of variables to search for those too. HTH

旧情勿念 2024-12-17 18:46:14

My question is that how to uncover these errors ...

But these aren't errors: the expectation that a static variable is initialized to 0 is perfectly valid, as is assigning some other value to it.

So asking for a tool that will automatically find non-errors is unlikely to produce a satisfying result.

From your description, it appears that somefunc() returns the correct result the first time it is called, and an incorrect result on subsequent calls.

The simplest way to debug such problems is to have two GDB sessions side-by-side: one freshly-loaded (will compute correct answer), and one with "second iteration" (will compute wrong answer). Then step through both sessions "in parallel", and see where their computation or control flow starts to diverge.

Since you can usually effectively divide the problem in half, it often doesn't take long to find the bug. Bugs that always reproduce are the easiest ones to find. Just do it.
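Before stepping through two sessions, it can help to have a tiny harness that reproduces the divergence on demand. The sketch below is an assumption-laden illustration: leaky_entry and pure_entry are hypothetical stand-ins for the library's entry point, and the real function will have a different signature.

```c
/* Hypothetical stand-in for the library entry point: hidden static state
   leaks into the result, as described in the question. */
double leaky_entry(double x)
{
  static int calls;
  return x + calls++;
}

/* Hypothetical well-behaved function, for comparison. */
double pure_entry(double x)
{
  return 2.0 * x;
}

/* Returns 1 if two consecutive calls with the same input agree, 0 otherwise. */
int is_repeatable(double (*fn)(double), double input)
{
  double first = fn(input);
  double second = fn(input);
  return first == second;
}
```

Applying the same check to intermediate routines of the library, rather than only to the top-level entry point, is one way to halve the search space before reaching for the debugger.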
