printf and unsafe format strings
The application in question lets users define their own messages (mainly for customization and/or localization) in a plain-text configuration file. These messages are passed to printf-style functions at runtime. If a user-defined format string is faulty, for example because its conversion specifiers do not match the arguments actually passed, the behavior is undefined and a whole lot of bad things can happen.
What is the best way to sanitize such user-supplied format strings? Or should I drop this approach entirely and use another method to let users customize the messages safely?
The solution must be reasonably portable (Windows, Linux, BSD; x86 and x86-64).
Comments (2)
Define your own formatting language, which your code translates into a valid format string. That restricts how much trouble the user can get into. For example, do not allow % at all, and define your own symbol or marker to indicate that a % should appear in the output.
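As a rough illustration of this idea, here is a minimal sketch in portable C. It is a variant of the suggestion rather than a literal implementation: instead of generating a printf format string, it substitutes values directly, so the user's text is never interpreted by printf at all. The `{name}` marker syntax, the `struct placeholder` type, and the `expand_template` function are all hypothetical names chosen for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholder table entry; names and values are illustrative. */
struct placeholder {
    const char *name;   /* marker text between braces, e.g. "user" */
    const char *value;  /* replacement text, already formatted by the program */
};

/*
 * Expand "{name}" markers in tmpl into out (capacity out_size bytes).
 * Returns 0 on success, or -1 on an unknown placeholder, an unterminated
 * marker, or insufficient space. On failure the contents of out are
 * unspecified. '%' has no special meaning and is copied literally.
 */
int expand_template(const char *tmpl,
                    const struct placeholder *table, size_t table_len,
                    char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *piece = tmpl;   /* default: copy one literal character */
        size_t piece_len = 1;

        if (*tmpl == '{') {
            const char *close = strchr(tmpl + 1, '}');
            size_t name_len, i;

            if (close == NULL)
                return -1;          /* unterminated marker */
            name_len = (size_t)(close - (tmpl + 1));

            piece = NULL;
            for (i = 0; i < table_len; i++) {
                if (strlen(table[i].name) == name_len &&
                    memcmp(table[i].name, tmpl + 1, name_len) == 0) {
                    piece = table[i].value;
                    break;
                }
            }
            if (piece == NULL)
                return -1;          /* unknown placeholder */
            piece_len = strlen(piece);
            tmpl = close + 1;
        } else {
            tmpl++;
        }

        if (piece_len >= out_size - used)
            return -1;              /* always keep room for the final '\0' */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }

    out[used] = '\0';
    return 0;
}
```

The expanded buffer should then be written with `fputs(buf, stdout)` or `printf("%s", buf)`, never `printf(buf)`. This sketch has no way to emit a literal `{`; one option would be to add a table entry such as `{"lbrace", "{"}`.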
You have two choices:
1. Let the user's mess-ups (intentional or not) mess up only themselves, i.e. don't let the users' personal configurations interfere with each other.
2. Don't let users customize the results. Or if you do, make the customization so limited that there is nothing harmful they can do.
For example, I've frequently let users provide their own input to things like printf(), but with filters that only accept a certain (very limited) character set. E.g., I'll use a regexp of something like ^[a-zA-Z0-9_]+$ and not let anything else in. Any time you offer customization, you open the door to problems. Tread carefully here.
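A whitelist filter of this kind might look like the following sketch. Since a regex library is not part of standard C, and the question asks for portability across Windows, Linux, and BSD, the pattern ^[a-zA-Z0-9_]+$ is hand-coded here. The function names `is_safe_token` and `print_user_value` are hypothetical, and the character-range comparisons assume an ASCII-compatible execution character set, which holds on the platforms listed in the question.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if s is non-empty and matches ^[a-zA-Z0-9_]+$, otherwise 0. */
int is_safe_token(const char *s)
{
    size_t i;

    if (s == NULL || s[0] == '\0')
        return 0;

    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        /* Explicit ranges instead of isalnum(), which depends on the locale. */
        int ok = (c >= 'a' && c <= 'z') ||
                 (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9') ||
                 c == '_';
        if (!ok)
            return 0;
    }
    return 1;
}

/*
 * Prints "label: value" only if value passes the whitelist.
 * Returns 0 on success, -1 if the value is rejected or output fails.
 */
int print_user_value(FILE *stream, const char *label, const char *value)
{
    if (!is_safe_token(value))
        return -1;
    /* The format string is a fixed literal; user text is only an argument. */
    return fprintf(stream, "%s: %s\n", label, value) < 0 ? -1 : 0;
}
```

Note that the filter rejects % outright, so a value such as `100%n` never reaches the output call. Passing user text as an argument to a fixed `"%s"` format is a second, independent safeguard.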