After reading "What's your/a good limit for cyclomatic complexity?", I realized many of my colleagues were quite annoyed with this new QA policy on our project: no more than 10 cyclomatic complexity per function.
Meaning: no more than 10 'if', 'else', 'try', 'catch' and other code workflow branching statements. Right. As I explained in 'Do you test private method?', such a policy has many good side-effects.
But: at the beginning of our (200 people, 7 years long) project, we were happily logging (and no, we cannot easily delegate that to some kind of 'Aspect-oriented programming' approach for logs).
myLogger.info("A String");
myLogger.fine("A more complicated String");
...
And when the first versions of our system went live, we experienced huge memory problems, not because of the logging (which was at one point turned off), but because of the log parameters (the strings), which were always calculated, then passed to the 'info()' or 'fine()' functions, only to discover that the level of logging was 'OFF', and that no logging was taking place!
So QA came back and urged our programmers to do conditional logging. Always.
if (myLogger.isLoggable(Level.INFO)) { myLogger.info("A String"); }
if (myLogger.isLoggable(Level.FINE)) { myLogger.fine("A more complicated String"); }
...
But now, with that 'can-not-be-moved' limit of 10 cyclomatic complexity per function, they argue that the various logs they put in their functions are felt as a burden, because each "if(isLoggable())" is counted as +1 cyclomatic complexity!
So if a function has 8 'if', 'else' and so on, in one tightly-coupled not-easily-shareable algorithm, and 3 critical log actions... they breach the limit even though the conditional logs may not really be part of the complexity of that function...
How would you address this situation?
I have seen a couple of interesting coding evolutions (due to that 'conflict') in my project, but I just want to get your thoughts first.
Thank you for all the answers.
I must insist that the problem is not 'formatting' related, but 'argument evaluation' related (evaluation that can be very costly to do, just before calling a method which will do nothing).
So when I wrote "A String" above, I actually meant aFunction(), with aFunction() returning a String, and being a call to a complicated method collecting and computing all kinds of log data to be displayed by the logger... or not (hence the issue, and the obligation to use conditional logging, hence the actual issue of artificial increase of 'cyclomatic complexity'...)
I now get the 'variadic function' point advanced by some of you (thank you John).
Note: a quick test in Java 6 shows that the arguments of my varargs function are evaluated before it is called, so it cannot be applied to a function call, but it can be applied to a 'Log retriever object' (or 'function wrapper'), on which toString() will only be called if needed. Got it.
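To make that note concrete, here is a minimal sketch of the difference. Everything in it is hypothetical (LazyArgDemo, the log() helper, the enabled flag standing in for a level set to OFF); it is not the project's logger, only an illustration of eager argument evaluation versus a wrapper whose toString() is deferred.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; this is not the project's actual logger.
final class LazyArgDemo {
    static final AtomicInteger expensiveCalls = new AtomicInteger();
    static boolean enabled = false; // stands in for a log level set to OFF

    // Stand-in for the costly aFunction() mentioned above.
    static String aFunction() {
        expensiveCalls.incrementAndGet();
        return "expensive log data";
    }

    // Varargs logger: converts its arguments to text only when enabled.
    static void log(String prefix, Object... args) {
        if (!enabled) {
            return; // arguments are never converted to strings
        }
        StringBuilder sb = new StringBuilder(prefix);
        for (Object arg : args) {
            sb.append(' ').append(arg); // this is where arg.toString() runs
        }
        System.out.println(sb);
    }

    // "Function wrapper": defers the call until toString() is invoked.
    static final class LazyAFunction {
        @Override
        public String toString() {
            return aFunction();
        }
    }

    public static void main(String[] args) {
        log("eager:", aFunction());        // aFunction() runs before log() is entered
        log("lazy:", new LazyAFunction()); // aFunction() is not called while disabled
        System.out.println("expensive calls: " + expensiveCalls.get());
    }
}
```

The only cost left on the disabled path is the allocation of the small wrapper and the varargs array.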
I have now posted my experience on this topic.
I will leave it there until next Tuesday for voting, then I will select one of your answers.
Again, thank you for all the suggestions :)
With current logging frameworks, the question is moot
Current logging frameworks like slf4j or log4j 2 don't require guard statements in most cases. They use a parameterized log statement so that an event can be logged unconditionally, but message formatting only occurs if the event is enabled. Message construction is performed as needed by the logger, rather than pre-emptively by the application.
If you have to use an antique logging library, you can read on to get more background and a way to retrofit the old library with parameterized messages.
Are guard statements really adding complexity?
Consider excluding logging guard statements from the cyclomatic complexity calculation.
It could be argued that, due to their predictable form, conditional logging checks really don't contribute to the complexity of the code.
Inflexible metrics can make an otherwise good programmer turn bad. Be careful!
Assuming that your tools for calculating complexity can't be tailored to that degree, the following approach may offer a work-around.
The need for conditional logging
I assume that your guard statements were introduced because you had code like this:
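The snippet that belonged here did not survive, so what follows is only a hypothetical reconstruction of that kind of code, using java.util.logging. Widget, Dongle, connect() and the counter are made-up names for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical reconstruction: Widget, Dongle and connect() are made-up names.
final class UnguardedLoggingDemo {
    static final Logger log = Logger.getLogger("demo.unguarded");
    static int toStringCalls = 0;

    static final class Widget {
        @Override public String toString() { toStringCalls++; return "widget"; }
    }

    static final class Dongle {
        @Override public String toString() { toStringCalls++; return "dongle"; }
    }

    static void connect(Widget w, Dongle d) {
        // The message is concatenated before fine() gets a chance to check the level.
        log.fine("Attempting connection of dongle " + d + " to widget " + w);
    }

    public static void main(String[] args) {
        log.setLevel(Level.OFF);
        connect(new Widget(), new Dongle());
        System.out.println("toString() calls with logging off: " + toStringCalls);
    }
}
```

Both toString() methods run even though the logger then drops the message.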
In Java, each of the log statements creates a new StringBuilder, and invokes the toString() method on each object concatenated to the string. These toString() methods, in turn, are likely to create StringBuilder instances of their own, and invoke the toString() methods of their members, and so on, across a potentially large object graph. (Before Java 5, it was even more expensive, since StringBuffer was used, and all of its operations are synchronized.)
This can be relatively costly, especially if the log statement is in some heavily-executed code path. And, written as above, that expensive message formatting occurs even if the logger is bound to discard the result because the log level is too high.
This leads to the introduction of guard statements of the form:
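The guard itself is also missing from this copy; a sketch of the usual shape, again with java.util.logging and hypothetical names (GuardedLoggingDemo, Tracked, connect()), could look like this:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; Tracked only exists to count toString() calls.
final class GuardedLoggingDemo {
    static final Logger log = Logger.getLogger("demo.guarded");
    static int toStringCalls = 0;

    static final class Tracked {
        private final String name;
        Tracked(String name) { this.name = name; }
        @Override public String toString() { toStringCalls++; return name; }
    }

    static void connect(Tracked w, Tracked d) {
        if (log.isLoggable(Level.FINE)) { // this branch is the +1 the metric counts
            log.fine("Attempting connection of dongle " + d + " to widget " + w);
        }
    }
}
```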
With this guard, the evaluation of arguments d and w and the string concatenation is performed only when necessary.
A solution for simple, efficient logging
However, if the logger (or a wrapper that you write around your chosen logging package) takes a formatter and arguments for the formatter, the message construction can be delayed until it is certain that it will be used, while eliminating the guard statements and their cyclomatic complexity.
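A minimal sketch of such a wrapper, assuming java.util.logging underneath and String.format as the formatter. FormatLogger and its method names are illustrative, not an existing library class.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative wrapper; the single level check lives here, not in the callers.
final class FormatLogger {
    private final Logger log;

    FormatLogger(Logger log) {
        this.log = log;
    }

    void info(String format, Object... args) {
        log(Level.INFO, format, args);
    }

    void fine(String format, Object... args) {
        log(Level.FINE, format, args);
    }

    void log(Level level, String format, Object... args) {
        if (log.isLoggable(level)) {
            // Formatting (and every toString() it triggers) happens only here.
            log.log(level, String.format(format, args));
        }
    }
}
```

A call site then reads flog.fine("Attempting connection of dongle %s to widget %s", d, w); with no branch in the calling method.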
Now, none of the cascading toString() calls with their buffer allocations will occur unless they are necessary! This effectively eliminates the performance hit that led to the guard statements. One small penalty, in Java, would be auto-boxing of any primitive type arguments you pass to the logger.
The code doing the logging is arguably even cleaner than ever, since untidy string concatenation is gone. It can be even cleaner if the format strings are externalized (using a ResourceBundle), which could also assist in maintenance or localization of the software.
Further enhancements
Also note that, in Java, a MessageFormat object could be used in place of a "format" String, which gives you additional capabilities such as a choice format to handle cardinal numbers more neatly. Another alternative would be to implement your own formatting capability that invokes some interface that you define for "evaluation", rather than the basic toString() method.
In Python you pass the formatted values as parameters to the logging function. String formatting is only applied if logging is enabled. There's still the overhead of a function call, but that's minuscule compared to formatting.
You can do something like this for any language with variadic arguments (C/C++, C#/Java, etc).
This isn't really intended for when the arguments are difficult to retrieve, but for when formatting them to strings is expensive. For example, if your code already has a list of numbers in it, you might want to log that list for debugging. Executing mylist.toString() will take a while to no benefit, as the result will be thrown away. So you pass mylist as a parameter to the logging function, and let it handle string formatting. That way, formatting will only be performed if needed.
Since the OP's question specifically mentions Java, here's how the above can be used:
The trick is to have objects that will not perform expensive computations until absolutely needed. This is easy in languages like Smalltalk or Python that support lambdas and closures, but is still doable in Java with a bit of imagination.
Say you have a function getEverything(). It will retrieve every object from your database into a list. You don't want to call this if the result will be discarded, obviously. So instead of using a call to that function directly, you define an inner class called LazyGetEverything:
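The class itself is missing from this copy. A sketch of what it might look like, assuming java.util.logging and its {0} placeholder syntax; the stubbed getEverything(), the counter and doWork() are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLoggingDemo {
    static final Logger logger = Logger.getLogger("demo.lazy");
    static int fetches = 0;

    // Stand-in for the expensive database read.
    static List<String> getEverything() {
        fetches++;
        return Arrays.asList("row1", "row2");
    }

    // Defers getEverything() until something asks for the string form.
    static final class LazyGetEverything {
        @Override
        public String toString() {
            return getEverything().toString();
        }
    }

    static void doWork() {
        // Only the cheap wrapper is constructed here; the record's parameter
        // is turned into text later, by a handler's formatter, if at all.
        logger.log(Level.FINE, "everything: {0}", new LazyGetEverything());
    }
}
```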
In this code, the call to getEverything() is wrapped so that it won't actually be executed until it's needed. The logging function will execute toString() on its parameters only if debugging is enabled. That way, your code will suffer only the overhead of a function call instead of the full getEverything() call.
In languages supporting lambda expressions or code blocks as parameters, one solution for this would be to give just that to the logging method. The logging method could evaluate the configuration and call/execute the provided lambda/code block only if needed.
Did not try it yet, though.
Theoretically this is possible. I would not like to use it in production due to the performance issues I expect with that heavy use of lambdas/code blocks for logging.
But as always: if in doubt, test it and measure the impact on CPU load and memory.
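For what it's worth, java.util.logging has had Supplier<String> overloads since Java 8, so in Java this can be tried without writing any wrapper. A small sketch, where expensiveReport(), doWork() and the counter are hypothetical names:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class SupplierLoggingDemo {
    static final Logger logger = Logger.getLogger("demo.supplier");
    static int evaluations = 0;

    // Stand-in for an expensive computation of log data.
    static String expensiveReport() {
        evaluations++;
        return "report";
    }

    static void doWork() {
        // Logger.fine(Supplier<String>) invokes the lambda only if FINE is loggable.
        logger.fine(() -> "state: " + expensiveReport());
    }
}
```

Whether the lambda allocation matters in a hot path is exactly the kind of thing to measure rather than assume.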
Thank you for all your answers! You guys rock :)
Now my feedback is not as straight-forward as yours:
Yes, for one project (as in 'one program deployed and running on its own on a single production platform'), I suppose you can go all technical on me:
and there you have it, as explained by @John Millikin and @erickson.
However, this issue forced us to think a little about 'Why exactly were we logging in the first place?'
Our project is actually 30 different projects (5 to 10 people each) deployed on various production platforms, with asynchronous communication needs and central bus architecture.
The simple logging described in the question was fine for each project at the beginning (5 years ago), but since then, we have had to step up. Enter the KPI.
Instead of asking a logger to log anything, we ask an automatically created object (called KPI) to register an event. It is a simple call (myKPI.I_am_signaling_myself_to_you()), and does not need to be conditional (which solves the 'artificial increase of cyclomatic complexity' issue).
That KPI object knows who calls it, and since it runs from the beginning of the application, it is able to retrieve lots of data we were previously computing on the spot when we were logging.
Plus, that KPI object can be monitored independently and can compute/publish its information on demand on a single, separate publication bus.
That way, each client can ask for the information it actually wants (like, 'has my process begun, and if yes, since when?'), instead of looking for the correct log file and grepping for a cryptic String...
Indeed, the question 'Why exactly were we logging in the first place?' made us realize we were not logging just for the programmer and his unit or integration tests, but for a much broader community including some of the final clients themselves. Our 'reporting' mechanism had to be centralized, asynchronous, 24/7.
The specifics of that KPI mechanism are way out of the scope of this question. Suffice it to say its proper calibration is by far, hands down, the single most complicated non-functional issue we are facing. It still brings the system to its knees from time to time! Properly calibrated, however, it is a life-saver.
Again, thank you for all the suggestions. We will consider them for some parts of our system where simple logging is still in place.
But the other point of this question was to illustrate to you a specific problem in a much larger and more complicated context.
Hope you liked it. I might ask a question on KPI (which, believe it or not, is not in any question on SOF so far!) later next week.
I will leave this answer up for voting until next Tuesday, then I will select an answer (not this one obviously ;) )
Maybe this is too simple, but what about using the "extract method" refactoring around the guard clause? Your example code of this:
Becomes this:
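Both snippets are missing from this copy, so here is a hedged sketch of the before and after, based on the example in the question. ExtractMethodDemo, buildMessage() and the helper names are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ExtractMethodDemo {
    static final Logger myLogger = Logger.getLogger("demo.extract");
    static int messagesBuilt = 0;

    // Stand-in for an expensive message computation.
    static String buildMessage() {
        messagesBuilt++;
        return "A String";
    }

    // Before: both guards count against this method's complexity.
    static void exampleBefore() {
        if (myLogger.isLoggable(Level.INFO)) {
            myLogger.info(buildMessage());
        }
        if (myLogger.isLoggable(Level.FINE)) {
            myLogger.fine(buildMessage());
        }
    }

    // After: the method itself is branch-free; each guard lives in a helper.
    static void exampleAfter() {
        logInfo();
        logFine();
    }

    private static void logInfo() {
        if (myLogger.isLoggable(Level.INFO)) {
            myLogger.info(buildMessage());
        }
    }

    private static void logFine() {
        if (myLogger.isLoggable(Level.FINE)) {
            myLogger.fine(buildMessage());
        }
    }
}
```

Note that the total number of branches is unchanged; they are only moved out of the method that the metric is applied to.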
In C or C++ I'd use the preprocessor instead of the if statements for the conditional logging.
Pass the log level to the logger and let it decide whether or not to write the log statement:
UPDATE: Ah, I see that you want to conditionally create the log string without a conditional statement. Presumably at runtime rather than compile time.
I'll just say that the way we've solved this is to put the formatting code in the logger class so that the formatting only takes place if the level passes. Very similar to a built-in sprintf. For example:
That should meet your criteria.
Conditional logging is evil. It adds unnecessary clutter to your code.
You should always send in the objects you have to the logger:
and then have a java.util.logging.Formatter that uses MessageFormat to flatten foo and bar into the string to be output. It will only be called if the logger and handler will log at that level.
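A sketch of how those two pieces could fit together. MessageFormatFormatter, ObjectLoggingDemo and the output layout are assumptions; the Logger, LogRecord, Formatter and MessageFormat calls are the standard JDK ones.

```java
import java.text.MessageFormat;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Flattens the record's parameters into the message only when a handler asks.
final class MessageFormatFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        Object[] params = record.getParameters();
        String message = record.getMessage();
        if (params != null && params.length > 0) {
            message = MessageFormat.format(message, params);
        }
        return record.getLevel() + ": " + message + System.lineSeparator();
    }
}

final class ObjectLoggingDemo {
    static void doWork(Logger log, Object foo, Object bar) {
        // No concatenation here; foo and bar travel as plain objects.
        log.log(Level.FINE, "foo={0}, bar={1}", new Object[] {foo, bar});
    }
}
```

The formatter would be installed with handler.setFormatter(new MessageFormatFormatter()) on whichever Handler the logger uses.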
For added pleasure you could have some kind of expression language to be able to get fine control over how to format the logged objects (toString may not always be useful).
(source: scala-lang.org)
Scala has an annotation @elidable() that allows you to remove methods with a compiler flag.
With the scala REPL:
With elide-below set
See also Scala assert definition
As much as I hate macros in C/C++, at work we have #defines for the if part, which if false ignores (does not evaluate) the following expressions, but if true returns a stream into which stuff can be piped using the '<<' operator.
Like this:
I assume this would eliminate the extra 'complexity' that your tool sees, and also eliminates any calculating of the string, or any expressions to be logged if the level was not reached.
Here is an elegant solution using a ternary expression
logger.info(logger.isInfoEnabled() ? "Log Statement goes here..." : null);
Consider a logging util function ...
Then make the call with a "closure" round the expensive evaluation that you want to avoid.
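The util function is missing from this copy. One way it could look in Java 8 or later, with a Supplier<String> playing the role of the closure; LogUtil and its signature are assumptions, not the answerer's code.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical utility; the level check exists once, here.
final class LogUtil {
    private LogUtil() {}

    static void log(Logger logger, Level level, Supplier<String> message) {
        if (logger.isLoggable(level)) {
            // The closure is evaluated only on this path.
            logger.log(level, message.get());
        }
    }
}
```

A call site would then be LogUtil.log(myLogger, Level.FINE, () -> aFunction()); on older Java versions the closure would have to be an anonymous inner class instead.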