Refactoring nested for loops
I have a parent-child relationship between two sets of data: a parent document collection and a child document collection. The requirement is that the parents and their corresponding children be exported into a single PDF document. A simple implementation could look like this (Java-ish pseudocode below):
for (Document parentDocument : Documents) {
    ExportToPdf(parentDocument);
    for (Document childDocument : parentDocument.children()) {
        AppendToParentPdf(childDocument);
    }
}
Something like the above will probably solve the problem, but then the requirements suddenly change: each parent and each of its children now needs to be in a separate PDF. The snippet is modified by changing AppendToParentPdf() to ExportToPdf(), as follows:
for (Document parentDocument : Documents) {
    ExportToPdf(parentDocument);
    for (Document childDocument : parentDocument.children()) {
        ExportToPdf(childDocument);
    }
}
Going on this way, it will not take long before this seemingly trivial snippet suffers from some serious code smells.
My questions to SO are:
Are there better representations of parent-child relationships such as the above? Instead of brute-forcing my way through all the documents and their children in an O(n^2) fashion, can I use a different data structure or technique to traverse the entire structure more efficiently?
In the scenario I described, where the business rules about how the PDFs should be exported are rather fluid, is there a smarter way to code the export function? The export format is also transient: PDFs can give way to *.docs/csvs/xmls et al.
It will be great to get some perspective on this.
Thanks
Comments (6)
This is not O(N^2). It is O(N), where N is the total number of parent and child documents. Assuming that no child has more than one parent document, you can't significantly improve the performance. Furthermore, the cost of the traversal is probably trivial compared with the cost of the calls that generate the PDFs.
The only case where you might want to consider optimizing is if child documents can be children of multiple parents. In that case, you may want to keep track of the documents that you've already generated PDFs for, and skip them if you revisit them in the traversal. The test for "have I seen this document before" can be implemented using a HashSet.
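A minimal sketch of the HashSet idea, in case it helps. The Document class here is a hypothetical stand-in for the question's type, and the export step is replaced by recording the document name, since the real ExportToPdf() is not shown. It relies on Set.add() returning false when the element is already present.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupExport {
    // Hypothetical stand-in for the question's Document type.
    // It uses default identity-based equals/hashCode, which is enough here.
    static class Document {
        final String name;
        final List<Document> children = new ArrayList<>();

        Document(String name) { this.name = name; }

        List<Document> children() { return children; }
    }

    // Walks parents and their children, exporting each document at most once.
    // Returns the names in export order so the result can be inspected.
    static List<String> exportAll(List<Document> parents) {
        Set<Document> seen = new HashSet<>();
        List<String> exported = new ArrayList<>();
        for (Document parent : parents) {
            exportOnce(parent, seen, exported);
            for (Document child : parent.children()) {
                exportOnce(child, seen, exported);
            }
        }
        return exported;
    }

    private static void exportOnce(Document doc, Set<Document> seen, List<String> exported) {
        // Set.add returns false if the document was already in the set.
        if (seen.add(doc)) {
            exported.add(doc.name); // placeholder for ExportToPdf(doc)
        }
    }
}
```

The loop is still a single pass over all parent and child references; the set only prevents a shared child from being exported a second time.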
You could encapsulate what you want to do with a document in a handler. This will also allow you to define new handlers in the future that you can pass to existing code.
As for efficiency, I'd say don't try to optimise unless you run into performance issues. In any case, the problem won't be with the nested loop but with the logic itself that processes the documents.
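One way the handler idea could look, as a rough sketch. DocumentHandler, CombinedHandler, and SeparateHandler are made-up names, Document is a hypothetical stand-in for the question's type, and the handlers only log what they would do instead of producing real PDFs.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerSketch {
    // Hypothetical stand-in for the question's Document type.
    static class Document {
        final String name;
        final List<Document> children = new ArrayList<>();

        Document(String name) { this.name = name; }

        List<Document> children() { return children; }
    }

    // Hypothetical handler: decides what happens to each parent and child.
    interface DocumentHandler {
        void onParent(Document parent);
        void onChild(Document parent, Document child);
    }

    // The nested loop is written once and never changes with the requirements.
    static void traverse(List<Document> parents, DocumentHandler handler) {
        for (Document parent : parents) {
            handler.onParent(parent);
            for (Document child : parent.children()) {
                handler.onChild(parent, child);
            }
        }
    }

    // First requirement: children are appended to the parent's file.
    static class CombinedHandler implements DocumentHandler {
        final List<String> log = new ArrayList<>();

        public void onParent(Document parent) { log.add("export " + parent.name); }

        public void onChild(Document parent, Document child) {
            log.add("append " + child.name + " to " + parent.name);
        }
    }

    // Changed requirement: every document gets its own file.
    static class SeparateHandler implements DocumentHandler {
        final List<String> log = new ArrayList<>();

        public void onParent(Document parent) { log.add("export " + parent.name); }

        public void onChild(Document parent, Document child) { log.add("export " + child.name); }
    }
}
```

With this shape, the requirement change from the question means passing a different handler to traverse() rather than editing the loop body.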
For your 2nd question, you could use the provider pattern or an extension of it.
I tried to fit this into a comment, but there's too much to say...
I don't see how the change you're talking about is a code smell. If the requirements change for this simple function, then they change. If you only need to make the change in one place, then it sounds like you've done a good job. If your client is going to need both ways of doing it (or more), then you might consider some sort of strategy pattern so you don't have to rewrite the surrounding code to do either function.
If you're making dozens of these changes per week, then it could get messy and you probably ought to make a plan for how to more effectively deal with a very busy axis of change. Otherwise, discipline and refactoring can help you keep it clean.
As to whether or not n² is a problem, it depends. How big is n? If you have to do this frequently (i.e. dozens of times per hour) and n is in the thousands, then you might have a problem. Otherwise I wouldn't sweat it as long as you're meeting or exceeding demand and your CPU/disk utilization is out of the danger zone.
The second question's problem can be solved by simply creating an interface Exporter with the method export(Document doc); and then implementing it for each of the various formats, e.g. class DocExporterImpl implements Exporter.
The first one is dependent on your requirements and no design pattern as such solves these problems. Can't help you there.
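A small sketch of that Exporter interface, under some assumptions: Document is a hypothetical two-field type, CsvExporter and XmlExporter are illustrative names, and both write plain text into a StringBuilder without any escaping, so they are not production-ready writers for either format.

```java
import java.util.List;

class ExporterSketch {
    // Hypothetical stand-in for the question's Document type.
    static class Document {
        final String title;
        final String body;

        Document(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // The interface from the answer: one method, one implementation per format.
    interface Exporter {
        void export(Document doc);
    }

    // Illustrative CSV implementation (no quoting or escaping of values).
    static class CsvExporter implements Exporter {
        final StringBuilder out = new StringBuilder();

        public void export(Document doc) {
            out.append(doc.title).append(',').append(doc.body).append('\n');
        }
    }

    // Illustrative XML implementation (no escaping of special characters).
    static class XmlExporter implements Exporter {
        final StringBuilder out = new StringBuilder();

        public void export(Document doc) {
            out.append("<doc title=\"").append(doc.title).append("\">")
               .append(doc.body).append("</doc>\n");
        }
    }

    // Calling code depends only on the interface, not on the format.
    static void exportAll(List<Document> docs, Exporter exporter) {
        for (Document doc : docs) {
            exporter.export(doc);
        }
    }
}
```

Switching the output format then comes down to which Exporter instance is passed to exportAll().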
Using a Set to keep track of which elements have already been exported might not be the most beautiful solution, but it will prevent the documents from being exported twice.