我可以从哪里开始解决这个优化问题?

发布于 2024-09-12 13:02:29 字数 1753 浏览 3 评论 0原文

我有一个简单的程序,它从文件系统读取一堆内容,过滤结果,然后打印它们。这个简单的程序实现了特定于领域的语言,使选择更加容易。该DSL“编译”成如下所示的执行计划(输入为C:\Windows\System32\* OR -md5"ABCDEFG" OR -tf):

Index  Success  Failure  Description
    0        S        1  File Matches C:\Windows\System32\*
    1        S        2  File MD5 Matches ABCDEFG
    2        S        F  File is file. (Not directory)

过滤器应用于给定的文件,如果成功,索引指针跳转到成功字段指示的索引,如果失败,索引指针跳转到失败字段指示的编号。 “S”表示文件通过过滤器,F 表示文件应被拒绝。

当然,基于简单文件属性 (!FILE_ATTRIBUTE_DIRECTORY) 检查的过滤器比基于文件 MD5 的检查快得多,后者需要打开并执行文件的实际哈希。每个过滤器“操作码”都有一个与之关联的时间类别; MD5 获得高计时数,ISFILE 获得低计时数。

我想重新排序这个执行计划,以便尽可能少地执行需要很长时间的操作码。对于上述计划,这意味着它必须是:

Index  Success  Failure  Description
    0        S        1  File is file. (Not directory)
    1        S        2  File Matches C:\Windows\System32\*
    2        S        F  File MD5 Matches ABCDEFG

根据《龙书》,为三地址代码选择最佳执行顺序是一个NP完全问题(至少根据《龙书》第二版第511页)该文本),但在这种情况下,他们正在讨论寄存器分配和机器的其他问题。就我而言,实际的“中间代码”要简单得多。我想知道是否存在一种方案,可以让我将源 IL 重新排序为最佳执行计划。

这是另一个例子:
{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }

解析为:

Index  Success  Failure  Description
    0        1        2  File Matches C:\Windows\Inf\*
    1        S        2  File is a Portable Executable
    2        3        F  File is file. (Not directory)
    3        F        S  File Matches C:\Windows\System32\Drivers\*

最佳:

Index  Success  Failure  Description
    0        1        2  File is file. (Not directory)
    1        2        S  File Matches C:\Windows\System32\Drivers\*
    2        3        F  File Matches C:\Windows\Inf\*
    3        S        F  File is a Portable Executable

I have a simple program which reads a bunch of things from the filesystem, filters the results , and prints them. This simple program implements a domain specific language to make selection easier. This DSL "compiles" down into an execution plan that looks like this (Input was C:\Windows\System32\* OR -md5"ABCDEFG" OR -tf):

Index  Success  Failure  Description
    0        S        1  File Matches C:\Windows\System32\*
    1        S        2  File MD5 Matches ABCDEFG
    2        S        F  File is file. (Not directory)

The filter is applied to the given file, and if it succeeds, the index pointer jumps to the index indicated in the success field, and if it fails, the index pointer jumps to the number indicated in the failure field. "S" means that the file passes the filter, F means that the file should be rejected.

Of course, a filter based upon a simple file attribute (!FILE_ATTRIBUTE_DIRECTORY) check is much faster than a check based upon the MD5 of the file, which requires opening and performing the actual hash of the file. Each filter "opcode" has a time class associated with it; MD5 gets a high timing number, ISFILE gets a low timing number.

I would like to reorder this execution plan so that opcodes that take a long time are executed as rarely as possible. For the above plan, that would mean it would have to be:

Index  Success  Failure  Description
    0        S        1  File is file. (Not directory)
    1        S        2  File Matches C:\Windows\System32\*
    2        S        F  File MD5 Matches ABCDEFG

According to the "Dragon Book", picking the best order of execution for three address code is an NP-Complete problem (At least according to page 511 of the second edition of that text), but in that case they are talking about register allocation and other issues of the machine. In my case, the actual "intermediate code" is much simpler. I'm wondering of a scheme exists that would allow me to reorder the source IL into the optimal execution plan.

Here is another example:
{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }

Parsed to:

Index  Success  Failure  Description
    0        1        2  File Matches C:\Windows\Inf\*
    1        S        2  File is a Portable Executable
    2        3        F  File is file. (Not directory)
    3        F        S  File Matches C:\Windows\System32\Drivers\*

which is optimally:

Index  Success  Failure  Description
    0        1        2  File is file. (Not directory)
    1        2        S  File Matches C:\Windows\System32\Drivers\*
    2        3        F  File Matches C:\Windows\Inf\*
    3        S        F  File is a Portable Executable

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

思慕 2024-09-19 13:02:29

听起来在编译为操作码之前可能更容易选择最佳顺序。如果您有一个解析树,并且它尽可能“平坦”,那么您可以为每个节点分配一个分数,然后首先按最低总分数对每个节点的子节点进行排序。

例如:

{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }
         1             2          3                 4

      OR
     /  \
  AND    AND
 /  \   /   \
1    2 3     4

您可以按最低分数对 AND 节点 (1, 2) 和 (3, 4) 进行排序,然后将该分数分配给每个节点。然后按其子节点的最低分数对 OR 节点的子节点进行排序。

由于 AND 和 OR 是可交换的,因此这种排序操作不会改变整个表达式的含义。

It sounds like it might be easier to pick an optimal order before compiling down to your opcodes. If you have a parse tree, and it is as "flat" as possible, then you can assign a score to each node and then sort each node's children by the lowest total score first.

For example:

{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }
         1             2          3                 4

      OR
     /  \
  AND    AND
 /  \   /   \
1    2 3     4

You could sort the AND nodes (1, 2) and (3, 4) by the lowest score and then assign that score to each node. Then sort the children of the OR node by the lowest score of their children.

Since AND and OR are commutative, this sorting operation won't change the meaning of your overall expression.

墨小沫ゞ 2024-09-19 13:02:29

@Greg Hewgill 是对的,这在 AST 上比在中间代码上更容易执行。当您想要处理中间代码时,第一个目标是将其转换为依赖关系树(看起来像 AST /shrug)。

从叶子开始 - 如果您使用否定谓词来表示 NOT,这可能是最简单的。

Index  Success  Failure  Description
0        1        2  File Matches C:\Windows\Inf\*
1        S        2  File is a Portable Executable
2        3        F  File is file. (Not directory)
3        F        S  File Matches C:\Windows\System32\Drivers\*

提取叶(任何子节点均为 S、F 或提取节点的内容;在需要时插入 NOT;将所有对叶的引用替换为叶的父节点的引用)

Index  Success  Failure  Description
0        1        2  File Matches C:\Windows\Inf\*
1        S        2  File is a Portable Executable
2        L1        F  File is file. (Not directory)

L1=NOT(cost(child))
    |
Pred(cost(PATH))

提取节点(如果成功指向提取节点,则使用连词进行连接;失败使用析取;将所有对 Node 的引用替换为对包含 Node 的树的结果根的引用)。

Index  Success  Failure  Description
0        1        L3  File Matches C:\Windows\Inf\*
1        S        L3  File is a Portable Executable


L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

提取节点

Index  Success  Failure  Description
0        L5       L3  File Matches C:\Windows\Inf\*

L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
                    /                          \
                    |                       L4=IS(cost(child))
                    |                           | 
                    |                       1=Pred(cost(PORT_EXE))
                    |
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

提取节点(如果成功和失败都引用节点,则必须通过对节点定义的子树的根进行模式匹配来将节点注入到树中)

  1. 如果根是 OR ,如果有必要,则反转谓词

  2. 如果根是 AND,则在必要时反转谓词以确保引用为 Failure,并作为析取与 Success 引用的子根进行注入。

导致:

L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
                    /                          \
                    |                       L4=AND(cost(as for L3))
                    |                             /               \
                    |                       L6=IS(cost(child))   L7=IS(cost(child))
                    |                           |                       |
                    |                       1=Pred(cost(PORT_EXE))   0=Pred(cost(PATH))
                    |
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

@Greg Hewgill is right, this is easier to perform on the AST than on the Intermediate code. As you want to work on the Intermediate code, the first goal is to transform it into a dependency tree (which will look like the AST /shrug).

Start with the leaves - and it is probably easiest if you use negative-predicates for NOT.

Index  Success  Failure  Description
0        1        2  File Matches C:\Windows\Inf\*
1        S        2  File is a Portable Executable
2        3        F  File is file. (Not directory)
3        F        S  File Matches C:\Windows\System32\Drivers\*

Extract Leaf (anything with both children as S, F, or an extracted Node; insert NOT where required; Replace all references to Leaf with reference to parent node of leaf)

Index  Success  Failure  Description
0        1        2  File Matches C:\Windows\Inf\*
1        S        2  File is a Portable Executable
2        L1        F  File is file. (Not directory)

L1=NOT(cost(child))
    |
Pred(cost(PATH))

Extract Node (If Success points to Extracted Node use conjunction to join; Failure uses disjunction; Replace all references to Node with reference to resulting root of tree containing Node).

Index  Success  Failure  Description
0        1        L3  File Matches C:\Windows\Inf\*
1        S        L3  File is a Portable Executable


L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

Extract Node

Index  Success  Failure  Description
0        L5       L3  File Matches C:\Windows\Inf\*

L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
                    /                          \
                    |                       L4=IS(cost(child))
                    |                           | 
                    |                       1=Pred(cost(PORT_EXE))
                    |
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))

Extract Node (In the case where Success and Failure both refer to Nodes, you will have to inject the Node into the tree by pattern matching on the root of the sub-tree defined by the Node)

  1. If root is OR, invert predicate if necessary to ensure reference is Success and inject as conjunction with child not referenced by Failure.

  2. If root is AND, invert predicate if necessary to ensure reference is Failure and inject as disjunction with child root referenced by Success.

Resulting in:

L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
                    /                          \
                    |                       L4=AND(cost(as for L3))
                    |                             /               \
                    |                       L6=IS(cost(child))   L7=IS(cost(child))
                    |                           |                       |
                    |                       1=Pred(cost(PORT_EXE))   0=Pred(cost(PATH))
                    |
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
               /           \
L1=NOT(cost(child))     L2=IS(cost(child))
    |                       |
3=Pred(cost(PATH))      2=Pred(cost(ISFILE))
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文