How to write a flexible modular program with good interaction possibilities between modules?

Posted on 2024-09-03 10:52:18

I went through the answers on similar topics here on SO but couldn't find a satisfying one. Since I know this is a rather large topic, I will try to be more specific.

I want to write a program that processes files. The processing is nontrivial, so the best way is to split the different phases into standalone modules, which would then be used as necessary (sometimes I will only be interested in the output of module A, sometimes I will need the output of five other modules, etc.). The thing is that I need the modules to cooperate, because the output of one might be the input of another. And I need it to be FAST. Moreover, I want to avoid doing certain processing more than once (if module A creates some data that then needs to be processed by modules B and C, I don't want to run module A twice to create the input for B and C).

The information the modules need to share would mostly be blocks of binary data and/or offsets into the processed files. The task of the main program would be quite simple - just parse the arguments and run the required modules (and perhaps give some output, or should this be the task of the modules?).

I don't need the modules to be loaded at runtime. It's perfectly fine to have libs with a .h file and recompile the program every time there is a new module or some module is updated. The idea of modules is here mainly for code readability and maintainability, and so that more people can work on different modules without needing some predefined interface or whatever (on the other hand, some "guidelines" on how to write the modules would probably be required, I know that). We can assume that the file processing is a read-only operation; the original file is not changed.

Could someone point me in a good direction on how to do this in C++? Any advice is welcome (links, tutorials, pdf books...).

Comments (3)

若水微香 2024-09-10 10:52:18

This looks very similar to a plugin architecture. I recommend starting with an (informal) data flow chart to identify:

  • how these blocks process data
  • what data needs to be transferred
  • what results come back from one block to another (data/error codes/exceptions)

With this information you can start to build generic interfaces that can be bound to other interfaces at runtime. Then I would add a factory function to each module to request the real processing object from it. I don't recommend getting the processing objects directly out of the module interface; instead, return a factory object from which the processing objects can be retrieved. These processing objects are then used to build the entire processing chain.

An oversimplified outline would look like this:

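#include <string>

struct Data;          // placeholder: the payload handed to a processor
struct WhichDoIWant;  // placeholder: selects which processor to create

struct Processor
{
    void doSomething(Data);
};

struct Module
{
    std::string name();
    Processor* getProcessor(WhichDoIWant);  // the module creates the processor...
    void deleteProcessor(Processor*);       // ...and is also the one to destroy it
};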

Off the top of my head, these patterns are likely to appear:

  • factory function: to get objects from modules
  • composite and decorator: forming the processing chain
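
To make the two patterns concrete, here is a minimal sketch rather than a definitive design. Every name in it (`Data` as a byte vector, `AppendByte`, `Reverse`, `makeProcessor`, `Chain`) is hypothetical and not part of the outline above. A factory function hands out processors by name, and a composite runs them in order as one processing chain.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload shared between stages: a block of bytes.
using Data = std::vector<unsigned char>;

// Common interface every processing object implements.
struct Processor {
    virtual ~Processor() = default;
    virtual void process(Data& data) = 0;
};

// Two toy processors standing in for real modules.
struct AppendByte : Processor {
    explicit AppendByte(unsigned char b) : byte(b) {}
    void process(Data& data) override { data.push_back(byte); }
    unsigned char byte;
};

struct Reverse : Processor {
    void process(Data& data) override {
        Data reversed(data.rbegin(), data.rend());
        data.swap(reversed);
    }
};

// Factory function: callers ask for a processor by name and never
// see the concrete type. Returns nullptr for an unknown name.
std::unique_ptr<Processor> makeProcessor(const std::string& name) {
    if (name == "append-zero") return std::make_unique<AppendByte>(0);
    if (name == "reverse") return std::make_unique<Reverse>();
    return nullptr;
}

// Composite: a chain is itself a Processor that runs its stages in order.
class Chain : public Processor {
public:
    void add(std::unique_ptr<Processor> p) {
        assert(p && "cannot add a null processor");
        stages_.push_back(std::move(p));
    }
    void process(Data& data) override {
        for (auto& stage : stages_) stage->process(data);
    }
private:
    std::vector<std::unique_ptr<Processor>> stages_;
};
```

Because `Chain` is itself a `Processor`, chains can be nested inside other chains, which is what makes the composite pattern a natural fit here.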
檐上三寸雪 2024-09-10 10:52:18

I am wondering whether C++ is the right level to think about this at. In my experience, it has always proven useful to have separate programs piped together, in the UNIX philosophy.

If your data is not overly large, there are many advantages to splitting. First, you gain the ability to test every phase of your processing independently: you run one program and redirect its output to a file, so you can easily check the result. Then, you take advantage of multi-core systems even if each of your programs is single-threaded, and thus much easier to create and debug. You also get operating system synchronization for free through the pipes between your programs. Maybe some of your programs could even be replaced by existing utility programs?

Your final program will be the glue that gathers all of your utilities into a single program, piping data from one program to another (no more files at this point) and replicating it as required for all your computations.
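
If you still want to write the individual stages in C++, one way to keep them pipe-friendly is to write each stage against generic streams. The following is only a sketch under that assumption: `runStage`, `transformByte` and `runStageOnString` are hypothetical names, and the bit-flipping transformation is a stand-in for real processing.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy transformation standing in for one processing phase:
// flips every bit of a byte.
inline char transformByte(char c) { return static_cast<char>(~c); }

// One pipeline stage: read a binary stream in chunks, transform it, write it out.
void runStage(std::istream& in, std::ostream& out, std::size_t chunkSize = 4096) {
    assert(chunkSize > 0);
    std::vector<char> buffer(chunkSize);
    for (;;) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // The final chunk may be short; gcount() says how many bytes arrived.
        const std::streamsize n = in.gcount();
        if (n <= 0) break;
        for (std::streamsize i = 0; i < n; ++i) {
            const std::size_t k = static_cast<std::size_t>(i);
            buffer[k] = transformByte(buffer[k]);
        }
        out.write(buffer.data(), n);
    }
}

// Convenience for tests: run the stage on an in-memory buffer.
std::string runStageOnString(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    runStage(in, out);
    return out.str();
}
```

A stage's `main` would then just call `runStage(std::cin, std::cout)`, and the shell connects the stages with pipes. The in-memory helper gives you the independent testing of each phase mentioned above without touching the file system. Note that on some platforms the standard streams are opened in text mode, so binary data may need extra care there.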

望喜 2024-09-10 10:52:18

This really seems quite trivial, so I suppose some requirements are missing.

Use memoization to avoid computing a result more than once. This should be done in the framework.

You could use a flowchart to determine how to pass information from one module to another... but the simplest way is to have each module directly call the modules it depends on. With memoization this costs little: if a result has already been computed, it is simply reused.
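
As a minimal sketch of that idea (the names `MemoizedRunner`, `registerModule`, `get` and the `Offset` result type are assumptions made for illustration, not the interface developed later in this answer): each module is a callable that may ask the runner for the results of its dependencies, and the runner caches every result by module id.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical result type: here just an offset into the processed file.
using Offset = long;

class MemoizedRunner {
public:
    // A module is a callable that can request its dependencies from the runner.
    using Compute = std::function<Offset(MemoizedRunner&)>;

    void registerModule(const std::string& id, Compute compute) {
        computes_[id] = std::move(compute);
    }

    // Returns the cached result if present; otherwise runs the module
    // once and stores what it produced.
    // Note: this sketch does not detect cyclic dependencies.
    Offset get(const std::string& id) {
        auto cached = cache_.find(id);
        if (cached != cache_.end()) return cached->second;

        auto module = computes_.find(id);
        assert(module != computes_.end() && "unknown module id");
        const Offset value = module->second(*this);  // may call get() on dependencies
        cache_[id] = value;
        return value;
    }

private:
    std::map<std::string, Compute> computes_;
    std::map<std::string, Offset> cache_;
};
```

If modules B and C both call `get("A")`, module A runs only for the first request; the second one is served from the cache.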

Since you need to be able to launch just about any module, you need to give the modules IDs and register them somewhere, with a way to look them up at runtime. There are two ways to do this.

  • Exemplar: You get the unique exemplar of this kind of module and execute it.
  • Factory: You create a module of the kind requested, execute it and throw it away.

The downside of the Exemplar method is that if you execute the module twice, you will not be starting from a clean state but from the state that the last (possibly failed) execution left it in. For memoization this might be seen as an advantage, but if the execution failed then no result was computed, so I would recommend against it.
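
The difference can be seen in a small sketch. All names here (`CountingModule`, `ExemplarRegistry`, `FactoryRegistry`) are hypothetical and exist only to show how state carries over in one approach and not in the other.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical module with internal state that a run mutates.
struct CountingModule {
    int scratch = 0;                     // state left behind by a run
    int execute() { return ++scratch; }  // result depends on that state
};

// Exemplar registry: one shared instance per id, reused on every run.
class ExemplarRegistry {
public:
    void add(const std::string& id) {
        modules_[id] = std::make_shared<CountingModule>();
    }
    int run(const std::string& id) { return modules_.at(id)->execute(); }
private:
    std::map<std::string, std::shared_ptr<CountingModule>> modules_;
};

// Factory registry: stores a creator per id and builds a fresh instance per run.
class FactoryRegistry {
public:
    using Creator = std::function<std::unique_ptr<CountingModule>()>;
    void add(const std::string& id, Creator creator) {
        creators_[id] = std::move(creator);
    }
    int run(const std::string& id) {
        std::unique_ptr<CountingModule> fresh = creators_.at(id)();
        assert(fresh && "creator returned no module");
        return fresh->execute();  // the instance is discarded afterwards
    }
private:
    std::map<std::string, Creator> creators_;
};
```

Running the same id twice through `ExemplarRegistry` yields 1 and then 2, because the second run sees the leftover `scratch` value. `FactoryRegistry` yields 1 both times.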

So how do you ... ?

Let's begin with the factory.

#include <map>
#include <memory>
#include <string>

class Module;
class Result;

class Organizer
{
public:
  void AddModule(std::string id, const Module& module);
  void RemoveModule(const std::string& id);

  // Returns the memoized result for `id`, computing it on first request.
  // Returns a null pointer if the module is unknown or its execution failed.
  const Result* GetResult(const std::string& id) const;

private:
  typedef std::map< std::string, std::shared_ptr<const Module> > ModulesType;
  typedef std::map< std::string, std::shared_ptr<const Result> > ResultsType;

  ModulesType mModules;
  mutable ResultsType mResults; // Memoization
};

It's a very basic interface really. However, since we want a new instance of the module each time we invoke the Organizer (to avoid reentrancy problems), we will need to work on our Module interface.

#include <memory>

class Module
{
public:
  // Unique ownership of the result; a null pointer signals failure
  typedef std::unique_ptr<const Result> ResultPointer;

  virtual ~Module() {}               // it's a base class
  virtual Module* Clone() const = 0; // traditional cloning concept

  virtual ResultPointer Execute(const Organizer& organizer) = 0;
}; // class Module

And now, it's easy:

// Organizer implementation
const Result* Organizer::GetResult(const std::string& id) const
{
  ResultsType::const_iterator res = mResults.find(id);

  // Memoized?
  if (res != mResults.end()) return res->second.get();

  // Need to compute it
  // Look the module up; an unknown id yields a null result
  ModulesType::const_iterator mod = mModules.find(id);
  if (mod == mModules.end()) return nullptr;

  // Create a throwaway clone
  std::unique_ptr<Module> module(mod->second->Clone());

  // Compute
  std::shared_ptr<const Result> result(module->Execute(*this).release());
  if (!result.get()) return nullptr;

  // Store the result for later requests (memoization)
  mResults[id] = result;

  return result.get();
}

And a simple Module/Result example:

struct FooResult: Result { FooResult(int r): mResult(r) {} int mResult; };

struct FooModule: Module
{
  virtual FooModule* Clone() const { return new FooModule(*this); }

  virtual ResultPointer Execute(const Organizer& organizer)
  {
    // Depends on "CheckModule", which must be registered with the Organizer:
    // it checks that the file has the correct format
    if(!organizer.GetResult("CheckModule")) return ResultPointer();

    return ResultPointer(new FooResult(42));
  }
};

And from main:

#include <iostream>

#include "project/organizer.h"
#include "project/foo.h"
#include "project/bar.h"

int main(int argc, char* argv[])
{
  Organizer org;

  org.AddModule("FooModule", FooModule());
  org.AddModule("BarModule", BarModule());

  // Each command-line argument is the id of a module to run
  for (int i = 1; i < argc; ++i)
  {
    const Result* result = org.GetResult(argv[i]);
    if (result) result->print(); // assumes Result declares a const print() member
    else std::cout << "Error while playing: " << argv[i] << "\n";
  }
  return 0;
}