Should we avoid repetitive code in C++ in order to be "Pythonic", and how?

Posted on 2024-11-01 22:37:58

I am in the larval stage with Python and the pre-egg stage in C++, but I am trying to do my best, especially with the "Don't Repeat Yourself" principle.

I have a multichannel raw file format to open. It starts with a main ASCII header whose fields can be represented as strings and integers (always encoded as characters padded with spaces). The second part consists of N headers, N being a field of the main header, and each of those headers has many more text and number fields (also encoded as ASCII) describing the length and size of the actual 16-bit multichannel streams that make up the rest of the file.
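One way to make the fixed layout explicit is to collect the field names and widths that the code below reads one at a time into a single table, with a compile-time check that they add up to 256 bytes. This is a minimal sketch, not part of the original code: `FieldSpec`, `kMainHeader` and `raw_field` are hypothetical names, the widths are the ones the code uses (assumed to match the format specification), and the `constexpr` loop needs C++14.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical descriptor: one entry per fixed-width field of the main header.
struct FieldSpec {
    const char* name;
    std::size_t width;
};

// Widths taken from the individual edf.read() calls in the question's code.
constexpr std::array<FieldSpec, 10> kMainHeader{{
    {"version", 8},    {"patinfo", 80},    {"recinfo", 80},
    {"start_date", 8}, {"start_time", 8},  {"header_bytes", 8},
    {"reserved", 44},  {"nrecs", 8},       {"rec_duration", 8},
    {"nchannels", 4},
}};

// Sum of all widths, computed at compile time.
constexpr std::size_t total_width() {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < kMainHeader.size(); ++i) sum += kMainHeader[i].width;
    return sum;
}

// If a width is mistyped, this fails to compile instead of misreading the file.
static_assert(total_width() == 256, "main header widths must add up to 256 bytes");

// Return the raw (still space-padded) text of field `index` from a header buffer.
std::string raw_field(const std::string& header, std::size_t index) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i) offset += kMainHeader[i].width;
    return header.substr(offset, kMainHeader[index].width);
}
```

With the table in place, the whole 256-byte header can be read with a single `read` call and then sliced, so the widths live in exactly one place.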

So far, I have this working code in C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <map>

using namespace std;

struct Header {
    string version;
    string patinfo;
    string recinfo;
    string start_date;
    string start_time;
    int header_bytes;
    string reserved;
    int nrecs;
    double rec_duration;
    int nchannels;
};

struct Channel {
    string label;
    string transducertype;
    string phys_dim;
    int pmin;
    int pmax;
    int dmin;
    int dmax;
    string prefiltering;
    int n_samples;
    string reserved;
};


int main()
{
    ifstream edf("/home/helton/Dropbox/01MIOTEC/06APNÉIA/Samples/Osas2002plusQRS.rec", ios::binary);

    // prepare to read file header
    Header header;
    char buffer[80];

    // reads header fields into the struct 'header'
    edf.read(buffer, 8);
    header.version = string(buffer, 8);

    edf.read(buffer, 80);
    header.patinfo = string(buffer, 80);

    edf.read(buffer, 80);
    header.recinfo = string(buffer, 80);

    edf.read(buffer, 8);
    header.start_date = string(buffer, 8);

    edf.read(buffer, 8);
    header.start_time = string(buffer, 8);

    edf.read(buffer, 8);
    // buffer is not NUL-terminated, so wrap exactly the bytes just read
    stringstream(string(buffer, 8)) >> header.header_bytes;

    edf.read(buffer, 44);
    header.reserved = string(buffer, 44);

    edf.read(buffer, 8);
    stringstream(string(buffer, 8)) >> header.nrecs;

    edf.read(buffer, 8);
    stringstream(string(buffer, 8)) >> header.rec_duration;

    edf.read(buffer, 4);
    stringstream(string(buffer, 4)) >> header.nchannels;

    /*
    cout << "'" << header.version << "'" << endl;
    cout << "'" << header.patinfo << "'" << endl;
    cout << "'" << header.recinfo << "'" << endl;
    cout << "'" << header.start_date << "'" << endl;
    cout << "'" << header.start_time << "'" << endl;
    cout << "'" << header.header_bytes << "'" << endl;
    cout << "'" << header.reserved << "'" << endl;
    cout << "'" << header.nrecs << "'" << endl;
    cout << "'" << header.rec_duration << "'" << endl;
    cout << "'" << header.nchannels << "'" << endl;
    */

    // prepare to read channel headers
    int ns = header.nchannels; // ns tells how many channels I have
    char title[16]; // 16 is the specified length of the "label" field of each channel

    for (int n = 0; n < ns; n++)
    {
        edf.read(title, 16); // fixed-width read, so labels containing spaces stay intact
        cout << string(title, 16) << endl; // echoes the (space-padded) label of each channel
    }


    return 0;
};

Some remarks I already have to make:

I opted to use a struct because the format specification is rigidly fixed;
I didn't iterate over the main header fields because the number of bytes and the types to read seemed rather arbitrary to me;
Now that I can get each channel's label, I would create a struct for each channel's fields, and those structs would perhaps have to be stored in a map.
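For the per-channel headers, a `std::vector<Channel>` indexed by channel number is a simpler container than a map. The sketch below assumes the file follows the EDF layout: right after the 256-byte main header, each per-channel field is stored for all `ns` channels before the next field begins, with widths 16, 80, 8, 8, 8, 8, 8, 80, 8 and 32 bytes. `read_text`, `read_number` and `read_channels` are hypothetical helpers, and physical min/max are parsed as `double` here because they may contain decimals.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Per-channel fields, mirroring the question's Channel struct.
struct Channel {
    std::string label, transducertype, phys_dim, prefiltering, reserved;
    double pmin = 0, pmax = 0;
    int dmin = 0, dmax = 0, n_samples = 0;
};

// Read exactly `width` bytes as a string (padding kept).
std::string read_text(std::istream& in, std::size_t width) {
    std::string s(width, '\0');
    if (!in.read(&s[0], static_cast<std::streamsize>(width)))
        throw std::runtime_error("short read");
    return s;
}

// Parse the next `width` bytes as a space-padded ASCII number.
template <typename T>
T read_number(std::istream& in, std::size_t width) {
    std::istringstream ss(read_text(in, width));
    T value{};
    if (!(ss >> value)) throw std::runtime_error("bad number");
    return value;
}

// Assumed EDF-style layout: one field for every channel, then the next field.
std::vector<Channel> read_channels(std::istream& in, int ns) {
    if (ns < 0) throw std::runtime_error("negative channel count");
    std::vector<Channel> ch(static_cast<std::size_t>(ns));
    for (auto& c : ch) c.label = read_text(in, 16);
    for (auto& c : ch) c.transducertype = read_text(in, 80);
    for (auto& c : ch) c.phys_dim = read_text(in, 8);
    for (auto& c : ch) c.pmin = read_number<double>(in, 8);
    for (auto& c : ch) c.pmax = read_number<double>(in, 8);
    for (auto& c : ch) c.dmin = read_number<int>(in, 8);
    for (auto& c : ch) c.dmax = read_number<int>(in, 8);
    for (auto& c : ch) c.prefiltering = read_text(in, 80);
    for (auto& c : ch) c.n_samples = read_number<int>(in, 8);
    for (auto& c : ch) c.reserved = read_text(in, 32);
    return ch;
}
```

`read_channels(edf, header.nchannels)` would be called right after the main header has been consumed; `ch[i].label` then gives the label of channel `i`.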

My (hopefully straightforward) question is:

"Should I worry about cutting corners to make this kind of code more 'Pythonic' (more abstract, less repetitive), or is this not the way things work in C++?"

Many Python evangelists (as I would be myself, because I love it) highlight how easy it is to use and so on. So I will keep wondering for a while whether I am doing dumb things, or doing things right but less "automagically" because of the very nature of C++.

Thanks for reading

Helton

秋风の叶未落 2024-11-08 22:37:58

I'd say there's no such thing as Pythonic C++ code. The DRY principle applies in both languages, but much of what is considered "Pythonic" is simply the shortest, sweetest way of expressing logic in Python, using Python-specific constructs. Idiomatic C++ is quite different.

lambda, for example, is sometimes not considered very Pythonic and is reserved for cases where no other solution exists, but it was only added to C++ with the C++11 standard. C++ has no keyword arguments, which are very Pythonic. C++ programmers don't like constructing a map when it isn't necessary, while a Python programmer might throw a dict at many problems where it just happens to make the intention clearer than the efficient alternative would.
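Since C++11, a lambda can also serve as a small local helper, which is one C++ way to remove the repetition in the question's `main` without writing a separate function. A minimal sketch, using a hypothetical subset of the question's `Header` struct; read errors are deliberately not checked here to keep it short.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical subset of the question's Header struct.
struct Header {
    std::string version, patinfo, recinfo, start_date, start_time;
    int header_bytes = 0;
};

Header read_leading_fields(std::istream& edf) {
    // Local helper: read the next n bytes and return them as a string.
    auto text = [&edf](std::size_t n) {
        std::string s(n, ' ');
        edf.read(&s[0], static_cast<std::streamsize>(n));
        return s;
    };
    // Local helper built on the first one: parse the next n bytes as an int.
    auto integer = [&text](std::size_t n) {
        int value = 0;
        std::istringstream(text(n)) >> value;
        return value;
    };

    Header header;
    header.version      = text(8);
    header.patinfo      = text(80);
    header.recinfo      = text(80);
    header.start_date   = text(8);
    header.start_time   = text(8);
    header.header_bytes = integer(8);
    return header;
}
```

Each field now takes one line, and the byte widths still read like the format specification.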

If you want to save typing, use the function I posted earlier, then:

header.version = read_field(edf, 8);
header.patinfo = read_field(edf, 80);

That should save you quite a few lines. But more important than those few lines is that you've achieved a small amount of modularity: how to read a field and what fields to read are now separate parts of your program.
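The earlier function is not reproduced on this page, so the following is only a hypothetical reconstruction of what a `read_field(edf, n)` helper could look like: it reads exactly `n` bytes and returns them as a string, plus a hypothetical `read_int_field` that reuses it for numeric fields.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical reconstruction of read_field: reads exactly `size` bytes and
// returns them as a string (padding kept), throwing if the stream runs dry.
std::string read_field(std::istream& in, std::size_t size) {
    std::string field(size, '\0');
    if (!in.read(&field[0], static_cast<std::streamsize>(size)))
        throw std::runtime_error("failed to read field");
    return field;
}

// Numeric fields can reuse it by parsing the returned text.
int read_int_field(std::istream& in, std::size_t size) {
    std::istringstream ss(read_field(in, size));
    int value = 0;
    if (!(ss >> value))
        throw std::runtime_error("field is not a number");
    return value;
}
```

With these, `header.header_bytes = read_int_field(edf, 8);` follows the same one-line pattern as the string fields.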

剪不断理还乱 2024-11-08 22:37:58

You are correct: as written, the code is repetitive (and has no error checking). Each field that you read really requires you to take three or five steps, depending on the type of data being read:

Read the field from the stream
Ensure the read succeeded
Parse the data (if necessary)
Ensure the parse succeeded (if necessary)
Copy the data into the target location

You can wrap all of these steps up into a function so that the code is less repetitive. For example, consider the following function templates:

#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

template <typename TStream, typename TResult>
void ReadFixedWidthFieldFromStream(TStream& str, TResult& result, unsigned sz)
{
    std::vector<char> data(sz);

    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");

    // data is not NUL-terminated, so build the stream from exactly sz bytes
    std::stringstream ss(std::string(data.begin(), data.end()));
    if (!(ss >> result))
        throw std::runtime_error("Failed to parse data from stream");
}

// Overload for std::string (more specialized, so it is chosen for string fields):
template <typename TStream>
void ReadFixedWidthFieldFromStream(TStream& str, std::string& result, unsigned sz)
{
    std::vector<char> data(sz);

    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");

    result = std::string(&data[0], sz);
}

Now your code can be much more succinct:

ReadFixedWidthFieldFromStream(edf, header.version, 8);
ReadFixedWidthFieldFromStream(edf, header.patinfo, 80);
ReadFixedWidthFieldFromStream(edf, header.recinfo, 80);
// etc.
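One subtlety in the "ensure the parse succeeded" step: `ss >> result` succeeds as long as some prefix of the field is a number, so a field containing `12ab` parses as 12. A minimal sketch of a stricter check that also requires the rest of the field to be padding (`parse_field` is a hypothetical name, not part of the templates above):

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse a space-padded fixed-width field as T, rejecting trailing junk.
template <typename T>
T parse_field(const std::string& raw) {
    std::istringstream ss(raw);
    T value{};
    if (!(ss >> value))
        throw std::runtime_error("failed to parse field: '" + raw + "'");
    ss >> std::ws;  // skip the space padding after the number
    if (!ss.eof())  // anything left over is not padding
        throw std::runtime_error("trailing characters in field: '" + raw + "'");
    return value;
}
```

The same two extra lines could be dropped into the generic template after the `ss >> result` check.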

剪不断理还乱 2024-11-08 22:37:58

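This code is simple, straightforward and easy to understand. If it works, don't waste time changing it. I'm sure there is plenty of badly written, complex, hard-to-understand (and possibly incorrect) code that should be fixed first :)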

べ映画 2024-11-08 22:37:58

The Zen of Python doesn't mention DRY explicitly.

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

长不大的小祸害 2024-11-08 22:37:58

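For reading from a file directly into strings, see this question. The rest of this answer is wrong: the header's numeric fields are stored as space-padded ASCII text, not as binary int or double values, so they cannot be read straight into a packed struct. ~~But personally I think there's a better/cleaner way of doing this.~~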

If you know the size of the structure, don't use string; use primitive C types (and make sure the structure is packed). See these links: http://msdn.microsoft.com/en-us/library/2e70t5y1(v=vs.80).aspx & http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Type-Attributes.html
I would do it this way, for example (not sure about the size of each string, but you get the idea):
struct Header {
    char version[8];
    char patinfo[80];
    char recinfo[80];
    char start_date[8];
    char start_time[8];
    int header_bytes;
    char reserved[44];
    int nrecs;
    double rec_duration;
    int nchannels;
};
Once you have a packed structure, you can read it directly from the file:
struct Header h;
edf.read(reinterpret_cast<char*>(&h), sizeof(struct Header));
For me this is the cleanest way to do it, but remember that the structure must be packed, so that you are guaranteed the structure in memory has the same size as the structure saved in the file; this is not very hard to check while testing.
