Setting up the Apache Arrow and Parquet libraries in C++

Posted on 2025-01-10 06:30:16

I'm trying to do something simple: save data in Apache Parquet format from C++. However, I cannot figure out how to properly link the Apache Arrow library to my project so that the necessary #include headers resolve. I haven't used C++ in decades and I wasn't very good even back then, so this is out of my league.

I've used NuGet (for the pthreads library) and have manually linked libraries through the Project Properties of Visual Studio 2019 on Windows 10 (for the Npcap library), but following the Apache Arrow instructions (https://arrow.apache.org/docs/developers/cpp/index.html) is currently beyond me.

So far I've installed Git and CMake, and I can get CMake to put the arrow VS project files into a ./build folder I've created, but I cannot run any example, nor can I get the header files to resolve. I've tried adding a VS project to the 'arrow' solution in the build folder, but I never got that code to compile. In previous attempts I've used CMake through VS and also tried building the libraries with Ninja. For the most part, I am straight-up guessing on pretty much every step of this process.

Some questions that come to mind are: do I even want to specify the "Visual Studio 16 2019" generator to CMake or should I do something different? How do I use the files that were built? Do I need to modify the CMakeLists.txt files included in the arrow download? Do I need to write a script for the build or is it preferable to run from the command line?

Regarding the Optional Components of the build: I'm pretty sure I should have -DARROW_PARQUET=ON, but what about -DARROW_PLASMA=ON? It relates to the Shared Memory Object Store, and in order to save a Parquet file I'll need to load my data into an in-memory arrow Table, so is it applicable? What about the other 50 or so options?

I apologize in advance for my naiveté on this subject and appreciate any help or advice. Thank you.
