Reading Parquet files from an SFTP server into R

Posted on 2025-02-11 21:09:22

I am trying to read a Parquet file from an SFTP server. Or rather, I'm trying to read a lot of them at different times from different folders, so I want to automate it^^.

After a lot of googling (and reading around here), I was able to connect to the SFTP server via RCurl and even list the folders I need, so my script can build the proper path completely automatically. The only part missing is loading the file itself.
However, as soon as I try to use getURL() I get the following error:

Error in curlPerform(curl=curl, .opts=opts, .encoding= .encoding) : 
embedded nul in string: 'PAR1\025
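
The message can be reproduced without any network access. The following is a minimal sketch, assuming the file starts with the Parquet magic bytes "PAR1" followed by a 0x15 byte (octal \025) and a nul byte; the exact bytes after "PAR1" are an illustrative guess. R character strings cannot hold nul bytes, so any function that tries to return binary content as text fails this way:

```r
# Bytes as they might appear at the start of a Parquet file:
# "PAR1", then 0x15 (octal \025), then a nul byte (illustrative values).
bytes <- as.raw(c(0x50, 0x41, 0x52, 0x31, 0x15, 0x00))

# Converting raw bytes that contain a nul into a character string fails.
msg <- tryCatch(
  rawToChar(bytes),
  error = function(e) conditionMessage(e)
)
print(msg)

# Keeping the data as a raw vector is fine, nul bytes included.
length(bytes)
```

This suggests the file itself is not broken; the content simply is not text.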

To be sure the problem is not with the file, I manually downloaded it and opened it with arrow's read_parquet() function, which worked fine.

The RCurl function getURLContent() produces the same error.

getBinaryURL(), however, works and I get "something". The problem is that I now get a list of dimension [1:5068598] instead of a table with 31 columns and 46899 rows.

Furthermore, the list does not only contain 0's and 1's but also two-digit numbers (like 50) or combinations of digits and letters (like 9c).
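
One possible way forward, sketched here under assumptions: the object returned by getBinaryURL() is a raw vector holding the file's bytes, and arrow's read_parquet() accepts a raw vector in place of a file path. The example below uses a small locally written Parquet file as a stand-in for the download, so the data frame `df`, the temporary file and the commented-out URL and credentials are all hypothetical:

```r
library(arrow)

# Stand-in for the download: write a small Parquet file locally and read
# its bytes back as a raw vector. With RCurl this would instead be e.g.
#   raw_bytes <- RCurl::getBinaryURL(url, userpwd = "user:password")
# where `url` and the credentials are placeholders.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))
tmp <- tempfile(fileext = ".parquet")
write_parquet(df, tmp)
raw_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

# read_parquet() accepts a raw vector as well as a file path.
result <- read_parquet(raw_bytes)
str(result)
```

If that holds for the real files, no intermediate download to disk and no manual change to the files would be needed.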

I mean, I'm aware that this is because I read the file information into R in binary but I'm totally unsure about how to fix this, or why it isn't just 0's and 1's.
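
A short illustration of why the values are not just 0's and 1's: R prints each element of a raw vector as one byte in hexadecimal (two characters from 0-9 and a-f), so a value such as 50 is the byte 0x50, which is the letter "P". The sketch below uses only the four magic bytes that start every Parquet file:

```r
# The first four bytes of every Parquet file are the magic string "PAR1".
magic <- as.raw(c(0x50, 0x41, 0x52, 0x31))

print(magic)                   # each byte is shown as a two-character hex value
as_text <- rawToChar(magic)    # the same bytes interpreted as text
as_ints <- as.integer(magic)   # the same bytes as decimal numbers
bits <- rawToBits(magic[1])    # the eight individual bits of the first byte

print(as_text)
print(as_ints)
print(bits)
```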

From what I found when googling the problem, the nul string error can only be fixed by changing the file itself. Since I have to work with a couple of thousand of these files, I don't intend to (and cannot) change each file manually to fix the problem.

Just downloading each file from the SFTP server to my PC (via FileZilla) and working on it there also seems to sidestep the problem rather than fix it.

I also don't know how I should convert the binary data into something readable. The main problem here is that of the 31 columns of the file, some are chars, some int, some num, etc. So how can I transform the binary data back into what I would get if I loaded the file with arrow's read_parquet() function?

I have been looking for a solution all day, and by now I feel like I'm running in circles more than anything else, so help would be appreciated.
