用简单的CSV导入到Clickhouse,解析错误

发布于 2025-01-22 16:37:09 字数 1760 浏览 2 评论 0原文

我有一个非常简单的CSV,我试图进口到Clickhouse,但没有成功。创建表语句是:

CREATE TABLE staging.EloLBK
(
    `Month` DateTime64(3),
    `1958` Int32,
    `1959` Int32,
    `1960` Int32
)
ENGINE = MergeTree
PRIMARY KEY Month
ORDER BY Month
SETTINGS index_granularity = 8192

CSV数据如下:

"Month", "1958", "1959", "1960"
"JAN",  340,  360,  417
"FEB",  318,  342,  391
"MAR",  362,  406,  419
"APR",  348,  396,  461
"MAY",  363,  420,  472
"JUN",  435,  472,  535
"JUL",  491,  548,  622
"AUG",  505,  559,  606
"SEP",  404,  463,  508
"OCT",  359,  407,  461
"NOV",  310,  362,  390
"DEC",  337,  405,  432

我的导入语句是:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer')

从Clickhouse返回的错误是:

Code: 27. DB::Exception: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,':
Row 1:
Column 0,   name: Month, type: DateTime64(3), parsed text: "<DOUBLE QUOTE>JAN<DOUBLE QUOTE>,  340,  360,  "ERROR
Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 21.12.1.8928 (official build))

: While executing CSVRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

我不确定如何解决此问题,因此将不胜感激!

I have an extremely simple CSV I'm trying to import into clickhouse with no success. The create table statement is:

CREATE TABLE staging.EloLBK
(
    `Month` DateTime64(3),
    `1958` Int32,
    `1959` Int32,
    `1960` Int32
)
ENGINE = MergeTree
PRIMARY KEY Month
ORDER BY Month
SETTINGS index_granularity = 8192

The CSV data looks like this:

"Month", "1958", "1959", "1960"
"JAN",  340,  360,  417
"FEB",  318,  342,  391
"MAR",  362,  406,  419
"APR",  348,  396,  461
"MAY",  363,  420,  472
"JUN",  435,  472,  535
"JUL",  491,  548,  622
"AUG",  505,  559,  606
"SEP",  404,  463,  508
"OCT",  359,  407,  461
"NOV",  310,  362,  390
"DEC",  337,  405,  432

My import statement is:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer')

The error coming back from clickhouse is:

Code: 27. DB::Exception: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,':
Row 1:
Column 0,   name: Month, type: DateTime64(3), parsed text: "<DOUBLE QUOTE>JAN<DOUBLE QUOTE>,  340,  360,  "ERROR
Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 21.12.1.8928 (official build))

: While executing CSVRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

I'm not sure how to resolve this so any advice would be appreciated!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

阳光下的泡沫是彩色的 2025-01-29 16:37:09

您可以告诉Clickhouse尽力通过选项date_time_input_format ='Best_ffort',例如:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer') settings date_time_input_format='best_effort';

将导致:

┌───────────────────Month─┬─1958─┬─1959─┬─1960─┐
│ 2000-01-01 00:00:00.000 │  340 │  360 │  417 │
│ 2000-02-01 00:00:00.000 │  318 │  342 │  391 │
│ 2000-03-01 00:00:00.000 │  362 │  406 │  419 │
│ 2000-04-01 00:00:00.000 │  348 │  396 │  461 │
│ 2000-05-01 00:00:00.000 │  363 │  420 │  472 │
│ 2000-06-01 00:00:00.000 │  435 │  472 │  535 │
│ 2000-07-01 00:00:00.000 │  491 │  548 │  622 │
│ 2000-08-01 00:00:00.000 │  505 │  559 │  606 │
│ 2000-09-01 00:00:00.000 │  404 │  463 │  508 │
│ 2000-10-01 00:00:00.000 │  359 │  407 │  461 │
│ 2000-11-01 00:00:00.000 │  310 │  362 │  390 │
│ 2000-12-01 00:00:00.000 │  337 │  405 │  432 │
└─────────────────────────┴──────┴──────┴──────┘

You can tell ClickHouse to do a best effort guess by passing the option date_time_input_format='best_effort', for example:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer') settings date_time_input_format='best_effort';

Will lead to:

┌───────────────────Month─┬─1958─┬─1959─┬─1960─┐
│ 2000-01-01 00:00:00.000 │  340 │  360 │  417 │
│ 2000-02-01 00:00:00.000 │  318 │  342 │  391 │
│ 2000-03-01 00:00:00.000 │  362 │  406 │  419 │
│ 2000-04-01 00:00:00.000 │  348 │  396 │  461 │
│ 2000-05-01 00:00:00.000 │  363 │  420 │  472 │
│ 2000-06-01 00:00:00.000 │  435 │  472 │  535 │
│ 2000-07-01 00:00:00.000 │  491 │  548 │  622 │
│ 2000-08-01 00:00:00.000 │  505 │  559 │  606 │
│ 2000-09-01 00:00:00.000 │  404 │  463 │  508 │
│ 2000-10-01 00:00:00.000 │  359 │  407 │  461 │
│ 2000-11-01 00:00:00.000 │  310 │  362 │  390 │
│ 2000-12-01 00:00:00.000 │  337 │  405 │  432 │
└─────────────────────────┴──────┴──────┴──────┘
﹎☆浅夏丿初晴 2025-01-29 16:37:09

好的,经过更多的混乱,似乎错误消息只是有些误导。实际问题是,Clickhouse(可以理解)无法作为日期时间解析本月。

以下CSV输入效果很好:

"Month","1958","1959","1960"
"1970-01-01T00:00:00","340","360","417"
"1970-01-02T00:00:00","318","342","391"

Ok, after some more messing around, it appears the error message is just a bit misleading. The actual issue is that clickhouse is (understandably) unable to parse the month as a datetime.

The following CSV input works fine:

"Month","1958","1959","1960"
"1970-01-01T00:00:00","340","360","417"
"1970-01-02T00:00:00","318","342","391"
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文