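dataprep.clean TypeError: Please provide npartitions as an int, or possibly as None if you specify chunksize
I'm struggling to understand this TypeError coming out of the DataPrep package. My setup is very simple: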
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "phone": [
            "555-234-5678",
            "(555) 234-5678",
            "555.234.5678",
            "555/234/5678",
            15551234567,
            "(1) 555-234-5678",
            "+1 (234) 567-8901 x. 1234",
            "2345678901 extension 1234",
            "2345678",
            "800-299-JUNK",
            "1-866-4ZIPCAR",
            "123 ABC COMPANY",
            "+66 91 889 8948",
            "hello",
            np.nan,
            "NULL",
        ]
    }
)

from dataprep.clean import clean_phone
clean_phone(df, "phone")
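This is the resulting error message printed in the terminal (I've shortened file paths and replaced sensitive values with x for security purposes):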
Traceback (most recent call last):
  File "c:\Users\x\x\Documents\Repositories\test.py", line 14, in <module>
    clean_phone(df, "phone")
  File "C:\Users\x\Anaconda3\envs\myenv\lib\site-packages\dataprep\clean\clean_phone.py", line 150, in clean_phone
    df = to_dask(df)
  File "C:\Users\x\Anaconda3\envs\myenv\lib\site-packages\dataprep\clean\utils.py", line 73, in to_dask
    return dd.from_pandas(df, npartitions=npartitions)
  File "C:\Users\x\Anaconda3\envs\myenv\lib\site-packages\dask\dataframe\io\io.py", line 236, in from_pandas
    raise TypeError(
TypeError: Please provide npartitions as an int, or possibly as None if you specify chunksize.
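The traceback ends in `dd.from_pandas`, which rejected the `npartitions` value that dataprep's `to_dask` helper passed to it because that value was not an `int`. The traceback does not show how the helper computed the value. One way to end up with such a value is a NumPy scalar: `np.float64` is not an instance of Python's `int`, even when its value is a whole number. The sketch re-creates the rule the error message states (`check_npartitions` is a simplified stand-in, not Dask's source) and a hypothetical `partition_count` helper that always returns a plain `int`; its 128 MiB partition size is an arbitrary illustration.

```python
import math

import numpy as np


def check_npartitions(npartitions):
    """Simplified stand-in for the guard described by the error message.

    This is NOT Dask's source code; it only mirrors the rule the message
    states: npartitions must be an int.
    """
    if not isinstance(npartitions, int):
        raise TypeError(
            "Please provide npartitions as an int, "
            "or possibly as None if you specify chunksize."
        )
    return npartitions


def partition_count(n_bytes, partition_bytes=128 * 1024 * 1024):
    """Hypothetical helper: partitions needed for n_bytes, as a plain int >= 1."""
    # math.ceil returns a Python int; np.ceil would return np.float64 instead.
    return max(1, math.ceil(n_bytes / partition_bytes))


if __name__ == "__main__":
    # A NumPy float is rejected even though its value (8.0) is whole.
    try:
        check_npartitions(np.ceil(1000 / 128))
    except TypeError as exc:
        print("rejected:", exc)

    # A plain Python int passes the check.
    print(check_npartitions(partition_count(1000)))  # → 1
```

Wrapping a NumPy result in `int(...)` before handing it to `from_pandas` would satisfy the same check.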
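This is a direct attempt to replicate the tutorial published by the DataPrep team: /user_guide/clean/clean_phone.html
According to the tutorial, the expected output is the following:
Searching Google for this TypeError turns up only one semi-relevant result.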
Answer:
There is a small bug in the dataprep package; you can track it in this PR. In the meantime, one option to avoid the bug is to explicitly convert the data to a Dask DataFrame and pass that into the function: