Error - 'float' object has no attribute 'encode' while storing a df with NaN values to a Cassandra database
I am reading a CSV file into a pandas df using pd.read_csv. This df has some missing values in it. I am trying to store this df in Cassandra and getting the error "'float' object has no attribute 'encode'". Following is my code:
import pandas as pd
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import BatchStatement

def cassandraDBLoad(config_path):
    try:
        config = read_params(config_path)

        # Connect to the cluster via the secure connect bundle
        cassandra_config = {'secure_connect_bundle': "filepath"}
        auth_provider = PlainTextAuthProvider(
            "client_id",
            "client_secret"
        )
        cluster = Cluster(cloud=cassandra_config, auth_provider=auth_provider)
        session = cluster.connect()
        session.default_timeout = 120
        connect_db = session.execute("select release_version from system.local")
        set_keyspace = session.set_keyspace("keyspace_name")

        # Recreate the target table with every column as varchar
        table_ = "big_mart"
        define_columns = "Item_Identifier varchar PRIMARY KEY, Item_Weight varchar, Item_Fat_Content varchar, Item_Visibility varchar, Item_Type varchar, Item_MRP varchar, Outlet_Identifier varchar, Outlet_Establishment_Year varchar, Outlet_Size varchar, Outlet_Location_type varchar, Outlet_Type varchar, Item_Outlet_Sales varchar, source varchar"
        drop_table = f"DROP TABLE IF EXISTS {table_}"
        drop_result = session.execute(drop_table)
        create_table = f"CREATE TABLE {table_}({define_columns});"
        table_result = session.execute(create_table)

        train = pd.read_csv("train_source")
        test = pd.read_csv("test_source")
        # Combine test and train into one file
        train['source'] = 'train'
        test['source'] = 'test'
        df = pd.concat([train, test], ignore_index=True)

        # Insert the combined DataFrame with a prepared statement in one batch
        columns = "Item_Identifier, Item_Weight, Item_Fat_Content, Item_Visibility, Item_Type, Item_MRP, Outlet_Identifier, Outlet_Establishment_Year, Outlet_Size, Outlet_Location_Type, Outlet_Type, Item_Outlet_Sales, source"
        insert_qry = f"INSERT INTO {table_}({columns}) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)"
        prepared = session.prepare(insert_qry)
        batch = BatchStatement()
        for i in range(len(df)):
            batch.add(
                prepared,
                (df.iat[i,0], df.iat[i,1], df.iat[i,2], df.iat[i,3], df.iat[i,4], df.iat[i,5], df.iat[i,6], df.iat[i,7], df.iat[i,8], df.iat[i,9], df.iat[i,10], df.iat[i,11], df.iat[i,12])
            )
        session.execute(batch)
    except Exception as e:
        raise Exception("(cassandraDBLoad): Something went wrong in the CassandraDB Load operations\n" + str(e))
I have tried adding encoding_errors = "ignore" as a parameter to pd.read_csv. I have also tried converting the df values (especially the float ones) to strings by doing str(df.iat[i,1]). Neither of these steps resolved the error.
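To be concrete, the kind of per-value conversion I am describing would look something like this (the to_text helper is only illustrative; it assumes every column is meant to be stored as text, with missing cells passed as None so the varchar column is left null):

import pandas as pd

def to_text(value):
    # Missing cells come out of read_csv as float NaN; pass None instead so a
    # null is written, and stringify everything else for the varchar columns.
    return None if pd.isna(value) else str(value)

# One 13-element tuple per row, ready to be passed to batch.add(prepared, row)
rows = [
    tuple(to_text(v) for v in row)
    for row in df.itertuples(index=False, name=None)
]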
csv files link - https://drive.google.com/drive/folders/1O03lNTMfSwhUKG61zOs7fNxXIRe44GRp?usp=sharing