Error - 'float' object has no attribute 'encode' while storing a df with NaN values to a Cassandra database

Posted on 2025-02-13 02:23:08


I am reading a CSV file into a pandas df using pd.read_csv. This df has some missing values in it. When I try to store the df in Cassandra, I get the error "'float' object has no attribute 'encode'". Following is my code:

import pandas as pd
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import BatchStatement

def cassandraDBLoad(config_path):
    try:
        config = read_params(config_path)
        cassandra_config = {'secure_connect_bundle': "filepath"}
        auth_provider = PlainTextAuthProvider(
                        "client_id",
                        "client_secret"
                        )
        cluster = Cluster(cloud=cassandra_config, auth_provider=auth_provider)
        session = cluster.connect()
        session.default_timeout = 120
        connect_db = session.execute("select release_version from system.local")
        set_keyspace = session.set_keyspace("keyspace_name")
    
        table_ = "big_mart"
        define_columns = "Item_Identifier varchar PRIMARY KEY, Item_Weight varchar, Item_Fat_Content varchar, Item_Visibility varchar,  Item_Type varchar, Item_MRP varchar, Outlet_Identifier varchar, Outlet_Establishment_Year varchar, Outlet_Size varchar, Outlet_Location_type varchar, Outlet_Type varchar, Item_Outlet_Sales varchar, source varchar"
        drop_table = f"DROP TABLE IF EXISTS {table_}"
        drop_result = session.execute(drop_table)
        create_table = f"CREATE TABLE {table_}({define_columns});"
        table_result = session.execute(create_table)
    
        train = pd.read_csv("train_source")
        test = pd.read_csv("test_source")
    
        #Combine test and train into one file
        train['source']='train'
        test['source']='test'
        df = pd.concat([train, test],ignore_index=True)

        columns = "Item_Identifier, Item_Weight, Item_Fat_Content, Item_Visibility, Item_Type, Item_MRP, Outlet_Identifier, Outlet_Establishment_Year, Outlet_Size, Outlet_Location_Type, Outlet_Type, Item_Outlet_Sales, source"
        insert_qry = f"INSERT INTO {table_}({columns}) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)"
        prepared = session.prepare(insert_qry)

        batch = BatchStatement()
        for i in range(len(df)):
            batch.add(
                prepared,
                (df.iat[i,0], df.iat[i,1], df.iat[i,2], df.iat[i,3], df.iat[i,4], df.iat[i,5], df.iat[i,6], df.iat[i,7], df.iat[i,8], df.iat[i,9], df.iat[i,10], df.iat[i,11], df.iat[i,12])
            )

        session.execute(batch)

    except Exception as e:
        raise Exception("(cassandraDBLoad): Something went wrong in the CassandraDB Load operations\n" + str(e))
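The failure can be reproduced in isolation (an illustrative sketch, separate from the code above, using made-up column names): pandas fills missing CSV cells with float('nan'), so a row with a gap hands the driver a float where the varchar serializer expects a string and calls .encode() on it:

```python
import io

import pandas as pd

# A tiny CSV where each row has one missing cell.
csv = "Item_Weight,Outlet_Size\n9.3,\n,Medium\n"
df = pd.read_csv(io.StringIO(csv))

# The missing Outlet_Size in row 0 is read as float('nan'), not as "".
print(type(df.iat[0, 1]))  # <class 'float'>

# This mirrors what the driver attempts when serializing a varchar value:
try:
    df.iat[0, 1].encode("utf-8")
except AttributeError as exc:
    print(exc)  # 'float' object has no attribute 'encode'
```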

I have tried adding the following as a parameter to pd.read_csv:

encoding_errors = "ignore"

I have also tried converting the df values (especially the float ones) to strings by doing str(df.iat[i, 1]).
Neither of the above steps resolved the error.
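One way to attack this (a hedged sketch, not confirmed as the fix the asker ultimately used): since every target column is declared varchar, normalise the whole frame before binding by filling the NaNs and then casting to str. The order matters: fillna must run before astype(str), otherwise each NaN becomes the literal string 'nan':

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("Item_Weight,Outlet_Size\n9.3,\n,Medium\n"))

# Fill missing values first, then cast, so every bound parameter is a
# Python str that the varchar serializer can encode.
clean = df.fillna("").astype(str)

print(repr(clean.iat[0, 1]))  # '' - the NaN became an empty string
print(all(isinstance(v, str) for v in clean.to_numpy().ravel()))  # True
```

With a frame like this, the batch loop can bind clean.iat[i, j] instead of df.iat[i, j].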

csv files link - https://drive.google.com/drive/folders/1O03lNTMfSwhUKG61zOs7fNxXIRe44GRp?usp=sharing
