How do I save a database so that it can be read back into a dataframe?
The program takes some .csv databases, performs computations on them, and then needs to save the resulting database so that it can be read with Dask.Dataframe. When the saved file is read back in Python, the column types that the dataframe had inside the loop should be preserved. I assume this requires CSV files plus a separate configuration file that specifies the column types.
Another question: how can I read a large file into one dataframe?
The main function looks like this:
def LoadDataFromDB(con, table):  # In this block, you need to record outgoing data
    date_str = datetime.now().strftime("%d_%b_%Y_%H_%M_%S")
    chunkS = 100
    filename = "./genfiles/" + date_str + ".gz"
    ExRates = ImportCSV("exchrates/Currency rates.csv")
    log = open("logs/log_" + date_str + ".txt", "w+")
    pbar = tqdm(total=CountTableRows(con) / chunkS)
    dfSQL = pds.read_sql_query((SQL_columns + table + SQL_Where), con, chunksize=chunkS)
    for i, chunk in enumerate(dfSQL):  # In this loop, after Calculate(), the data should be saved to a file
        print("Reading a Block of Data...")
        res = Calculate(chunk, ExRates, log)
        df = dd.from_pandas(res, npartitions=3)
        print(chunk.dtypes)
        pbar.update()
    pbar.close()
    log.close()
    return filename
1 Answer
Assuming you are using a NoSQL database: I have previously saved a dataframe to a database by using the following function to convert it into a JSON format:
and then, when reading from the database, I use this function to convert it back into a dataframe.
You may need to fiddle around with dask if your data is so large that it cannot even fit into a single dataframe, but this should help you get started.
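The answer's two helper functions did not survive in this copy of the page. A minimal sketch of what such a JSON round trip might look like with pandas (the function names and the `orient="table"` choice are assumptions; `orient="table"` embeds a schema in the JSON, which is what lets the dtypes come back intact):

```python
from io import StringIO

import pandas as pd


def df_to_json(df):
    # Serialize the dataframe to a JSON string suitable for a document store;
    # orient="table" includes a schema so column dtypes can be restored.
    return df.to_json(orient="table")


def json_to_df(payload):
    # Rebuild the dataframe from the JSON document, restoring dtypes
    # from the embedded schema.
    return pd.read_json(StringIO(payload), orient="table")
```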