How do I save a database so that it can be read back into a dataframe?
The program takes some .csv databases, performs computations on them, and then needs to save the resulting database so that it can be read with Dask.Dataframe. When the saved file is read back in Python, the column types that the dataframe had inside the loop should be preserved. I assume this requires CSV files plus a separate configuration file that specifies the column types.
Another question: how can I read a large file into one dataframe?
The main function looks like this:
def LoadDataFromDB(con, table):  # In this block, you need to record outgoing data
    date_str = datetime.now().strftime("%d_%b_%Y_%H_%M_%S")
    chunkS = 100
    filename = "./genfiles/" + date_str + ".gz"
    ExRates = ImportCSV("exchrates/Currency rates.csv")
    log = open("logs/log_" + date_str + ".txt", "w+")
    pbar = tqdm(total=CountTableRows(con) / chunkS)
    dfSQL = pds.read_sql_query((SQL_columns + table + SQL_Where), con, chunksize=chunkS)
    for i, chunk in enumerate(dfSQL):  # In this loop, after Calculate(), the data should be saved to a file
        print("Reading a Block of Data...")
        res = Calculate(chunk, ExRates, log)
        df = dd.from_pandas(res, npartitions=3)
        print(chunk.dtypes)
        pbar.update()
    pbar.close()
    log.close()
    return filename
1 Answer
Assuming you are using a NoSQL database: I have previously saved a dataframe to a database by using the following function to convert it into a JSON format:
and then, when reading from the database, I use this function to convert it back into a dataframe.
You may need to fiddle around with dask if your data is so large that it cannot even fit into a single dataframe, but this should help you get started.
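The answer's two helper functions did not survive in this copy of the page. A minimal sketch of what such a JSON round trip might look like with pandas (the function names and the `orient="table"` choice are assumptions; `orient="table"` embeds a schema in the JSON, which is what lets the dtypes come back intact):

```python
from io import StringIO

import pandas as pd


def df_to_json(df):
    # Serialize the dataframe to a JSON string suitable for a document store;
    # orient="table" includes a schema so column dtypes can be restored.
    return df.to_json(orient="table")


def json_to_df(payload):
    # Rebuild the dataframe from the JSON document, restoring dtypes
    # from the embedded schema.
    return pd.read_json(StringIO(payload), orient="table")
```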