pandas .to_sql() output exceeds the size limit
I'm using pyodbc and SQLAlchemy to insert data into a table in SQL Server, and I'm getting this error:
https://i.sstatic.net/miSp9.png
Here are snippets of the functions I use.
This is the function I use to connect to SQL Server (using fast_executemany):
import pyodbc
from urllib.parse import quote_plus
from sqlalchemy import create_engine

def connect(server, database):
    global cnxn_str, cnxn, cur, quoted, engine
    cnxn_str = ("Driver={SQL Server Native Client 11.0};"
                "Server=<server>;"
                "Database=<database>;"
                "UID=<user>;"
                "PWD=<password>;")
    cnxn = pyodbc.connect(cnxn_str)
    cur = cnxn.cursor()
    cur.fast_executemany = True
    # URL-encode the ODBC connection string so it can be passed to SQLAlchemy
    quoted = quote_plus(cnxn_str)
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted),
                           fast_executemany=True)
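For reference, SQLAlchemy 1.4+ can build the same connection URL without calling quote_plus by hand; this is a minimal sketch under that assumption, reusing the placeholder credentials above:

from sqlalchemy import create_engine
from sqlalchemy.engine import URL

cnxn_str = ("Driver={SQL Server Native Client 11.0};"
            "Server=<server>;Database=<database>;UID=<user>;PWD=<password>;")

# URL.create handles the percent-encoding that quote_plus did manually above
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": cnxn_str})
engine = create_engine(connection_url, fast_executemany=True)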
And this is the function I'm using to query and insert the data into SQL Server:
def insert_to_sql_server():
    global df, np_array
    # DataFrame df is from a numpy array with dtype=object;
    # the first row of the array holds the column names
    df = pd.DataFrame(np_array[1:, ], columns=np_array[0])
    # add new columns, data processing
    df['comp_key'] = df['col1'] + "-" + df['col2'].astype(str)
    df['comp_key2'] = df['col3'] + "-" + df['col4'].astype(str) + "-" + df['col5'].astype(str)
    df['comp_statusID'] = df['col6'] + "-" + df['col7'].astype(str)
    convert_dict = {'col1': 'string', 'col2': 'string', ..., 'col_n': 'string'}
    # convert data types of columns from objects to strings
    df = df.astype(convert_dict)
    connect(<server>, <database>)
    cur.rollback()
    # Delete old records
    cur.execute("DELETE FROM <table>")
    cur.commit()
    # Insert dataframe to table
    df.to_sql(<table name>, engine, index=False,
              if_exists='replace', schema='dbo', chunksize=1000, method='multi')
The insert function runs for about 30 minutes before finally returning the error message.
I encountered no errors when doing this with a smaller DataFrame. The current DataFrame is 27,963 rows by 9 columns. One thing I think contributes to the error is the length of the strings: by default the numpy array is dtype='<U25', but I had to override this to dtype=object because the fixed-width type was truncating the text data.
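To illustrate the truncation (a minimal sketch with made-up data, not the actual dataset):

import numpy as np

long_text = "a string that is definitely longer than twenty-five characters"

# A fixed-width '<U25' array silently cuts everything past 25 characters
fixed = np.array([long_text], dtype='<U25')
print(len(fixed[0]))     # 25 -- the text is truncated

# An object array keeps full-length Python strings intact
flexible = np.array([long_text], dtype=object)
print(len(flexible[0]))  # full length preserved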
I'm out of ideas because it seems like the error is referring to a limitation of either pandas or SQL Server, which I'm not familiar with.
Thanks
Comments (3)
Thanks for all the input (still new here)! Accidentally stumbled upon the solution, which was reducing the chunksize in df.to_sql from
df.to_sql(chunksize=1000)
to
df.to_sql(chunksize=200)
After some digging, it turns out there's a limitation on the SQL Server side: a single statement accepts at most 2100 parameters, and with method='multi' every cell in a chunk becomes one parameter, so chunksize times the number of columns has to stay below that (https://discuss.dizzycoding.com/to_sql-pyodbc-count-field-incorrect-or-syntax-error/).
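The arithmetic behind that limit, as a small sketch (2100 is SQL Server's cap on parameters per statement; safe_chunksize is just an illustrative helper name):

SQL_SERVER_MAX_PARAMS = 2100  # maximum parameter markers per statement

def safe_chunksize(df):
    # With method='multi', each cell in a chunk becomes one parameter,
    # so rows-per-chunk * columns must stay under the cap.
    return (SQL_SERVER_MAX_PARAMS - 1) // len(df.columns)

# For the 9-column DataFrame in the question: (2100 - 1) // 9 = 233,
# which is why chunksize=200 works while chunksize=1000 (9000 parameters) fails.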
In my case, I had the same "Output exceeds the size limit" error, and I fixed it by adding method='multi' to df.to_sql (i.e. df.to_sql(..., method='multi')).
First I tried the chunksize solution and it didn't work.
So... check whether you're in the same scenario!
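Combining the two answers, the call would look roughly like this (a sketch reusing df, engine and the placeholder table name from the question):

# method='multi' batches rows into multi-row INSERTs; a small chunksize keeps
# each batch under SQL Server's 2100-parameter limit (9 columns * 200 rows = 1800)
df.to_sql('<table name>', engine, index=False, if_exists='replace',
          schema='dbo', chunksize=200, method='multi')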
To show a progress bar while inserting data into a SQL Server table, you can use the tqdm library. Here's an example:
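A minimal sketch of that idea, assuming the df and engine from the question (insert_with_progress and the table name are just illustrative):

from tqdm import tqdm

def insert_with_progress(df, engine, table, schema='dbo', chunksize=200):
    # Insert the DataFrame in chunksize-row slices, advancing the bar per chunk
    total = len(df)
    with tqdm(total=total, desc='Inserting into {}'.format(table)) as pbar:
        for start in range(0, total, chunksize):
            chunk = df.iloc[start:start + chunksize]
            chunk.to_sql(table, engine, index=False, schema=schema,
                         if_exists='replace' if start == 0 else 'append',
                         method='multi')
            pbar.update(len(chunk))

# insert_with_progress(df, engine, '<table name>')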