MySQL加载数据Infile语句在Workbench中起作用,但python不起作用

发布于 2025-01-24 10:38:52 字数 4295 浏览 3 评论 0原文

我正在使用Fannie Mae抵押数据尝试MySQL和数据分析。为此,我创建了两个表(“ perf”和“ ACQ”)和几个Python功能。我首先放下集合和表格,如果它们存在,然后创建集合(Mortgage_analysis)和两个表。然后,我构建了与我要执行的分析年数相对应的文件列表。所有这些都很好。

然后,我使用以下功能将表加载来自Fannie Mae的PERF和ACQ文本文件的数据。相同的功能用于加载两个表。它每次都可以使用“ perf”表工作,并且从未与“ ACQ”桌子一起使用。如果我使用SQL语句并在MySQL Workbench中执行它们,则语句每次都可以使用。我很难过,可以使用一些帮助。

在工作台中工作但在python中工作的SQL语句是:

LOAD DATA  INFILE '/Users/<my user name>/github/mortgage-analysis-example/data/text/acq/Acquisition_2000Q1.txt' 
INTO TABLE acq 
FIELDS TERMINATED BY '|' 
LINES TERMINATED BY '\n' 
(loan_id, orig_channel, seller_name, orig_interest_rate, orig_upb, orig_loan_term, 
 orig_date, first_pay_date, orig_ltv, orig_cltv, num_borrowers, dti, 
 borrower_credit_score, first_home_buyer, loan_purpose, property_type, num_units, 
 occupancy_status, property_state, zip, mortgage_insurance_percent, product_type, 
 coborrow_credit_score, mortgage_insurance_type, relocation_mortgage_ind);

加载的python函数是:

def loadTable(env_in, template, file_list):
    # env_in: (i know, uck global variable, holds info for this notebook common to all functions
    # template: SQL template file
    # file_list: python list element with fully qualified file names to use with SQL statement 
    env = env_in # environment info
    from mysql.connector import Error
    _file = open(env["base_path"]+env["path_separator"]+template, "r")
    _template = _file.readlines()
    try:
        conn = mysql.connector.connect(host=env["mySQL"]["host"],user=env["mySQL"]["user"], passwd=env['pw'])
        if conn.is_connected():
            print('Connected to MySQL database')
    except Error as e:
            print(e)
    cursor = conn.cursor()
    cursor.execute("USE mortgage_analysis;")
    cursor.execute("SET SESSION sql_mode = '';")
    print("starting table load")
    t0 = time.time()
    res = []
    for _file in file_list:
        _sql = _template[0].format(_file)
        print(f"\n{_sql}\n")
        try:
            res = cursor.execute(_sql)
            warn = cursor.fetchwarnings()
            #print(f"warn: {warn}")
        except Error as e:
            print(f"{_sql} \n{e}")

    t1 = time.time()
    print(f"Years: {env['years']} Table load time: {t1-t0}") 
    conn.close
    return env

未识别错误(尝试始终有效),也没有生成警告(FetchWarnings始终为空)。

用于创建两个表的SQL语句是:

DROP TABLE IF EXISTS acq;
CREATE TABLE acq (id DOUBLE AUTO_INCREMENT, loan_id DOUBLE, orig_channel VARCHAR(255), seller_name VARCHAR(255), orig_interest_rate DOUBLE, orig_upb DOUBLE, orig_loan_term DOUBLE, orig_date VARCHAR(255), first_pay_date VARCHAR(255), orig_ltv DOUBLE, orig_cltv DOUBLE, num_borrowers DOUBLE, dti DOUBLE, borrower_credit_score DOUBLE, first_home_buyer VARCHAR(255), loan_purpose VARCHAR(255), property_type VARCHAR(255), num_units DOUBLE, occupancy_status VARCHAR(255), property_state VARCHAR(255), zip DOUBLE, mortgage_insurance_percent DOUBLE, product_type VARCHAR(255), coborrow_credit_score DOUBLE, mortgage_insurance_type DOUBLE, relocation_mortgage_ind VARCHAR(255), PRIMARY KEY (id));
DROP TABLE IF EXISTS perf;
CREATE TABLE perf (id DOUBLE AUTO_INCREMENT, loan_id DOUBLE, monthly_reporting_period VARCHAR(255), servicer VARCHAR(255), interest_rate DECIMAL(6,3), current_actual_upb DECIMAL(12,2), loan_age DOUBLE, remaining_months_to_legal_maturity DOUBLE, adj_remaining_months_to_maturity DOUBLE, maturity_date VARCHAR(255), msa DOUBLE, current_loan_delinquency_status DOUBLE, mod_flag VARCHAR(255), zero_balance_code VARCHAR(255), zero_balance_effective_date VARCHAR(255), last_paid_installment_date VARCHAR(255), foreclosed_after VARCHAR(255), disposition_date VARCHAR(255), foreclosure_costs DOUBLE, prop_preservation_and_reair_costs DOUBLE, asset_recovery_costs DOUBLE, misc_holding_expenses DOUBLE, holding_taxes DOUBLE, net_sale_proceeds DOUBLE, credit_enhancement_proceeds DOUBLE, repurchase_make_whole_proceeds DOUBLE, other_foreclosure_proceeds DOUBLE, non_interest_bearing_upb DOUBLE, principal_forgiveness_upb VARCHAR(255), repurchase_make_whole_proceeds_flag VARCHAR(255), foreclosure_principal_write_off_amount VARCHAR(255), servicing_activity_indicator VARCHAR(255), PRIMARY KEY (id));

I am experimenting with MySQL and data analytics using the Fannie Mae mortgage data. To do this, I've created two tables, ("perf" and "acq") and several python functions. I start by dropping the collection and tables if they exist, then create the collection (mortgage_analysis) and the two tables. Then I build a list of files that correspond to the number of years of analysis I want to perform. All of that works just fine.

I then use the following function to load the tables with data from the perf and acq text files from Fannie Mae. The same function is used to load both tables. It works every time with the "perf" table and NEVER works with the "acq" table. If I take the SQL statements and execute them in the mySQL workbench, the statements work every time. I'm stumped and could use some help.

The SQL statement that works in the workbench but not in Python is:

LOAD DATA  INFILE '/Users/<my user name>/github/mortgage-analysis-example/data/text/acq/Acquisition_2000Q1.txt' 
INTO TABLE acq 
FIELDS TERMINATED BY '|' 
LINES TERMINATED BY '\n' 
(loan_id, orig_channel, seller_name, orig_interest_rate, orig_upb, orig_loan_term, 
 orig_date, first_pay_date, orig_ltv, orig_cltv, num_borrowers, dti, 
 borrower_credit_score, first_home_buyer, loan_purpose, property_type, num_units, 
 occupancy_status, property_state, zip, mortgage_insurance_percent, product_type, 
 coborrow_credit_score, mortgage_insurance_type, relocation_mortgage_ind);

The python function to load this is:

def loadTable(env_in, template, file_list):
    # env_in: (i know, uck global variable, holds info for this notebook common to all functions
    # template: SQL template file
    # file_list: python list element with fully qualified file names to use with SQL statement 
    env = env_in # environment info
    from mysql.connector import Error
    _file = open(env["base_path"]+env["path_separator"]+template, "r")
    _template = _file.readlines()
    try:
        conn = mysql.connector.connect(host=env["mySQL"]["host"],user=env["mySQL"]["user"], passwd=env['pw'])
        if conn.is_connected():
            print('Connected to MySQL database')
    except Error as e:
            print(e)
    cursor = conn.cursor()
    cursor.execute("USE mortgage_analysis;")
    cursor.execute("SET SESSION sql_mode = '';")
    print("starting table load")
    t0 = time.time()
    res = []
    for _file in file_list:
        _sql = _template[0].format(_file)
        print(f"\n{_sql}\n")
        try:
            res = cursor.execute(_sql)
            warn = cursor.fetchwarnings()
            #print(f"warn: {warn}")
        except Error as e:
            print(f"{_sql} \n{e}")

    t1 = time.time()
    print(f"Years: {env['years']} Table load time: {t1-t0}") 
    conn.close
    return env

No errors are identified (try always works) and no warnings are generated (fetchwarnings is always empty).

The SQL statements used to create the two tables are:

DROP TABLE IF EXISTS acq;
CREATE TABLE acq (id DOUBLE AUTO_INCREMENT, loan_id DOUBLE, orig_channel VARCHAR(255), seller_name VARCHAR(255), orig_interest_rate DOUBLE, orig_upb DOUBLE, orig_loan_term DOUBLE, orig_date VARCHAR(255), first_pay_date VARCHAR(255), orig_ltv DOUBLE, orig_cltv DOUBLE, num_borrowers DOUBLE, dti DOUBLE, borrower_credit_score DOUBLE, first_home_buyer VARCHAR(255), loan_purpose VARCHAR(255), property_type VARCHAR(255), num_units DOUBLE, occupancy_status VARCHAR(255), property_state VARCHAR(255), zip DOUBLE, mortgage_insurance_percent DOUBLE, product_type VARCHAR(255), coborrow_credit_score DOUBLE, mortgage_insurance_type DOUBLE, relocation_mortgage_ind VARCHAR(255), PRIMARY KEY (id));
DROP TABLE IF EXISTS perf;
CREATE TABLE perf (id DOUBLE AUTO_INCREMENT, loan_id DOUBLE, monthly_reporting_period VARCHAR(255), servicer VARCHAR(255), interest_rate DECIMAL(6,3), current_actual_upb DECIMAL(12,2), loan_age DOUBLE, remaining_months_to_legal_maturity DOUBLE, adj_remaining_months_to_maturity DOUBLE, maturity_date VARCHAR(255), msa DOUBLE, current_loan_delinquency_status DOUBLE, mod_flag VARCHAR(255), zero_balance_code VARCHAR(255), zero_balance_effective_date VARCHAR(255), last_paid_installment_date VARCHAR(255), foreclosed_after VARCHAR(255), disposition_date VARCHAR(255), foreclosure_costs DOUBLE, prop_preservation_and_reair_costs DOUBLE, asset_recovery_costs DOUBLE, misc_holding_expenses DOUBLE, holding_taxes DOUBLE, net_sale_proceeds DOUBLE, credit_enhancement_proceeds DOUBLE, repurchase_make_whole_proceeds DOUBLE, other_foreclosure_proceeds DOUBLE, non_interest_bearing_upb DOUBLE, principal_forgiveness_upb VARCHAR(255), repurchase_make_whole_proceeds_flag VARCHAR(255), foreclosure_principal_write_off_amount VARCHAR(255), servicing_activity_indicator VARCHAR(255), PRIMARY KEY (id));

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

小霸王臭丫头 2025-01-31 10:38:52

我测试了代码,我必须进行一些更改才能使其正常工作。

不要更改sql_mode。我没有遇到任何错误,我能够加载数据而不会损害SQL_Mode。

我使用了测试数据:

1|2|name1|3|4|4|date|date|5|6|7|8|9|buyer|purpose|type|10|status|state|11|12|type|13|14|ind
1|2|name2|3|4|4|date|date|5|6|7|8|9|buyer|purpose|type|10|status|state|11|12|type|13|14|ind

我敦促您选择更多适当的数据类型。除非您存储科学测量或其他内容,否则您应该在MySQL中使用浮动或两倍。绝对不是货币单位或整数。

我不会使用varchar存储日期。 MySQL有日期和日期,并且确保日期构成良好的日期,因此您可以进行比较,排序,日期算术等。

如果您有错误建议您放宽SQL_Mode并允许无效查询或无效的数据,我会建议您改用数据。即使您加载了数据,如果您允许非图片SQL模式,风险也将成为垃圾。

代码更改:

我使用查询参数,而不是使用格式()尝试将文件名插入查询模板。使用_template [0] .format(_File)删除行,而是使用:

res = cursor.execute(_template, [_file])

但是模板必须将占位符放入无引号的情况下:

正确:

LOAD DATA INFILE %s INTO TABLE...

不正确:

LOAD DATA INFILE '%s' INTO TABLE...

最终,Python中的数据更改未投入默认情况下。也就是说,您可以插入数据,然后当您使用conn.close时,丢弃了不承诺的更改。因此,我添加了一个行:

conn.commit()

执行SQL后,我将其放在try/Catch块中。

这成功地加载了数据。请注意,由于您没有共享样本,因此我必须对您的输入数据做出假设。我不知道您的文件实际上是否与适当的场分隔符和线路分离器相关。但是我认为是,因为您说它在MySQL Workbench中工作。

I tested the code and I had to make a few changes to get it to work.

Don't change sql_mode. I did not get any errors, I was able to load the data without compromising the sql_mode.

I used test data:

1|2|name1|3|4|4|date|date|5|6|7|8|9|buyer|purpose|type|10|status|state|11|12|type|13|14|ind
1|2|name2|3|4|4|date|date|5|6|7|8|9|buyer|purpose|type|10|status|state|11|12|type|13|14|ind

I urge you to choose more appropriate data types. You should use FLOAT or DOUBLE almost never in MySQL, unless you're storing scientific measurements or something. Definitely not for units of currency, or integers.

I would not use VARCHAR to store dates. MySQL has DATE and DATETIME, and these ensure dates are well-formed so you can do comparisons, sorting, date arithmetic, etc.

If you have errors that suggest you should relax the sql_mode and permit invalid queries or invalid data, I would recommend you fix the data instead. Even if you get data loaded, there's a risk it will become garbage if you allow non-strict SQL mode.

Code changes:

Instead of using format() to try to insert the filename into the query template, I used a query parameter. Delete the line with _template[0].format(_file), and instead use:

res = cursor.execute(_template, [_file])

But the template has to put the placeholder in without quotes:

Correct:

LOAD DATA INFILE %s INTO TABLE...

Incorrect:

LOAD DATA INFILE '%s' INTO TABLE...

Finally, data changes in Python are not committed by default. That is, you can insert data, then when you use conn.close the uncommitted changes are discarded. So I added a line:

conn.commit()

I put this in the try/catch block after executing the SQL.

This was successful in loading the data. Note I had to make assumptions about your input data, since you did not share a sample. I don't know if your file is actually well-formed with the proper field separators and line separators. But I assumed it was, since you said it worked in MySQL Workbench.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文