如何提高性能和速度

发布于 2024-08-14 06:46:59 字数 1181 浏览 3 评论 0原文

我编写了这个程序用于连接数据并将数据提取到文件中,但是该程序提取数据的速度非常慢。有什么方法可以提高性能以及更快地将数据加载到文件中。我的目标是大约 100,000 到数百万条记录,这就是我担心性能的原因,而且我是否可以像在 java 中那样使用数组获取大小和批处理大小。

import java.sql as sql
import java.lang as lang
def main():
    driver, url, user, passwd = ('oracle.jdbc.driver.OracleDriver','jdbc:oracle:thin:@localhost:1521:xe','odi_temp','odi_temp')
    ##### Register Driver
    lang.Class.forName(driver)
    ##### Create a Connection Object
    myCon = sql.DriverManager.getConnection(url, user, passwd)
    f = open('c:/test_porgram.txt', 'w')
    try:
        ##### Create a Statement
        myStmt = myCon.createStatement()
        ##### Run a Select Query and get a Result Set
        myRs = myStmt.executeQuery("select emp_id ,first_name,last_name,date_of_join from src_sales_12")
        ##### Loop over the Result Set and print the result in a file
        while (myRs.next()):
            print >> f , "%s,%s,%s,%s" %(myRs.getString("EMP_ID"),myRs.getString("FIRST_NAME"),myRs.getString("LAST_NAME"),myRs.getString("DATE_OF_JOIN") )
    finally:
        myCon.close()
        f.close()

### Entry Point of the program
if __name__ == '__main__':
    main()

I have written this program for connecting and fetching the data into file, but this program is so slow in fetching . is there is any way to improve the performance and faster way to load the data into the file . iam targeting around 100,000 to million of records so thats why iam worried about performance and also can i use array fetch size and batch size as we can do in java.

import java.sql as sql
import java.lang as lang
def main():
    driver, url, user, passwd = ('oracle.jdbc.driver.OracleDriver','jdbc:oracle:thin:@localhost:1521:xe','odi_temp','odi_temp')
    ##### Register Driver
    lang.Class.forName(driver)
    ##### Create a Connection Object
    myCon = sql.DriverManager.getConnection(url, user, passwd)
    f = open('c:/test_porgram.txt', 'w')
    try:
        ##### Create a Statement
        myStmt = myCon.createStatement()
        ##### Run a Select Query and get a Result Set
        myRs = myStmt.executeQuery("select emp_id ,first_name,last_name,date_of_join from src_sales_12")
        ##### Loop over the Result Set and print the result in a file
        while (myRs.next()):
            print >> f , "%s,%s,%s,%s" %(myRs.getString("EMP_ID"),myRs.getString("FIRST_NAME"),myRs.getString("LAST_NAME"),myRs.getString("DATE_OF_JOIN") )
    finally:
        myCon.close()
        f.close()

### Entry Point of the program
if __name__ == '__main__':
    main()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

妖妓 2024-08-21 06:46:59

除非您使用最好、最好的数据库和文件服务器设备,或者运行脚本的最差设备,否则此应用程序是 I/O 绑定的。选择从数据库返回后,数据的实际移动将比 Jython、Java 或此代码中的任何低效率问题更占主导地位。

你的CPU在这个过程中基本上是无意识的,你只是没有进行足够的数据转换。您可以编写一个比 I/O 慢的进程,但这不是其中之一。

您可以用 C 语言编写此代码,但我怀疑您不会看到实质性差异。

Unless you're on the finest, finest gear for the DB and file server, or the worst gear running the script, this application is I/O bound. After the select has returned from the DB, the actual movement of the data will dominate more than any inefficiencies in Jython, Java, or this code.

You CPU is basically unconscious during this process, you're simply not doing enough data transformation. You could write a process that is slower than the I/O, but this isn't one of them.

You could write this in C and I doubt you'd see a substantial difference.

墟烟 2024-08-21 06:46:59

难道不能使用 Oracle 命令行 SQL 客户端直接将该查询结果导出到 CSV 文件中吗?

Can't you just use the Oracle command-line SQL client to directly export the results of that query into a CSV file?

妥活 2024-08-21 06:46:59

您可以将 getString 与硬编码索引一起使用,而不是使用列名称(在 print 语句中),这样程序就不必一遍又一遍地查找名称。另外,我对 Jython/Python 文件输出了解不够,无法判断默认情况下是否启用此功能,但您应该尝试确保输出已缓冲。

编辑:

请求的代码(我不声明该代码的正确性):

print >> f , "%s,%s,%s,%s" %(myRs.getString(0),myRs.getString(1),myRs.getString(2),myRs.getString(3) )

myRs = myStmt.executeQuery("select emp_id ,first_name,last_name,date_of_join from src_sales_12")
hasFirst = myRs.next()
if (hasFirst):
    empIdIdx = myRs.findColumn("EMP_ID")
    fNameIdx = myRs.findColumn("FIRST_NAME")
    lNameIdx = myRs.findColumn("LAST_NAME")
    dojIdx = myRs.findColumn("DATE_OF_JOIN")
    print >> f , "%s,%s,%s,%s" %(myRs.getString(empIdIdx),myRs.getString(fNameIdx),myRs.getString(lNameIdx),myRs.getString(dojIdx) )
    ##### Loop over the Result Set and print the result in a file
    while (myRs.next()):
        print >> f , "%s,%s,%s,%s" %(myRs.getString(empIdIdx),myRs.getString(fNameIdx),myRs.getString(lNameIdx),myRs.getString(dojIdx) )

You might use getString with hardcoded indices instead of the column name (in your print statement) so the program doesn't have to look up the names over and over. Also, I don't know enough about Jython/Python file output to say whether this is enabled by default or not, but you should try to make sure your output is buffered.

EDIT:

Code requested (I make no claims about the correctness of this code):

print >> f , "%s,%s,%s,%s" %(myRs.getString(0),myRs.getString(1),myRs.getString(2),myRs.getString(3) )

or

myRs = myStmt.executeQuery("select emp_id ,first_name,last_name,date_of_join from src_sales_12")
hasFirst = myRs.next()
if (hasFirst):
    empIdIdx = myRs.findColumn("EMP_ID")
    fNameIdx = myRs.findColumn("FIRST_NAME")
    lNameIdx = myRs.findColumn("LAST_NAME")
    dojIdx = myRs.findColumn("DATE_OF_JOIN")
    print >> f , "%s,%s,%s,%s" %(myRs.getString(empIdIdx),myRs.getString(fNameIdx),myRs.getString(lNameIdx),myRs.getString(dojIdx) )
    ##### Loop over the Result Set and print the result in a file
    while (myRs.next()):
        print >> f , "%s,%s,%s,%s" %(myRs.getString(empIdIdx),myRs.getString(fNameIdx),myRs.getString(lNameIdx),myRs.getString(dojIdx) )
ζ澈沫 2024-08-21 06:46:59

如果您只想将数据提取到文件中,您可以尝试数据库工具(例如,“加载”,“导出”)。

if you just want to fetch data into files ,you can try database tools(for example , "load","export").

顾铮苏瑾 2024-08-21 06:46:59

您可能还会发现,如果您在 SQL select 语句中构建进入文件的字符串,您将获得更好的性能。

所以你的 SQL 选择应该是 SELECT EMP_ID || ',' ||名字 || ',' ||姓氏 || ',' || DATE_OF_JOIN MY_DATA ...(取决于数据库和分隔符是什么)

然后在您的java代码中您只需获取一个字符串 empData = myRs.findColumn("EMP_DATA") 并将其写入文件。我们已经看到这样做带来了显着的性能优势。

您可能会受益的另一件事是更改 JDBC 连接以使用更大的读取缓冲区 - 而不是在提取中一次提取 30 行,而是提取 5000 行。

You may also find that if you do the construction of the string which goes into the file in the SQL select statement, you will get better performance.

So your SQL select should be SELECT EMP_ID || ',' || FIRST_NAME || ',' || LAST_NAME || ',' || DATE_OF_JOIN MY_DATA ... (depending on what database and separator is)

then in your java code you just get the one string empData = myRs.findColumn("EMP_DATA") and write that to a file. We have seen significant performance benefits doing this.

The other thing you may see benefit from is changing the JDBC connection to use a larger read buffer - rather than 30 rows at a time in the fetch, fetch 5000 rows.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文