使用 SQLite3 优化 Python 代码诱变剂

发布于 2024-12-23 02:57:23 字数 13764 浏览 3 评论 0原文

我正在改进一个开源音乐数​​据库,该数据库从我的收藏中读取歌曲并将它们存储到 SQLite 数据库中。反过来,我可以利用数据库查找重复项,对我的收藏运行查询,并且(如果我愿意)在收藏中查找重复的歌曲。为了从音乐文件中读取元数据,我利用 Mutagen 库,并存储元数据,我使用的是SQLite3。

我想在一个相当大的集合上测试我编写的代码,所以我联系了同学和家人,发现总测试规模约为 90,000。这还包含混合信息 - 歌曲为 .mp3、.ogg 或 .flac 格式。

我的主要问题是速度 - 我的代码可以工作,但速度慢得不合适。在当前状态下,它大约在 35:45 内完成测试大小。我的主要问题:我能做些什么来提高这段代码的性能?我认为它与 Mutagen 直接或 SQLite3 中的某些内容直接相关,尽管我愿意接受关于什么的建议将是从这里提高效率的理想方法。


我已经经历了两次迭代来改进代码的这个关键部分。第一个改进导致运行时间缩短为21:30,但这仍然很糟糕。我决定重构代码以减少函数调用的次数,并尝试提高性能。然而,结果是性能下降,但函数调用数量大幅减少 - 第二批运行时间接近51:51,即简直令人无法接受。

接下来的代码适用于“改进的”运行时和重构集。还附加了每段代码的单独配置文件。

第 1 部分:“改进的”运行时

def walk(self, d):
    '''Walk down the file structure iteratively, gathering file names to be read in.'''
    d = os.path.abspath(d)
    dirpath = os.walk(d)
    for folder in dirpath:
        for f in folder[2]: # for each file in the folder...
            supported = 'mp3', 'ogg', 'flac'
            if f.split('.')[-1] in supported:
                try:
                    self.parse(os.path.join(folder[0], f))
                    if self.filecount == 2000 or self.leftover:
                        self.filecount = 0
                        try:
                            self.db.execute_batch_insert_statement(u"INSERT INTO song VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", self.buf)
                        except Exception, e:
                            print e.__unicode__()
                        finally:
                            del self.buf
                            self.buf = [] # wipe the buffers clean so we can repeat a batch parse again.
                except Exception, e:
                    print e.__unicode__()
    try:
        self.db.execute_batch_insert_statement(u"INSERT INTO song VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", self.buf)
    except Exception, e:
        print e.__unicode__()
    finally:
        del self.buf
        self.buf = [] # wipe the buffers clean so we can repeat a batch parse again.

def parse(self, filename):
    '''Process and parse the music file to extract desired information.

    It may be the case that, in the future, we require more information from a song than is provided at this time.
    Examine all tags that can be retrieved from a mutagen.File object, and adjust the database's schema accordingly.'''

    if ".ogg" in filename:
        song = OggVorbis(filename)
    elif ".mp3" in filename:
        song = MP3(filename)
    elif ".flac" in filename:
        song = FLAC(filename)
    else:
        raise InvalidSongException(u"Song is not supported by K'atun at this time.")

    filename = u'filename'

    #song = mutagen.File(filename, easy=True)
    artist, title, genre, track, album, bitrate, year, month = '', '', '', '', '', '', '', ''
    try:
        artist = song['artist'][0]
        title = song['title'][0]
    except Exception:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    if 'genre' in song:
        genre = song['genre'][0]
    else:
        genre = u'Unknown'
    if 'tracknumber' in song:
        track = song['tracknumber'][0]
    else:
        track = 0
    if 'album' in song:
        album = song['album'][0]
    else:
        album = u'Unknown'
    if 'date' in song:
        year = song['date'][0]
    else:
        year = 'Unknown'
    try:
        bitrate = int(song.info.bitrate)
    except AttributeError: # Likely due to us messing with FLAC
        bitrate = 999999 # Set to a special flag value, to indicate that this is a lossless file.
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1


Sat Dec 24 21:24:23 2011    modified.dat

     70626027 function calls (70576436 primitive calls) in 1290.127 CPU seconds

Ordered by: cumulative time
List reduced from 666 to 28 due to restriction <28>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.033    0.033 1290.127 1290.127 parser.py:6(<module>)
    1    0.000    0.000 1290.090 1290.090 parser.py:90(main)
    1    0.000    0.000 1286.493 1286.493 parser.py:24(__init__)
    1    1.826    1.826 1286.335 1286.335 parser.py:35(walk)
90744    2.376    0.000 1264.788    0.014 parser.py:55(parse)
90744   11.840    0.000 1250.401    0.014 lib/mutagen/__init__.py:158(File)
376019  613.881    0.002  613.881    0.002 {method 'seek' of 'file' objects}
90744    1.231    0.000  580.143    0.006 /usr/lib/pymodules/python2.7/mutagen/apev2.py:458(score)
671848  530.346    0.001  530.346    0.001 {method 'read' of 'file' objects}
90742    0.846    0.000  242.337    0.003 /usr/lib/pymodules/python2.7/mutagen/__init__.py:68(__init__)
63944    2.471    0.000  177.050    0.003 /usr/lib/pymodules/python2.7/mutagen/id3.py:1973(load)
63944    0.526    0.000  119.326    0.002 /usr/lib/pymodules/python2.7/mutagen/easyid3.py:161(__init__)
63944    4.649    0.000  118.077    0.002 /usr/lib/pymodules/python2.7/mutagen/id3.py:89(load)
26782    1.073    0.000   64.435    0.002 /usr/lib/pymodules/python2.7/mutagen/ogg.py:434(load)
127531    0.464    0.000   59.314    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:76(__fullread)
63944    1.078    0.000   54.060    0.001 /usr/lib/pymodules/python2.7/mutagen/mp3.py:68(__init__)
26782    0.638    0.000   53.613    0.002 /usr/lib/pymodules/python2.7/mutagen/ogg.py:379(find_last)
66487    3.167    0.000   50.136    0.001 /usr/lib/pymodules/python2.7/mutagen/mp3.py:106(__try)
855079    6.415    0.000   33.237    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:279(__read_frames)
816987    0.904    0.000   24.491    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:321(__load_framedata)
816987    2.805    0.000   23.587    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:1023(fromData)
60803/11257    0.370    0.000   19.036    0.002 /usr/lib/python2.7/os.py:209(walk)
11256   14.651    0.001   14.651    0.001 {posix.listdir}
816973    3.265    0.000   13.140    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:996(_readData)
879103    4.936    0.000   11.473    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:964(__init__)
872462    0.967    0.000   11.336    0.000 /usr/lib/pymodules/python2.7/mutagen/__init__.py:78(__getitem__)
63944    1.969    0.000   10.871    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:443(update_to_v24)
619380    1.396    0.000    8.521    0.000 /usr/lib/pymodules/python2.7/mutagen/easyid3.py:175(__getitem__)

第 2 部分:重构代码

def walk(self, d):
    '''Walk down the file structure iteratively, gathering file names to be read in.'''
    d = os.path.abspath(d)
    dirpath = os.walk(d)
    parsecount = 0
    start = time.time()
    for folder in dirpath:
        for f in folder[2]: # for each file in the folder...
            filetype = f.split('.')[-1].lower()
            if filetype == 'mp3':
                try:
                    self.read_mp3(os.path.join(folder[0], f).decode('utf_8'))
                except Exception, e:
                    print e.__unicode__()
            elif filetype == 'ogg':
                try:
                    self.read_vorbis(os.path.join(folder[0], f).decode('utf_8'))
                except Exception, e:
                    print e.__unicode__()
            elif filetype == 'flac':
                try:
                    self.read_flac(os.path.join(folder[0], f).decode('utf_8'))
                except Exception, e:
                    print e.__unicode__()
            else:
                continue
            if self.filecount == 2000 or self.leftover:
                self.filecount = 0
                print "Time differential: %1.4f s" % (time.time() - start)
                self.batch_commit()
    try:
        print "Wrapping up"
        self.batch_commit()
    except Exception, e:
        print e.__unicode__()
    finally:
        print "Elapsed time: " + str(time.time()-start)

def batch_commit(self):
    '''Insert new values into the database in large quantities.'''
    self.db.execute_batch_insert_statement(u"INSERT INTO song VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", self.buf)
    self.buf = []

def read_mp3(self, filename):
    '''Read and extract an MP3 file's tags.  This makes use of the ID3 standard, not the easy ID3 tag system.'''
    artist, title, genre, track, album, bitrate, year = '', '', '', '', '', 0, ''

    song = MP3(filename)
    keys = song.keys()
    try:
        artist = song['TPE1'].__unicode__()
        title = song['TIT2'].__unicode__()
    except KeyError, e:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    genre = song['TCON'].__unicode__() if "TCON" in keys else u'Unknown'
    track = song['TRCK'].__unicode__() if "TRCK" in keys else u'0'
    album = song['TALB'].__unicode__() if "TALB" in keys else u'Unknown'
    bitrate = int(song.info.bitrate)
    year = song['TDRC'].__unicode__() if "TDRC" in keys else u'Unknown'
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1

def read_vorbis(self, filename):
    '''Read and extract an Ogg Vorbis file's tags.'''
    song = OggVorbis(filename)
    artist, title, genre, track, album, bitrate, year = '', '', '', '', '', 0, ''
    try:
        artist = song['artist'][0]
        title = song['title'][0]
    except KeyError, e:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    genre = song['genre'][0] if genre else u'Unknown'
    track = song['tracknumber'][0] if 'tracknumber' in song else u'0'
    album = song['album'][0] if 'album' in song else u'Unknown'
    bitrate = int(song.info.bitrate)
    year = song['date'][0] if 'date' in song else 'Unknown'
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1


def read_flac(self, filename):
    '''Read and extract a FLAC file's tags.'''
    song = FLAC(filename)
    artist, title, genre, track, album, bitrate, year = '', '', '', '', '', 0, ''
    try:
        artist = song['artist'][0]
        title = song['title'][0]
    except KeyError, e:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    genre = song['genre'][0] if genre else u'Unknown'
    track = song['tracknumber'][0] if 'tracknumber' in song else u'0'
    album = song['album'][0] if 'album' in song else u'Unknown'
    bitrate = 999999 # Special flag for K'atun; will know that this is a lossless file
    year = song['date'][0] if 'date' in song else 'Unknown'
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1


Mon Dec 26 03:22:34 2011    refactored.dat

59939763 function calls (59890172 primitive calls) in 3111.490 CPU seconds

Ordered by: cumulative time
List reduced from 559 to 28 due to restriction <28>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.001    0.001 3111.490 3111.490 parser.py:6(<module>)
1    0.000    0.000 3111.477 3111.477 parser.py:138(main)
1    0.000    0.000 3108.242 3108.242 parser.py:27(__init__)
1    1.760    1.760 3108.062 3108.062 parser.py:40(walk)
46    0.103    0.002 2220.618   48.274 parser.py:78(batch_commit)
46    0.002    0.000 2220.515   48.272 db_backend.py:127(execute_batch_insert_statement)
46 2184.900   47.498 2184.900   47.498 {method 'executemany' of 'sqlite3.Connection' objects}
90747    0.515    0.000  845.343    0.009 /usr/lib/pymodules/python2.7/mutagen/__init__.py:68(__init__)
426651  640.459    0.002  640.459    0.002 {method 'read' of 'file' objects}
63945    1.847    0.000  582.267    0.009 parser.py:83(read_mp3)
63945    2.372    0.000  577.245    0.009 /usr/lib/pymodules/python2.7/mutagen/id3.py:1973(load)
63945    0.307    0.000  514.927    0.008 /usr/lib/pymodules/python2.7/mutagen/id3.py:72(__init__)
63945    0.256    0.000  514.620    0.008 /usr/lib/pymodules/python2.7/mutagen/_util.py:103(__init__)
63945    0.225    0.000  514.363    0.008 /usr/lib/pymodules/python2.7/mutagen/__init__.py:35(__init__)
63945    4.188    0.000  514.139    0.008 /usr/lib/pymodules/python2.7/mutagen/id3.py:89(load)
127533    0.802    0.000  455.713    0.004 /usr/lib/pymodules/python2.7/mutagen/id3.py:76(__fullread)
63945    1.029    0.000  432.574    0.007 /usr/lib/pymodules/python2.7/mutagen/id3.py:202(__load_header)
26786    0.504    0.000  270.216    0.010 parser.py:102(read_vorbis)
26786    1.095    0.000  267.578    0.010 /usr/lib/pymodules/python2.7/mutagen/ogg.py:434(load)
26782    0.627    0.000  143.492    0.005 /usr/lib/pymodules/python2.7/mutagen/ogg.py:379(find_last)
221337  121.448    0.001  121.448    0.001 {method 'seek' of 'file' objects}
97603    1.797    0.000  118.799    0.001 /usr/lib/pymodules/python2.7/mutagen/ogg.py:66(__init__)
26786    0.342    0.000  114.656    0.004 lib/mutagen/oggvorbis.py:40(__init__)
63945    0.646    0.000   58.809    0.001 lib/mutagen/mp3.py:68(__init__)
66480    3.377    0.000   57.489    0.001 lib/mutagen/mp3.py:106(__try)
47   35.609    0.758   35.609    0.758 {method 'commit' of 'sqlite3.Connection' objects}
855108    6.184    0.000   32.181    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:279(__read_frames)
60803/11257    0.385    0.000   31.885    0.003 /usr/lib/python2.7/os.py:209(walk)

I'm in the process of improving an open-source music database, which reads songs in from my collection and stores them to an SQLite database. In turn, I'm able to leverage the database to find duplicates, run queries on my collection, and (if I so desired) find duplicate songs in the collection. To read the metadata from the music files, I'm leveraging the Mutagen library, and to store the metadata, I'm using SQLite3.

I wanted to test the code I had authored on a sizable collection, so I contacted fellow students and family, and came across a total test size of about 90,000. This also consists of mixed information - songs are of either .mp3, .ogg, or .flac format.

My main issue is with speed - the code that I have works, but it is unsuitably slow. In its current state, it runs through the test size in about 35:45. My main question: what can I do to improve the performance of this block of code? I think that it is related directly to something within either Mutagen directly or SQLite3, although I'm open to suggestions on what would be an ideal approach to improve efficiency from here.


I had gone through two iterations of improving this critical part of code. The first improvement led to a reduced runtime of 21:30, but that's still horrible. I decided to refactor the code to reduce the number of function calls made, and an attempt to improve performance. The result, however, is a regression of performance, but a huge reduction in the number of function calls made - the second batch runs for close to 51:51, which is simply unacceptable.

What follows in terms of code is for both the "improved" runtime and refactored set. Also attached are separate profiles for each piece of code.

Block 1: "Improved" Runtime

def walk(self, d):
    '''Walk down the file structure iteratively, gathering file names to be read in.'''
    d = os.path.abspath(d)
    dirpath = os.walk(d)
    for folder in dirpath:
        for f in folder[2]: # for each file in the folder...
            supported = 'mp3', 'ogg', 'flac'
            if f.split('.')[-1] in supported:
                try:
                    self.parse(os.path.join(folder[0], f))
                    if self.filecount == 2000 or self.leftover:
                        self.filecount = 0
                        try:
                            self.db.execute_batch_insert_statement(u"INSERT INTO song VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", self.buf)
                        except Exception, e:
                            print e.__unicode__()
                        finally:
                            del self.buf
                            self.buf = [] # wipe the buffers clean so we can repeat a batch parse again.
                except Exception, e:
                    print e.__unicode__()
    try:
        self.db.execute_batch_insert_statement(u"INSERT INTO song VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", self.buf)
    except Exception, e:
        print e.__unicode__()
    finally:
        del self.buf
        self.buf = [] # wipe the buffers clean so we can repeat a batch parse again.

def parse(self, filename):
    '''Process and parse the music file to extract desired information.

    It may be the case that, in the future, we require more information from a song than is provided at this time.
    Examine all tags that can be retrieved from a mutagen.File object, and adjust the database's schema accordingly.'''

    if ".ogg" in filename:
        song = OggVorbis(filename)
    elif ".mp3" in filename:
        song = MP3(filename)
    elif ".flac" in filename:
        song = FLAC(filename)
    else:
        raise InvalidSongException(u"Song is not supported by K'atun at this time.")

    filename = u'filename'

    #song = mutagen.File(filename, easy=True)
    artist, title, genre, track, album, bitrate, year, month = '', '', '', '', '', '', '', ''
    try:
        artist = song['artist'][0]
        title = song['title'][0]
    except Exception:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    if 'genre' in song:
        genre = song['genre'][0]
    else:
        genre = u'Unknown'
    if 'tracknumber' in song:
        track = song['tracknumber'][0]
    else:
        track = 0
    if 'album' in song:
        album = song['album'][0]
    else:
        album = u'Unknown'
    if 'date' in song:
        year = song['date'][0]
    else:
        year = 'Unknown'
    try:
        bitrate = int(song.info.bitrate)
    except AttributeError: # Likely due to us messing with FLAC
        bitrate = 999999 # Set to a special flag value, to indicate that this is a lossless file.
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1

Sat Dec 24 21:24:23 2011    modified.dat

     70626027 function calls (70576436 primitive calls) in 1290.127 CPU seconds

Ordered by: cumulative time
List reduced from 666 to 28 due to restriction <28>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.033    0.033 1290.127 1290.127 parser.py:6(<module>)
    1    0.000    0.000 1290.090 1290.090 parser.py:90(main)
    1    0.000    0.000 1286.493 1286.493 parser.py:24(__init__)
    1    1.826    1.826 1286.335 1286.335 parser.py:35(walk)
90744    2.376    0.000 1264.788    0.014 parser.py:55(parse)
90744   11.840    0.000 1250.401    0.014 lib/mutagen/__init__.py:158(File)
376019  613.881    0.002  613.881    0.002 {method 'seek' of 'file' objects}
90744    1.231    0.000  580.143    0.006 /usr/lib/pymodules/python2.7/mutagen/apev2.py:458(score)
671848  530.346    0.001  530.346    0.001 {method 'read' of 'file' objects}
90742    0.846    0.000  242.337    0.003 /usr/lib/pymodules/python2.7/mutagen/__init__.py:68(__init__)
63944    2.471    0.000  177.050    0.003 /usr/lib/pymodules/python2.7/mutagen/id3.py:1973(load)
63944    0.526    0.000  119.326    0.002 /usr/lib/pymodules/python2.7/mutagen/easyid3.py:161(__init__)
63944    4.649    0.000  118.077    0.002 /usr/lib/pymodules/python2.7/mutagen/id3.py:89(load)
26782    1.073    0.000   64.435    0.002 /usr/lib/pymodules/python2.7/mutagen/ogg.py:434(load)
127531    0.464    0.000   59.314    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:76(__fullread)
63944    1.078    0.000   54.060    0.001 /usr/lib/pymodules/python2.7/mutagen/mp3.py:68(__init__)
26782    0.638    0.000   53.613    0.002 /usr/lib/pymodules/python2.7/mutagen/ogg.py:379(find_last)
66487    3.167    0.000   50.136    0.001 /usr/lib/pymodules/python2.7/mutagen/mp3.py:106(__try)
855079    6.415    0.000   33.237    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:279(__read_frames)
816987    0.904    0.000   24.491    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:321(__load_framedata)
816987    2.805    0.000   23.587    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:1023(fromData)
60803/11257    0.370    0.000   19.036    0.002 /usr/lib/python2.7/os.py:209(walk)
11256   14.651    0.001   14.651    0.001 {posix.listdir}
816973    3.265    0.000   13.140    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:996(_readData)
879103    4.936    0.000   11.473    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:964(__init__)
872462    0.967    0.000   11.336    0.000 /usr/lib/pymodules/python2.7/mutagen/__init__.py:78(__getitem__)
63944    1.969    0.000   10.871    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:443(update_to_v24)
619380    1.396    0.000    8.521    0.000 /usr/lib/pymodules/python2.7/mutagen/easyid3.py:175(__getitem__)

Block 2: Refactored Code

def walk(self, d):
    '''Walk down the file structure iteratively, gathering file names to be read in.'''
    d = os.path.abspath(d)
    dirpath = os.walk(d)
    parsecount = 0
    start = time.time()
    for folder in dirpath:
        for f in folder[2]: # for each file in the folder...
            filetype = f.split('.')[-1].lower()
            if filetype == 'mp3':
                try:
                    self.read_mp3(os.path.join(folder[0], f).decode('utf_8'))
                except Exception, e:
                    print e.__unicode__()
            elif filetype == 'ogg':
                try:
                    self.read_vorbis(os.path.join(folder[0], f).decode('utf_8'))
                except Exception, e:
                    print e.__unicode__()
            elif filetype == 'flac':
                try:
                    self.read_flac(os.path.join(folder[0], f).decode('utf_8'))
                except Exception, e:
                    print e.__unicode__()
            else:
                continue
            if self.filecount == 2000 or self.leftover:
                self.filecount = 0
                print "Time differential: %1.4f s" % (time.time() - start)
                self.batch_commit()
    try:
        print "Wrapping up"
        self.batch_commit()
    except Exception, e:
        print e.__unicode__()
    finally:
        print "Elapsed time: " + str(time.time()-start)

def batch_commit(self):
    '''Insert new values into the database in large quantities.'''
    self.db.execute_batch_insert_statement(u"INSERT INTO song VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", self.buf)
    self.buf = []

def read_mp3(self, filename):
    '''Read and extract an MP3 file's tags.  This makes use of the ID3 standard, not the easy ID3 tag system.'''
    artist, title, genre, track, album, bitrate, year = '', '', '', '', '', 0, ''

    song = MP3(filename)
    keys = song.keys()
    try:
        artist = song['TPE1'].__unicode__()
        title = song['TIT2'].__unicode__()
    except KeyError, e:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    genre = song['TCON'].__unicode__() if "TCON" in keys else u'Unknown'
    track = song['TRCK'].__unicode__() if "TRCK" in keys else u'0'
    album = song['TALB'].__unicode__() if "TALB" in keys else u'Unknown'
    bitrate = int(song.info.bitrate)
    year = song['TDRC'].__unicode__() if "TDRC" in keys else u'Unknown'
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1

def read_vorbis(self, filename):
    '''Read and extract an Ogg Vorbis file's tags.'''
    song = OggVorbis(filename)
    artist, title, genre, track, album, bitrate, year = '', '', '', '', '', 0, ''
    try:
        artist = song['artist'][0]
        title = song['title'][0]
    except KeyError, e:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    genre = song['genre'][0] if genre else u'Unknown'
    track = song['tracknumber'][0] if 'tracknumber' in song else u'0'
    album = song['album'][0] if 'album' in song else u'Unknown'
    bitrate = int(song.info.bitrate)
    year = song['date'][0] if 'date' in song else 'Unknown'
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1


def read_flac(self, filename):
    '''Read and extract a FLAC file's tags.'''
    song = FLAC(filename)
    artist, title, genre, track, album, bitrate, year = '', '', '', '', '', 0, ''
    try:
        artist = song['artist'][0]
        title = song['title'][0]
    except KeyError, e:
        raise InvalidSongException(u"Cannot read " + filename + ": missing critical song information.")
    genre = song['genre'][0] if genre else u'Unknown'
    track = song['tracknumber'][0] if 'tracknumber' in song else u'0'
    album = song['album'][0] if 'album' in song else u'Unknown'
    bitrate = 999999 # Special flag for K'atun; will know that this is a lossless file
    year = song['date'][0] if 'date' in song else 'Unknown'
    self.buf.append((filename, artist, filename.split('.')[-1], title, genre, track, album, bitrate, year, time.time()))
    self.filecount += 1

Mon Dec 26 03:22:34 2011    refactored.dat

59939763 function calls (59890172 primitive calls) in 3111.490 CPU seconds

Ordered by: cumulative time
List reduced from 559 to 28 due to restriction <28>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.001    0.001 3111.490 3111.490 parser.py:6(<module>)
1    0.000    0.000 3111.477 3111.477 parser.py:138(main)
1    0.000    0.000 3108.242 3108.242 parser.py:27(__init__)
1    1.760    1.760 3108.062 3108.062 parser.py:40(walk)
46    0.103    0.002 2220.618   48.274 parser.py:78(batch_commit)
46    0.002    0.000 2220.515   48.272 db_backend.py:127(execute_batch_insert_statement)
46 2184.900   47.498 2184.900   47.498 {method 'executemany' of 'sqlite3.Connection' objects}
90747    0.515    0.000  845.343    0.009 /usr/lib/pymodules/python2.7/mutagen/__init__.py:68(__init__)
426651  640.459    0.002  640.459    0.002 {method 'read' of 'file' objects}
63945    1.847    0.000  582.267    0.009 parser.py:83(read_mp3)
63945    2.372    0.000  577.245    0.009 /usr/lib/pymodules/python2.7/mutagen/id3.py:1973(load)
63945    0.307    0.000  514.927    0.008 /usr/lib/pymodules/python2.7/mutagen/id3.py:72(__init__)
63945    0.256    0.000  514.620    0.008 /usr/lib/pymodules/python2.7/mutagen/_util.py:103(__init__)
63945    0.225    0.000  514.363    0.008 /usr/lib/pymodules/python2.7/mutagen/__init__.py:35(__init__)
63945    4.188    0.000  514.139    0.008 /usr/lib/pymodules/python2.7/mutagen/id3.py:89(load)
127533    0.802    0.000  455.713    0.004 /usr/lib/pymodules/python2.7/mutagen/id3.py:76(__fullread)
63945    1.029    0.000  432.574    0.007 /usr/lib/pymodules/python2.7/mutagen/id3.py:202(__load_header)
26786    0.504    0.000  270.216    0.010 parser.py:102(read_vorbis)
26786    1.095    0.000  267.578    0.010 /usr/lib/pymodules/python2.7/mutagen/ogg.py:434(load)
26782    0.627    0.000  143.492    0.005 /usr/lib/pymodules/python2.7/mutagen/ogg.py:379(find_last)
221337  121.448    0.001  121.448    0.001 {method 'seek' of 'file' objects}
97603    1.797    0.000  118.799    0.001 /usr/lib/pymodules/python2.7/mutagen/ogg.py:66(__init__)
26786    0.342    0.000  114.656    0.004 lib/mutagen/oggvorbis.py:40(__init__)
63945    0.646    0.000   58.809    0.001 lib/mutagen/mp3.py:68(__init__)
66480    3.377    0.000   57.489    0.001 lib/mutagen/mp3.py:106(__try)
47   35.609    0.758   35.609    0.758 {method 'commit' of 'sqlite3.Connection' objects}
855108    6.184    0.000   32.181    0.000 /usr/lib/pymodules/python2.7/mutagen/id3.py:279(__read_frames)
60803/11257    0.385    0.000   31.885    0.003 /usr/lib/python2.7/os.py:209(walk)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

捂风挽笑 2024-12-30 02:57:23

如果你愿意,你可以深入研究线程:) - 这在 python 中非常容易。

创建两个队列对象,一个用于 IN,一个用于 OUT,并在多个线程之间扫描文件。 (工作线程和队列是线程安全的)

应该只需要大约 80loc 的代码,并且您可能可以保留当前的函数,只需将它们包装在适当的类中即可。 (为了让“Thread”接受它)

但是20分钟9万首歌曲似乎并不完全出格。它需要对磁盘进行大量的随机访问,并且寻道速度很慢(10ms)。所以 90,000 个文件,每个文件一次查找已经是 15 分钟了。

If you feel up to it, you could dive into threading :) - It is quite easy in python.

Create two Queue objects, one for IN and one for OUT and in between multiple threads scan the files. (Worker threads, and Queue's are threadsafe)

It should be only ~80loc more code, and you probably can keep the current functions, just wrap em in appropriate classes. (to make "Thread" take it)

But 20min for 90k songs does not seem to be completely out of line. It requires a lot of random access on the disk, and seeking is slow (10ms). so 90,000files, one seek per file is 15min already.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文