Maximum recursion depth exceeded when searching files

Posted on 2025-01-24 00:40:40 · 1233 characters · 0 views · 0 comments

I have to create a search engine that searches for a specific word inside a directory (folder) containing text files.

For example, assume we are searching for the word "machine" in a directory called X. I want to scan all the .txt files inside X and in its subdirectories as well.

When I run the code below, I get "maximum recursion depth exceeded while calling a Python object".

import os
from pathlib import Path

def getPath (folder) :

    fpath = Path(folder).absolute()
    return fpath

def isSubdirectory (folder) :

    if folder.endswith(".txt") == False :
        return True
    else :
        return False
 
def searchEngine (folder, word) :
    
    path = getPath(folder)
    occurences = {}
    list = os.listdir (path)     #get a list of the folders/files in this path

    #assuming we only have .txt files and subdirectories in our folder :

    for k in list :

        if isSubdirectory(k) == False :
            #break case
            with open (k) as file :                  
                lines = file.readlines()

                for a in lines :

                    if a == word :
                        if str(file) not in occurences :
                            occurences[str(file)] = 1
                        else :
                            occurences[str(file)] += 1
            return occurences
                
        else :

            return searchEngine (k, word)

Comments (1)

晚风撩人 2025-01-31 00:40:40

A couple of points:

  • I couldn't reproduce the recursion error when running your code. But I think you have a problem here: list = os.listdir(path) returns only bare file and directory names, not full paths, while the calls that follow (for example open) need a path that actually resolves, which breaks as soon as you're outside your cwd.
  • I think the return statement is misplaced: it sits inside the for loop, so the function returns after the first .txt file.
  • Python offers ready-made solutions for walking through paths recursively: os.walk(), glob.glob() and Path.rglob(). Why don't you use them?
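  • Path.absolute() went undocumented for a long time (as far as I know it only appears in the pathlib docs of newer Python versions), so I wouldn't rely on it. You could use Path.resolve() instead.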
  • In the recursion step you return the result of the recursive call right away instead of merging it: I think you should update the main dictionary with the returned occurences and keep looping.
  • Don't use list as a variable name - you're overriding access to the built-in list().
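
To illustrate the first and third points, here is a minimal sketch using os.walk(). The function name search_with_walk is made up for this example, and counting with str.count() is an assumption borrowed from the suggestions in this answer rather than from your original code. Note how each bare file name is joined with its directory before it is opened:

```python
import os


def search_with_walk(folder, word):
    """Count occurrences of `word` in every .txt file below `folder`."""
    occurrences = {}
    # os.walk yields (dirpath, dirnames, filenames) for the folder
    # itself and for each of its subdirectories.
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            # `name` is a bare file name, so join it with its directory
            # to get a path that open() can resolve from any cwd.
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rt") as stream:
                count = sum(line.count(word) for line in stream)
            if count:
                occurrences[full_path] = count
    return occurrences
```

Because the keys are full paths, files with the same name in different subdirectories don't overwrite each other.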

Here's a suggestion with Path.rglob():

from pathlib import Path

def searchEngine(folder, word):
    occurences = {}
    for file in Path(folder).rglob('*.txt'):
        key = str(file)
        with file.open('rt') as stream:
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count
    return occurences

If you want to implement the recursion yourself, you could do something like:

from pathlib import Path

def searchEngine(folder, word):
    base = Path(folder)
    occurences = {}
    if base.is_dir():
        for path in base.iterdir():
            occurences.update(searchEngine(path, word))
    elif base.suffix == '.txt':
        with base.open('rt') as stream:
            key = str(base)
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count            
    return occurences
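
If you would rather stay close to your original os.listdir() approach, the points above could be combined roughly as follows. This is only a sketch under a few assumptions: the name search_engine_listdir is hypothetical, directories are detected with os.path.isdir() instead of by file extension, and occurrences are counted with str.count() as in the suggestions above.

```python
import os


def search_engine_listdir(folder, word):
    """Recursive variant that keeps os.listdir but fixes the path handling."""
    occurrences = {}
    # `names` instead of `list`, so the built-in list() is not shadowed.
    names = os.listdir(folder)
    for name in names:
        # os.listdir returns bare names; build the full path first.
        full_path = os.path.join(folder, name)
        if os.path.isdir(full_path):
            # Merge the sub-result instead of returning it right away.
            occurrences.update(search_engine_listdir(full_path, word))
        elif name.endswith(".txt"):
            with open(full_path, "rt") as stream:
                count = sum(line.count(word) for line in stream)
            if count:
                occurrences[full_path] = count
    # Return only after every entry in the folder has been visited.
    return occurrences
```

The single return sits after the loop, so the function no longer stops at the first entry it finds.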