Maximum recursion depth using file management

Posted on 2025-01-24 00:40:40

I have to create a search engine that searches for a specific word inside a directory (folder) containing text files.

For example, assume that we are searching for the word "machine" in a certain directory called X. What I want to achieve is to scan all the txt files inside X and inside its subdirectories as well.

I am getting the error "maximum recursion depth exceeded while calling a Python object".

import os
from pathlib import Path

def getPath (folder) :

    fpath = Path(folder).absolute()
    return fpath

def isSubdirectory (folder) :

    if folder.endswith(".txt") == False :
        return True
    else :
        return False
 
def searchEngine (folder, word) :
    
    path = getPath(folder)
    occurences = {}
    list = os.listdir (path)     #get a list of the folders/files in this path

    #assuming we only have .txt files and subdirectories in our folder :

    for k in list :

        if isSubdirectory(k) == False :
            #break case
            with open (k) as file :                  
                lines = file.readlines()

                for a in lines :

                    if a == word :
                        if str(file) not in occurences :
                            occurences[str(file)] = 1
                        else :
                            occurences[str(file)] += 1
            return occurences
                
        else :

            return searchEngine (k, word)


晚风撩人 2025-01-31 00:40:40

A couple of points:

I couldn't reproduce the recursion error when running your code. But I think you have a problem here: list = os.listdir(path) gives you only the bare entry names, not full paths, but the code that follows (for example the open() call and the recursive call) needs paths that still resolve once you're outside your cwd?
I think the return statement is misplaced: it returns after the first txt-file (and the else branch returns after the first subdirectory), so the loop never gets past the first entry?
Python offers ready-made solutions for walking through paths recursively: os.walk(), glob.glob() and Path.rglob(). Why don't you use them?
Path.absolute() isn't documented, I wouldn't use it. You could use Path.resolve() instead?
You do nothing with the returned occurences in the recursion step: I think you should update the main dictionary after retrieving it?
Don't use list as a variable name: you're shadowing the built-in list().
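
To make the first point concrete, here is a small sketch of what os.listdir() hands back and how the entries could be joined onto their parent directory before opening them or recursing. The helper name list_entries and the directory layout (sub, a.txt) are made up for this demonstration only.

```python
import os
import tempfile
from pathlib import Path

def list_entries(folder):
    """Return (bare_names, full_paths) for the entries of folder."""
    base = Path(folder)
    names = sorted(os.listdir(base))              # bare entry names, no directory part
    full_paths = [base / name for name in names]  # joined back onto the parent
    return names, full_paths

# Hypothetical layout, built only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("machine\n")
    names, full_paths = list_entries(root)
    print(names)                              # ['a.txt', 'sub']
    print([p.is_dir() for p in full_paths])   # [False, True]
```

Note that the directory test here uses is_dir() on the joined path rather than guessing from the file extension, which also avoids treating every non-txt file as a folder.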

Here's a suggestion with Path.rglob():

from pathlib import Path

def searchEngine(folder, word):
    occurences = {}
    for file in Path(folder).rglob('*.txt'):
        key = str(file)
        with file.open('rt') as stream:
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count
    return occurences
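
One thing to keep in mind with the version above: line.count(word) counts substring hits, so a search for "machine" also counts "machines". If only whole words should match, a regular expression with word boundaries is one option. This is a minimal sketch, assuming case-sensitive matching, UTF-8 encoded files and a search word made of letters or digits; the name count_whole_word is made up here.

```python
import re
from pathlib import Path

def count_whole_word(folder, word):
    """Count whole-word occurrences of word in every .txt file below folder."""
    # \b marks a word boundary; re.escape keeps the word from being read as a pattern.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    occurrences = {}
    for file in Path(folder).rglob("*.txt"):
        # The encoding is an assumption, adjust it to your files.
        with file.open("rt", encoding="utf-8") as stream:
            count = sum(len(pattern.findall(line)) for line in stream)
        if count:
            occurrences[str(file)] = count
    return occurrences
```

Called as count_whole_word("X", "machine"), it would return a dictionary that maps each matching file path to its number of whole-word hits, leaving out files without any hit.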

If you want to implement the recursion for yourself, then you could do something like:

def searchEngine(folder, word) : 
    base = Path(folder)
    occurences = {}
    if base.is_dir():
        for path in base.iterdir():
            occurences.update(searchEngine(path, word))
    elif base.suffix == '.txt':
        with base.open('rt') as stream:
            key = str(base)
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count            
    return occurences
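
For completeness, the os.walk() route mentioned in the list above could look roughly like this. It leaves the directory traversal to the standard library, so there is no recursive call in your own code. A sketch only: the name search_with_walk is made up, and the UTF-8 encoding is an assumption.

```python
import os

def search_with_walk(folder, word):
    """Count substring hits of word in every .txt file below folder."""
    occurrences = {}
    # os.walk yields one (dirpath, dirnames, filenames) triple per directory.
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            # filenames are bare names, so join them onto dirpath before opening.
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rt", encoding="utf-8") as stream:
                count = sum(line.count(word) for line in stream)
            if count:
                occurrences[full_path] = count
    return occurrences
```

Like the versions above, it counts substring hits per file and only records files with at least one hit.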
