Record iteration stops after the first set of INSERTs when the SQL INSERT is active, but testing with PRINT iterates through every record (as desired)
Python 3.10.2 with sqlite3 on Windows 10. Relatively new to Python, but I was quite experienced with Perl around 1997-2005.
Banged my head against this for three days. Attacked from lots of angles. No success. I'm asking for guidance, because I don't see myself progressing on my own at this point without help flipping on the light-switch.
My overall application uses PRAW to scrape a subreddit and put the Submissions into a table. Then, we go through that reddit table one row at a time, scanning the selftext column for any URLs. Disregard some. Clean up others. Insert the ones we want into another table. At a later point, I'll then go through that table, downloading each URL.
My problem is that if I run the code below with the INSERT commented out, my print(dl_lnk[0]) line prints the results for all (currently) 1,400 rows from the reddit table. But if I activate the INSERT line, it only seems to process the first row in the table. I assume this is the case because the print line only shows a few lines, and they all involve the same user and the same/similar URL.
I don't understand why this is. I don't think it's because of an error in the SQL (though there seem to be inherently fewer options for accessing SQL exceptions in-code than I used to have in Perl). But I also don't see anything in my flow logic that would make it process just one row when the SQL INSERT happens, yet process all of them when it's commented out.
def scanForDownloadLinks(db):
    cursorObj = db.cursor()
    # Get all records from 'reddit' submission table.
    for reddit_row in cursorObj.execute('''SELECT name, author, subreddit, permalink, title, selftext FROM reddit ORDER BY CREATED_UTC ASC'''):
        # Build URL-matching regex
        link_regex = re.compile('((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)', re.DOTALL)
        # Find all URLs in this reddit submission's selftext column.
        dl_links = re.findall(link_regex, reddit_row[5])
        # Go through each link, disregard links we don't want, clean up links we do, then insert them into the dl_link table
        for dl_lnk in dl_links:
            print(dl_lnk[0])
            cursorObj.execute('''INSERT INTO dl_ledger (name, title, dl_link) VALUES(?,?,?)''', (reddit_row[0], str(reddit_row[4]), dl_lnk[0]))
    db.commit()
You should use different cursors for the SELECT and INSERT queries. When you use the same cursor, the INSERT resets the cursor, so you can't fetch the remaining rows.
Or you could use cursorObj.fetchall() to get all the results of the SELECT query as a list and loop through that, rather than looping through the cursor itself. But if there are lots of rows, this will use lots of memory, while looping through the cursor is incremental. But 1,400 rows may not be a problem.
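Below is a minimal sketch of the two-cursor suggestion, reusing the table names, column names, and regex from the question; it illustrates the idea and has not been tested against the asker's database.

import re
import sqlite3

def scanForDownloadLinks(db):
    select_cur = db.cursor()   # used only to iterate the SELECT results
    insert_cur = db.cursor()   # used only for the INSERTs
    # Compile the URL-matching regex once, outside the loop (pattern copied from the question).
    link_regex = re.compile('((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)', re.DOTALL)
    # Iterate the submissions on one cursor...
    for reddit_row in select_cur.execute('''SELECT name, author, subreddit, permalink, title, selftext FROM reddit ORDER BY CREATED_UTC ASC'''):
        dl_links = re.findall(link_regex, reddit_row[5])
        for dl_lnk in dl_links:
            print(dl_lnk[0])
            # ...and insert on the other, so the pending SELECT is never reset.
            insert_cur.execute('''INSERT INTO dl_ledger (name, title, dl_link) VALUES(?,?,?)''',
                               (reddit_row[0], str(reddit_row[4]), dl_lnk[0]))
    db.commit()

The fetchall() variant keeps a single cursor but materializes the whole result set up front, so later execute() calls on that cursor can no longer invalidate the iteration:

rows = cursorObj.execute('''SELECT name, author, subreddit, permalink, title, selftext FROM reddit ORDER BY CREATED_UTC ASC''').fetchall()
for reddit_row in rows:
    ...  # same link-scanning and INSERT logic as above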