How to insert data into a dictionary inside a for loop

Posted on 2025-01-18 13:03:00

I'm stuck on this part. I'm extracting data from Reddit using PRAW, and I need to push everything I extract into a dictionary and then store that dictionary in a PostgreSQL database. The for loop works and extracts all the values I need, but at the end only the last one is in the dict. I tried using a dict of lists, but the same values were repeated several times. How can I insert all the data into my dict? I also tested other solutions I found here, but just got an error.
Here's my code:

class RedditExtract:
    def __init__(self, query, token):
        self.query = query
        self.token = token
        self.consulta = self.query.get("query")

    def searchQuery(self):
        reddit = praw.Reddit(
            client_id=REDDIT_CLIENT_ID,
            client_secret=REDDIT_CLIENT_SECRET,
            user_agent="extracting for reddit",
        )
        subreddit = reddit.subreddit("all").search(self.consulta)
        submission = reddit.submission
        top_subreddit = subreddit
        itemB = {}
        con = Conexion()
        for submission in top_subreddit:
            try:
                user = submission.author
                reditor = reddit.redditor(user)
                itemB["id"] = reditor.id
                print("id: " + itemB["id"])
                itemB["name"] = submission.fullname
                #print("name: " + itemB["name"])
                itemB["username"] = submission.author.name
                #print("username: " + itemB["username"])
                itemB["red"] = 13
                #print("red: " + str(itemB["red"]))
                itemB["type"] = "b"
                #print("type: " + str(itemB["type"]))
                itemB["karma"] = submission.author.total_karma
                #print("karma: " + str(itemB["karma"]))
                itemB["avatar"] = reditor.icon_img
                #print("url icon username: " + itemB["avatar"])
                itemB["extract_date"] = datetime.today().strftime("%Y-%m-%d %H:%M:%S")
                #print("extract date: " + itemB["extract_date"])
                itemB["created_at"] = datetime.fromtimestamp(int(submission.created_utc))
                #print("created at: " + str(itemB["created_at"]))
            except:
                print("No se hallo ID del usuario, se omite el post")

The prints are just to evaluate that PRAW extracts the data correctly.

PS: I use PRAW 7.5.0 and Python 3.8 with PyCharm.

I tried using lists to store each key's values and then building the dictionary from those lists, but I just got the same values repeated several times.
I also tried a second for loop to store the keys and the values, but many values were missing.
I want to have something like this:
{'id':'kshdh''jajsjs''kasjs''asmjs'...,'name':'asrat''omes',...}
And then, from that dictionary, insert the values into the matching column (key) in a PostgreSQL database.

TABLE:
I actually got a dict like this:
{'id': 'ajsgs,jhfhd,ajddg,ahsgys,...','name':'maaa,nnn,...',...} but the BIG problem with that is that all the values are strings. I need 'red' and 'karma' to be integers, and I can't cast them once they are in the dict.
My table in PostgreSQL is something like this:

CREATE TABLE IF NOT EXISTS public.salert_basic
(
    id character varying(255) COLLATE pg_catalog."default" NOT NULL,
    name character varying(255) COLLATE pg_catalog."default",
    username character varying(255) COLLATE pg_catalog."default",
    red integer,
    extract_date timestamp without time zone,
    created_at timestamp without time zone,
    karma integer,
    icon character varying COLLATE pg_catalog."default",
    type character varying COLLATE pg_catalog."default",
    CONSTRAINT salert_basic_pk PRIMARY KEY (id)
)

And the code to insert the data from Python is this:

        Conexion.con.autocommit = True
        curser = Conexion.cursor
        columns = itemB.keys()
        for i in itemB.values():
           sql = '''insert into salert_basic(id,name,username,red,type,karma,icon,extraction_date,created_at) values{};'''.format(i)
        curser.execute(sql)
        Conexion.con.commit()
        Conexion.con.close()

This is how I created my dict:

itemB = defaultdict(list)

Then I fill it like this for each key:

itemB["name"].append(submission.fullname)

And finally, to concatenate the values of each list in the dict, I use this for loop:

  for key in itemB:
     itemB[key] = ", ".join(itemB[key])

But as I said, doing this casts my integers to strings, which I can't put into my database.
What do you say?
PS: How can I avoid the duplicate primary key error? There are some repeated ids.

UPDATE:

  1. I checked the use of %s; I had forgotten about it.
  2. Well... no, I need all the ids under the "id" key, but each one separate from the others, not run together like ahsgdshjgjsdgs..... Also, id is the PK, so duplicates are not allowed, but I think with ON CONFLICT DO NOTHING in the SQL I can skip the duplicate and continue with the others.
  3. Yes, I am trying to insert each submission as a row in the database table, but it's giving me headaches.
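
The symptom described above (only the last submission survives, or the same values repeat) can be reproduced without PRAW. The following is a minimal sketch, assuming three made-up submissions stored in a hypothetical `fake_submissions` list; none of these names come from the original code. It contrasts one shared dict, the same shared dict appended to a list, and a fresh dict per iteration.

```python
# Hypothetical stand-in for the PRAW search results: three fake submissions.
fake_submissions = [
    {"author_id": "a1", "karma": 10},
    {"author_id": "b2", "karma": 25},
    {"author_id": "c3", "karma": 7},
]

# Case 1: one shared dict. Each iteration overwrites the same keys,
# so only the last submission is left when the loop ends.
item = {}
for sub in fake_submissions:
    item["id"] = sub["author_id"]
    item["karma"] = sub["karma"]

# Case 2: appending that same shared dict to a list stores three references
# to ONE object, so every entry shows the values of the last submission.
shared = {}
repeated = []
for sub in fake_submissions:
    shared["id"] = sub["author_id"]
    shared["karma"] = sub["karma"]
    repeated.append(shared)

# Case 3: a fresh dict per iteration gives one independent row per submission,
# and the integer values stay integers.
rows = []
for sub in fake_submissions:
    row = {"id": sub["author_id"], "karma": sub["karma"]}
    rows.append(row)

print(item)
print([r["id"] for r in repeated])
print([r["id"] for r in rows])
```

Case 2 is one plausible explanation for the "same values repeated several times" result, although the code that produced it is not shown in the question. Case 3 keeps one dict per submission, which maps naturally onto one table row each.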

Comments (1)

想你的星星会说话 2025-01-25 13:03:00

It is still not exactly clear what you are trying to achieve. Here is an attempt at something that I think does what you want:

from datetime import datetime

import praw

# REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET and Conexion are defined elsewhere
# in the asker's project.


class RedditExtract:
    def __init__(self, query, token):
        self.query = query
        self.token = token
        self.consulta = self.query.get("query")

    def searchQuery(self):
        reddit = praw.Reddit(
            client_id=REDDIT_CLIENT_ID,
            client_secret=REDDIT_CLIENT_SECRET,
            user_agent="extracting for reddit",
        )
        subreddit = reddit.subreddit("all").search(self.consulta)
        top_subreddit = subreddit
        # One dict per submission is collected here.
        data_list = []
        con = Conexion()
        for submission in top_subreddit:
            # Start a fresh dict on every iteration so earlier rows are not overwritten.
            item_dict = {}
            try:
                user = submission.author
                reditor = reddit.redditor(user)
                item_dict["id"] = reditor.id
                item_dict["name"] = submission.fullname
                item_dict["username"] = submission.author.name
                item_dict["red"] = 13
                item_dict["type"] = "b"
                item_dict["karma"] = submission.author.total_karma
                item_dict["avatar"] = reditor.icon_img
                item_dict["extract_date"] = datetime.today().strftime("%Y-%m-%d %H:%M:%S")
                item_dict["created_at"] = datetime.fromtimestamp(int(submission.created_utc))
                # Append only after every field was read successfully.
                data_list.append(item_dict)
            except Exception:
                print("No se hallo ID del usuario, se omite el post")

sql = """insert into salert_basic
    (id, name, username, red, type, karma, icon,
    extract_date, created_at)
values
    (%(id)s, %(name)s, %(username)s, %(red)s, %(type)s, %(karma)s,
    %(avatar)s, %(extract_date)s, %(created_at)s)"""

# Placeholder names must match the keys of each dict in data_list;
# column names follow the CREATE TABLE statement in the question.
curser = Conexion.cursor
curser.executemany(sql, data_list)

If this is a large data set, it will perform better with:

from psycopg2.extras import execute_batch

execute_batch(curser, sql, data_list)

The above:

  1. Creates a list of dicts
  2. Modifies sql to use named placeholders so the values in the dict can be mapped to a placeholder.
  3. Runs the sql with either executemany() or execute_batch(). They iterate over the list and apply the values in each dict to the placeholders in the query string.
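
The question's PS about duplicate primary keys is not covered by the steps above. Below is a minimal sketch of one way to handle it, assuming the column names from the CREATE TABLE statement in the question and the dict keys used in the loop. `dedupe_by_id`, `insert_rows`, `make_row` and `RecordingCursor` are hypothetical helpers written for this illustration; the fake cursor only lets the sketch run without a database and would be replaced by a real psycopg2 cursor.

```python
def dedupe_by_id(rows):
    """Keep the first row seen for each id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


# Assumption: column names follow the CREATE TABLE in the question,
# placeholder names follow the dict keys built in the loop.
INSERT_SQL = """
insert into salert_basic
    (id, name, username, red, type, karma, icon, extract_date, created_at)
values
    (%(id)s, %(name)s, %(username)s, %(red)s, %(type)s, %(karma)s,
     %(avatar)s, %(extract_date)s, %(created_at)s)
on conflict (id) do nothing
"""


def insert_rows(cursor, rows):
    """Send one parameter dict per row; return how many rows were sent."""
    unique = dedupe_by_id(rows)
    cursor.executemany(INSERT_SQL, unique)
    return len(unique)


class RecordingCursor:
    """Hypothetical fake cursor that records calls instead of hitting a database."""

    def __init__(self):
        self.calls = []

    def executemany(self, sql, seq_of_params):
        self.calls.append((sql, list(seq_of_params)))


def make_row(user_id, karma):
    """Build a made-up row with the same keys as item_dict in the answer."""
    return {
        "id": user_id, "name": "t3_example", "username": "user_" + user_id,
        "red": 13, "type": "b", "karma": karma,
        "avatar": "https://example.com/icon.png",
        "extract_date": "2025-01-18 13:03:00",
        "created_at": "2025-01-17 09:00:00",
    }


rows = [make_row("a1", 120), make_row("b2", 45), make_row("a1", 120)]
cursor = RecordingCursor()
sent = insert_rows(cursor, rows)
print(sent)
```

The `on conflict (id) do nothing` clause makes PostgreSQL skip any row whose id is already in the table, so a repeated author does not abort the batch. Removing duplicates in Python first is optional; it only avoids sending rows that would be skipped anyway.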