NoSQL DynamoDB sort and partition keys

Posted on 2025-02-02 23:31:24

I have set up a DynamoDB database, which is NoSQL.

My table looks like this:

+-----------------------------------------------------------------------------+
|    product_id     section  product_name  partition_key             sort_key |
+-----------------------------------------------------------------------------+
| 0         100  electronic         mouse              0  2022-05-28 15:02:13 |
| 1         200  electronic     key board              1  2022-05-28 15:02:13 |
| 2         300        cats    cat feeder              2  2022-05-28 15:02:13 |
| 3         400        cats   cat drinker              3  2022-05-28 15:02:13 |
| 4         500        food         pizza              4  2022-05-28 15:02:13 |
+-----------------------------------------------------------------------------+

This is what I did in pandas:

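from datetime import datetime
import pytz

# Offset the DataFrame index by len(len_tbl) to build the partition key
df['partition_key'] = df.index + len(len_tbl)

# Stamp every row in this batch with the same insertion time (Sao Paulo local time)
sql_format = '%Y-%m-%d %H:%M:%S'
add_date = datetime.now(pytz.timezone('America/Sao_Paulo')).strftime(sql_format)
df['sort_key'] = add_date  # creationDate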
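
To make it clearer what those two columns end up holding, here is a minimal sketch of the same key-building logic without pandas. The names `build_keys` and `existing_count` are hypothetical (they stand in for the DataFrame code and for `len(len_tbl)`), and the standard-library `zoneinfo` module is swapped in for `pytz`. It only shows the current approach, not a recommended design.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SQL_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_keys(rows, existing_count, now=None):
    """Return copies of rows with partition_key and sort_key added.

    existing_count plays the role of len(len_tbl) in the pandas snippet.
    """
    if now is None:
        now = datetime.now(ZoneInfo("America/Sao_Paulo"))
    stamp = now.strftime(SQL_FORMAT)  # one timestamp for the whole batch
    return [
        {**row, "partition_key": existing_count + i, "sort_key": stamp}
        for i, row in enumerate(rows)
    ]


rows = [
    {"product_id": 100, "section": "electronic", "product_name": "mouse"},
    {"product_id": 200, "section": "electronic", "product_name": "key board"},
    {"product_id": 300, "section": "cats", "product_name": "cat feeder"},
]

# A fixed timestamp keeps the example reproducible.
fixed_now = datetime(2022, 5, 28, 15, 2, 13)
items = build_keys(rows, existing_count=0, now=fixed_now)

for item in items:
    print(item["partition_key"], item["sort_key"], item["product_name"])
```

With this approach every row gets its own partition_key value, while all rows inserted in the same batch share one sort_key value.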

My question is about the most efficient way to build the partition_key and sort_key given the shape of this table. I'm going to insert around 500k to 1M+ rows per month, so it will be huge after some time.

I have never worked with NoSQL, only SQL.

Can you share some hints with me?
