Getting counts with groupby and a where clause in pandas
I have a data table with a date field, a created (user_account) date, and a customer_id:
date | created | customer_id |
---|---|---|
2022-01-01 | 2021-05-07 | user1 |
2022-01-02 | 2022-01-02 | user2 |
2022-01-03 | 2021-02-02 | user3 |
2022-01-04 | 2022-01-05 | user4 |
2022-01-05 | 2022-01-05 | user5 |
2022-01-06 | 2022-01-08 | user6 |
I want to get a count of new users (based on the created field), grouped by the date field:
date | created | customer_id | new_users (based on the date column) |
---|---|---|---|
2022-01-01 | 2021-05-07 | user1 | 0 |
2022-01-02 | 2022-01-02 | user2 | 1 |
2022-01-03 | 2021-02-02 | user3 | 0 |
2022-01-04 | 2022-01-05 | user4 | 0 |
2022-01-05 | 2022-01-05 | user5 | 2 |
2022-01-06 | 2022-01-08 | user6 | 0 |
I tried using groupby, but I was not able to apply the condition date == created to get the count of new users for a particular date in the date field.
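For reference, the sample data above can be reproduced like this (the customer IDs user1 through user6 are an assumption, since the IDs in the original post are slightly garbled):

```python
import pandas as pd

# Sample data from the question; customer IDs user1..user6 are assumed
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-01-04", "2022-01-05", "2022-01-06",
    ]),
    "created": pd.to_datetime([
        "2021-05-07", "2022-01-02", "2021-02-02",
        "2022-01-05", "2022-01-05", "2022-01-08",
    ]),
    "customer_id": ["user1", "user2", "user3", "user4", "user5", "user6"],
})
```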

First of all, I think it is better to split your data into two different tables. In the first table, you have only the creation_date and customer_id values. This is your actual input. It looks like this (I changed it a little bit to make it more illustrative):
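The table from the original answer is not reproduced in this post, so here is a minimal sketch of such a table, assuming the values from the question and the name created_table used later in the answer:

```python
import pandas as pd

# First table: only the account-creation dates and the customer IDs
created_table = pd.DataFrame({
    "created": pd.to_datetime([
        "2021-05-07", "2022-01-02", "2021-02-02",
        "2022-01-05", "2022-01-05", "2022-01-08",
    ]),
    "customer_id": ["user1", "user2", "user3", "user4", "user5", "user6"],
})
```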
Now, as far as I understood, you want to count how many unique customer_id values exist for each date. This can be done with groupby and nunique.
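A sketch of that step, assuming the created_table above:

```python
# Number of unique customers created on each date
customers_created = created_table.groupby("created")["customer_id"].nunique()
```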
Now you probably want to join these results with a series of consecutive dates. First, let's create an index with such dates:
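For example (the exact date range used in the original answer is not shown, so the question's range is assumed):

```python
# Continuous range of dates to align the counts with
date_index = pd.date_range("2022-01-01", "2022-01-06", freq="D", name="date")
```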
Now let's reindex our series customers_created with this new index:
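A sketch of the reindexing step:

```python
# Align the counts with the continuous date index; dates with no new accounts get 0
new_users = (
    customers_created
    .reindex(date_index, fill_value=0)
    .rename("new_users")
    .to_frame()
    .reset_index()
)
```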
Depending on whether you need a Series or a DataFrame, you may drop the last part (i.e. the .to_frame() call, etc.). This is probably what you were looking for. In your question, this table was merged with the original table, but I don't think that makes much sense, as there is no relation between the rows of the initial table and the new table.
It is also possible that all values in customer_id are guaranteed to be unique. Then you can replace created_table.groupby('created')['customer_id'].nunique() with created_table['created'].value_counts().
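A sketch of that substitution, reusing the created_table assumed above:

```python
# If every customer_id is unique, counting rows per created date gives the same result
customers_created = created_table["created"].value_counts()
```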