Generalizing an algorithm that loops and compares each record with the previous one?
I have a data set which I can represent by this toy example of a list of dictionaries:
data = [
    {
        "_id": "001",
        "Location": "NY",
        "start_date": "2022-01-01T00:00:00Z",
        "Foo": "fruits"
    },
    {
        "_id": "002",
        "Location": "NY",
        "start_date": "2022-01-02T00:00:00Z",
        "Foo": "fruits"
    },
    {
        "_id": "011",
        "Location": "NY",
        "start_date": "2022-02-01T00:00:00Z",
        "Bar": "vegetables"
    },
    {
        "_id": "012",
        "Location": "NY",
        "Start_Date": "2022-02-02T00:00:00Z",
        "Bar": "vegetables"
    },
    {
        "_id": "101",
        "Location": "NY",
        "Start_Date": "2022-03-01T00:00:00Z",
        "Baz": "pizza"
    },
    {
        "_id": "102",
        "Location": "NY",
        "Start_Date": "2022-03-02T00:00:00Z",
        "Baz": "pizza"
    },
]
Here is an algorithm in Python which collects the keys of each document and, whenever the keys differ from those of the last recorded entry, appends them to the output together with that document's start date.
data_keys = []
for i, lst in enumerate(data):
    all_keys = []
    for k, v in lst.items():
        all_keys.append(k)
        # the date field appears as both start_date and Start_Date
        if k.lower() == 'start_date':
            start_date = v
    this_coll = {'start_date': start_date, 'all_keys': all_keys}
    if i == 0:
        data_keys.append(this_coll)
    else:
        # compare against the most recently recorded set of keys
        last_coll = data_keys[-1]
        if this_coll['all_keys'] == last_coll['all_keys']:
            continue
        else:
            data_keys.append(this_coll)
The correct output given here records each change of field name (Foo, Bar, Baz) as well as the change of case in the field start_date to Start_Date:
[{'start_date': '2022-01-01T00:00:00Z',
'all_keys': ['_id', 'Location', 'start_date', 'Foo']},
{'start_date': '2022-02-01T00:00:00Z',
'all_keys': ['_id', 'Location', 'start_date', 'Bar']},
{'start_date': '2022-02-02T00:00:00Z',
'all_keys': ['_id', 'Location', 'Start_Date', 'Bar']},
{'start_date': '2022-03-01T00:00:00Z',
'all_keys': ['_id', 'Location', 'Start_Date', 'Baz']}]
Is there a general algorithm which covers this pattern of comparing the current item to the previous item in a stack?
I need to generalize this algorithm and find a solution to do exactly the same thing with MongoDB documents in a collection. In order for me to discover if Mongo has an Aggregation Pipeline Operator which I could use, I must first understand if this basic algorithm has other common forms so I know what to look for.
Or could someone who knows MongoDB aggregation pipelines really well suggest operators which would produce the desired result?
Comments (2)
EDIT: If you want to use a query for this, one option is something like:
$objectToArray allows us to format the keys as values, and $ifNull allows us to check the several spellings of start_date.
$unwind allows us to sort the keys.
$group allows us to undo the $unwind, but now with sorted keys.
$reduce creates a string from all the keys, so we'll have something to compare.
Playground example
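To make the role of each stage concrete, the steps listed above can be mirrored in plain Python. This is only a sketch of the idea, not the MongoDB query itself: the helper names key_signature, start_date_of and key_changes are made up for illustration, and the comments map each step to the operator it loosely imitates.

```python
docs = [
    {"_id": "001", "Location": "NY", "start_date": "2022-01-01T00:00:00Z", "Foo": "fruits"},
    {"_id": "002", "Location": "NY", "start_date": "2022-01-02T00:00:00Z", "Foo": "fruits"},
    {"_id": "011", "Location": "NY", "start_date": "2022-02-01T00:00:00Z", "Bar": "vegetables"},
    {"_id": "012", "Location": "NY", "Start_Date": "2022-02-02T00:00:00Z", "Bar": "vegetables"},
    {"_id": "101", "Location": "NY", "Start_Date": "2022-03-01T00:00:00Z", "Baz": "pizza"},
    {"_id": "102", "Location": "NY", "Start_Date": "2022-03-02T00:00:00Z", "Baz": "pizza"},
]


def key_signature(doc):
    """Build one comparable string from a document's keys (hypothetical helper)."""
    # Like $objectToArray: treat the field names as ordinary values.
    keys = [k for k, _ in doc.items()]
    # Like $unwind + sort + $group: put the keys into a canonical order.
    # Like $reduce: fold them into a single string that is easy to compare.
    return ",".join(sorted(keys))


def start_date_of(doc):
    """Like $ifNull: fall back to the other spelling of the date field."""
    return doc.get("start_date", doc.get("Start_Date"))


def key_changes(documents):
    """Keep the first document of every run whose key signature differs from the previous one."""
    changes = []
    previous = None
    for doc in documents:
        signature = key_signature(doc)
        if signature != previous:
            changes.append({"start_date": start_date_of(doc), "all_keys": list(doc)})
            previous = signature
    return changes


for change in key_changes(docs):
    print(change)
```

Sorting is case-sensitive here, so start_date and Start_Date still produce different signatures. Joining with a comma assumes no field name contains a comma.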
itertools.groupby starts a new subiterator each time the key value changes, so it does the work of tracking a changing key for you. In your case, the key is the set of keys of each dictionary. You can create a list comprehension that takes the first value from each of these subiterators.
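A minimal sketch of that idea, assuming the records are already in the order you want to compare them in. The function name first_of_each_run is made up for illustration, and the tuple of keys is used as the grouping key because lists are compared by value and dictionaries keep insertion order.

```python
from itertools import groupby

data = [
    {"_id": "001", "Location": "NY", "start_date": "2022-01-01T00:00:00Z", "Foo": "fruits"},
    {"_id": "002", "Location": "NY", "start_date": "2022-01-02T00:00:00Z", "Foo": "fruits"},
    {"_id": "011", "Location": "NY", "start_date": "2022-02-01T00:00:00Z", "Bar": "vegetables"},
    {"_id": "012", "Location": "NY", "Start_Date": "2022-02-02T00:00:00Z", "Bar": "vegetables"},
    {"_id": "101", "Location": "NY", "Start_Date": "2022-03-01T00:00:00Z", "Baz": "pizza"},
    {"_id": "102", "Location": "NY", "Start_Date": "2022-03-02T00:00:00Z", "Baz": "pizza"},
]


def first_of_each_run(records):
    """Return one entry per run of consecutive records that share the same keys."""
    result = []
    # groupby only merges *consecutive* items with an equal key, which is
    # exactly the "compare with the previous record" behaviour wanted here.
    for keys, run in groupby(records, key=lambda record: tuple(record)):
        first = next(run)  # the first record of this run
        # The date field is spelled start_date or Start_Date depending on the record.
        start_date = next(v for k, v in first.items() if k.lower() == "start_date")
        result.append({"start_date": start_date, "all_keys": list(keys)})
    return result


data_keys = first_of_each_run(data)
for entry in data_keys:
    print(entry)
```

Because groupby never looks further back than the current run, a key set that reappears later is reported again, just as in the loop from the question.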