What happens to a $match term in a pipeline?

Posted 2025-02-09 08:53:11


I'm a newbie to MongoDB and Python scripts. I'm confused about how a $match term is handled in a pipeline.

Let's say I manage a library, where books are tracked as JSON documents in a MongoDB database. There is one JSON document for each copy of a book. The book.JSON documents look like this:

{
    "Title": "A Tale of Two Cities",
    "subData":
        {
            "status": "Checked In",
            ...more data here...
        }
}

Here, status will be one string from a finite set of strings, perhaps just { "Checked In", "Checked Out", "Missing", etc. }. But note also that there may not be a status field at all:

{
    "Title": "Great Expectations",
    "subData":
        {
            ...more data here...
        }
}

Okay: I am trying to write a MongoDB pipeline within a Python script that does the following:

For each book in the library:
- Groups and counts the different instances of the status field

So my target output from my Python script would be something like this:

{ "A Tale of Two Cities"   'Checked In'    3 }
{ "A Tale of Two Cities"   'Checked Out'   4 }
{ "Great Expectations"     'Checked In'    5 }
{ "Great Expectations"     ''    7 }
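As a point of reference, the grouping described above can be sketched in plain Python over an in-memory list of documents. The sample documents are hypothetical, and treating a missing (or null) status as "" is an assumption chosen to match the '' row in the target output. The helper names status_of and count_statuses are illustrative only.

```python
from collections import Counter

# Hypothetical sample documents shaped like the book.JSON examples above.
books = [
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked In"}},
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked Out"}},
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked In"}},
    {"Title": "Great Expectations", "subData": {"status": "Checked In"}},
    {"Title": "Great Expectations", "subData": {}},  # no status field
]


def status_of(doc):
    """Return subData.status, or "" when it is missing or null."""
    status = (doc.get("subData") or {}).get("status")
    return "" if status is None else status


def count_statuses(docs):
    """Count copies per (Title, status) pair, sorted by title, then status."""
    counts = Counter((doc["Title"], status_of(doc)) for doc in docs)
    return sorted((title, status, n) for (title, status), n in counts.items())


for title, status, n in count_statuses(books):
    print(f"{{ {title!r}  {status!r}  {n} }}")
```

This pulls every document into Python, so it is only practical for small collections; an aggregation pipeline does the same counting on the server.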

Here's my code:

from pymongo import MongoClient

client = MongoClient()  # assumes a MongoDB server on localhost:27017
mydatabase = client.JSON_DB
mycollection = mydatabase.JSON_all_2

# One pipeline run per distinct title
listOfBooks = mycollection.distinct("Title")
for book in listOfBooks:
    # Keep only the copies of this title
    match_variable = {
        "$match": { 'Title': book }
    }
    # One output document per distinct subData.status value
    group_variable = {
        "$group":{
            '_id': '$subData.status',
            'categories' : { '$addToSet' : '$subData.status' },
            'count': { '$sum': 1 }
        }
    }
    # Drop _id; keep only categories and count
    project_variable = {
        "$project": {
            '_id': 0,
            'categories' : 1,
            'count' : 1
        }
    }
    pipeline = [
        match_variable,
        group_variable,
        project_variable
    ]
    results = mycollection.aggregate(pipeline)
    for result in results:
        print(str(result['Title'])+"  "+str(result['categories'])+"  "+str(result['count']))

As you can probably tell, I have very little idea what I'm doing. When I run the code, I get an error because I'm trying to reference my $match term:

Traceback (most recent call last):
  File "testScript.py", line 34, in main
    print(str(result['Title'])+"  "+str(result['categories'])+"  "+str(result['count']))
KeyError: 'Title'

So is a $match term not included in the pipeline output? Or am I failing to include it in group_variable or project_variable?
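To see why result['Title'] raises KeyError, note that each document leaving a $group stage contains only _id plus the accumulator fields defined in that stage; every other input field, including Title, is dropped. The inclusion $project with '_id': 0 then keeps only categories and count. The pure-Python sketch below imitates those two stages for the pipeline in the question. It is a simplified model, not MongoDB's implementation; group_by_status and project_fields are hypothetical helper names, and the model assumes every document has subData.status.

```python
def group_by_status(docs):
    """Simplified model of the question's $group stage: one output document
    per distinct subData.status, holding only _id and the accumulators."""
    groups = {}
    for doc in docs:
        status = doc["subData"]["status"]  # assumes the field is present
        out = groups.setdefault(
            status, {"_id": status, "categories": [], "count": 0}
        )
        if status not in out["categories"]:  # $addToSet: unique values only
            out["categories"].append(status)
        out["count"] += 1  # {'$sum': 1}
    return list(groups.values())


def project_fields(docs, keep):
    """Simplified model of an inclusion $project that also sets '_id': 0."""
    return [{k: v for k, v in doc.items() if k in keep} for doc in docs]


# Hypothetical documents, as they might look after the $match on one title.
matched = [
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked In"}},
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked Out"}},
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked In"}},
]

results = project_fields(group_by_status(matched), keep={"categories", "count"})
for result in results:
    print(result)  # no 'Title' key, so result['Title'] raises KeyError
```

Because Title never reaches the output, it has to be carried through $group to be printed: either put it in the grouping key (as the answer's pipeline does with "_id": {"Title": "$Title", ...}), or, since each run here matches a single title, add an accumulator such as 'Title': {'$first': '$Title'} and include 'Title': 1 in the $project.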

And on a general note, the above seems like a lot of code to do something relatively easy. Does anyone see a better way? It's easy to find simple examples online, but this is one step of complexity beyond anything I can locate. Thank you.


舞袖。长 2025-02-16 08:53:11

Here's one aggregation pipeline to "$group" all the books by "Title" and "subData.status".

db.collection.aggregate([
  {
    "$group": {
      "_id": {
        "Title": "$Title",
        "status": {"$ifNull": ["$subData.status", ""]}
      },
      "count": { "$count": {} }
    }
  },
  { // not really necessary, but puts output in predictable order
    "$sort": {
      "_id.Title": 1,
      "_id.status": 1
    }
  },
  {
    "$replaceWith": {
      "$mergeObjects": [
        "$_id",
        {"count": "$count"}
      ]
    }
  }
])
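Since the question's script is in Python, the same pipeline can be written as a list of dicts and passed to PyMongo's Collection.aggregate. A sketch follows; PIPELINE and print_status_counts are hypothetical names. The {"$count": {}} accumulator needs MongoDB 5.0 or later ({"$sum": 1} counts the same way on older servers), and $replaceWith needs 4.2 or later.

```python
# Same stages as the shell pipeline above, expressed as Python dicts.
PIPELINE = [
    {
        "$group": {
            "_id": {
                "Title": "$Title",
                # Missing (or null) status values are grouped under "".
                "status": {"$ifNull": ["$subData.status", ""]},
            },
            "count": {"$count": {}},  # MongoDB 5.0+; {"$sum": 1} on older servers
        }
    },
    # Optional: predictable output order. Python 3.7+ dicts keep key order,
    # and PyMongo encodes keys in that order, so Title sorts before status.
    {"$sort": {"_id.Title": 1, "_id.status": 1}},
    # Flatten {"_id": {Title, status}, "count": n} into {Title, status, count}.
    {"$replaceWith": {"$mergeObjects": ["$_id", {"count": "$count"}]}},
]


def print_status_counts(collection):
    """Run PIPELINE on a PyMongo collection; print and return one line per
    (Title, status) pair."""
    lines = []
    for result in collection.aggregate(PIPELINE):
        line = f"{{ {result['Title']!r}  {result['status']!r}  {result['count']} }}"
        print(line)
        lines.append(line)
    return lines
```

Usage would look like print_status_counts(MongoClient().JSON_DB.JSON_all_2) after from pymongo import MongoClient, reusing the database and collection names from the question. A single aggregation replaces both the distinct call and the per-title loop.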

Example output for one of the "books":

  {
    "Title": "mumblecore",
    "count": 3,
    "status": ""
  },
  {
    "Title": "mumblecore",
    "count": 3,
    "status": "Checked In"
  },
  {
    "Title": "mumblecore",
    "count": 8,
    "status": "Checked Out"
  },
  {
    "Title": "mumblecore",
    "count": 6,
    "status": "Missing"
  }

Try it on mongoplayground.net.
