What happens to a $match term in a pipeline?

Posted on 2025-02-09 08:53:11

I'm a newbie to MongoDB and Python scripts. I'm confused about how a $match term is handled in a pipeline.

Let's say I manage a library where books are tracked as JSON documents in MongoDB. There is one JSON document for each copy of a book. The book.JSON files look like this:

{
    "Title": "A Tale of Two Cities",
    "subData":
        {
            "status": "Checked In",
            ...more data here...
        }
}

Here, status will be one string from a finite set of strings, perhaps just: { "Checked In", "Checked Out", "Missing", etc. } Note also that there may not be a status field at all:

{
    "Title": "Great Expectations",
    "subData":
        {
            ...more data here...
        }
}

Okay: I am trying to write a MongoDB pipeline within a Python script that does the following:

  • For each book in the library:
    • Group and count the distinct values of the status field

So my target output from my Python script would be something like this:

{ "A Tale of Two Cities"   'Checked In'    3 }
{ "A Tale of Two Cities"   'Checked Out'   4 }
{ "Great Expectations"     'Checked In'    5 }
{ "Great Expectations"     ''    7 }

Here's my code:

mydatabase = client.JSON_DB
mycollection = mydatabase.JSON_all_2

listOfBooks = mycollection.distinct("bookname")
for book in listOfBooks:
    match_variable = {
        "$match": { 'Title': book }
    }
    group_variable = {
        "$group":{
            '_id': '$subdata.status',
            'categories' : { '$addToSet' : '$subdata.status' },
            'count': { '$sum': 1 }
        }
    }
    project_variable = {
        "$project": {
            '_id': 0,
            'categories' : 1,
            'count' : 1
        }
    }
    pipeline = [
        match_variable,
        group_variable,
        project_variable
    ]
    results = mycollection.aggregate(pipeline)
    for result in results:
        print(str(result['Title'])+"  "+str(result['categories'])+"  "+str(result['count']))

As you can probably tell, I have very little idea what I'm doing. When I run the code, I get an error because I'm trying to reference my $match term:

Traceback (most recent call last):
  File "testScript.py", line 34, in main
    print(str(result['Title'])+"  "+str(result['categories'])+"  "+str(result['count']))
KeyError: 'Title'

So is the $match term not included in the pipeline's output? Or am I failing to include it in group_variable or project_variable?

And on a general note, the above seems like a lot of code to do something relatively easy. Does anyone see a better way? It's easy to find simple examples online, but this is one step of complexity beyond anything I can locate. Thank you.

Comments (1)

舞袖。长 2025-02-16 08:53:11

Here's one aggregation pipeline to "$group" all the books by "Title" and "subData.status".

db.collection.aggregate([
  {
    "$group": {
      "_id": {
        "Title": "$Title",
        "status": {"$ifNull": ["$subData.status", ""]}
      },
      "count": { "$count": {} }
    }
  },
  { // not really necessary, but puts output in predictable order
    "$sort": {
      "_id.Title": 1,
      "_id.status": 1
    }
  },
  {
    "$replaceWith": {
      "$mergeObjects": [
        "$_id",
        {"count": "$count"}
      ]
    }
  }
])
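
For use from a Python script, as in the question, the same pipeline can be passed to PyMongo as a list of dicts. This is a minimal sketch, not tested against the asker's data: it assumes the field names from the sample documents (`Title`, `subData.status`; MongoDB field names are case-sensitive, so `$subdata.status` would not match), and `count_status_per_title` and `format_row` are hypothetical helper names. The `$count` accumulator needs MongoDB 5.0 or later; on older servers `{"$sum": 1}` counts the same way.

```python
# The same three stages as the shell pipeline above, written as Python dicts.
PIPELINE = [
    {
        "$group": {
            # Compound key: one output document per (Title, status) pair.
            "_id": {
                "Title": "$Title",
                # A missing or null status is counted under the empty string.
                "status": {"$ifNull": ["$subData.status", ""]},
            },
            "count": {"$count": {}},  # MongoDB 5.0+; {"$sum": 1} also works
        }
    },
    # Optional: puts the output in a predictable order.
    {"$sort": {"_id.Title": 1, "_id.status": 1}},
    # Flatten {_id: {Title, status}, count} into {Title, status, count}.
    {"$replaceWith": {"$mergeObjects": ["$_id", {"count": "$count"}]}},
]


def count_status_per_title(collection):
    """Run PIPELINE on any object that exposes PyMongo's aggregate() method."""
    return list(collection.aggregate(PIPELINE))


def format_row(row):
    """Render one result row roughly in the layout the question asked for."""
    return f"{row['Title']}  {row['status']!r}  {row['count']}"
```

Called as `count_status_per_title(client.JSON_DB.JSON_all_2)` with the collection from the question, each returned row should already carry `Title`, `status`, and `count`, so the per-book `$match` loop is not needed.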

Example output for one of the "books":

  {
    "Title": "mumblecore",
    "count": 3,
    "status": ""
  },
  {
    "Title": "mumblecore",
    "count": 3,
    "status": "Checked In"
  },
  {
    "Title": "mumblecore",
    "count": 8,
    "status": "Checked Out"
  },
  {
    "Title": "mumblecore",
    "count": 6,
    "status": "Missing"
  }

Try it on mongoplayground.net.
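
As for the KeyError in the question: `$match` only filters which documents enter the pipeline and adds nothing to the output, while `$group` emits new documents holding just `_id` and the accumulator fields, so `Title` is gone unless it is part of the group key. The plain-Python sketch below imitates what the pipeline above computes, without a database. The sample documents are made up for illustration and `group_by_title_and_status` is a hypothetical name.

```python
from collections import Counter


def group_by_title_and_status(docs):
    """Imitate $group on (Title, status), then $sort and $replaceWith."""
    counts = Counter()
    for doc in docs:
        # Like $ifNull: a missing or null status falls back to "".
        status = (doc.get("subData") or {}).get("status")
        if status is None:
            status = ""
        counts[(doc["Title"], status)] += 1
    # Title survives only because it is part of the group key.
    return [
        {"Title": title, "status": status, "count": n}
        for (title, status), n in sorted(counts.items())
    ]


# Made-up sample data, one document per copy of a book.
sample_docs = [
    {"Title": "Great Expectations", "subData": {}},
    {"Title": "A Tale of Two Cities", "subData": {"status": "Checked In"}},
    {"Title": "Great Expectations", "subData": {"status": "Checked In"}},
    {"Title": "Great Expectations", "subData": {}},
]

for row in group_by_title_and_status(sample_docs):
    print(row)
# {'Title': 'A Tale of Two Cities', 'status': 'Checked In', 'count': 1}
# {'Title': 'Great Expectations', 'status': '', 'count': 2}
# {'Title': 'Great Expectations', 'status': 'Checked In', 'count': 1}
```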
