How do I parse log data in Kibana from data that has already been indexed?
So I work for a startup, and we're using Aptible to host our Elastic Stack. I'm not sure if that affects anything, but I figured I'd mention it. I'm the only person here with any working knowledge of the Elastic Stack, and it's not much. So bear with me, please.
Basically, the log data from a Postgres DB and a React app gets sent to the Elastic Stack. When I go to search through it in Kibana, I get results that look like this:
{
"_index": "logstash-2022.03.23",
"_type": "_doc",
"_id": "yboRtX8B3AbBxU682eag",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2022-03-23T04:38:04.706Z",
"source": "app",
"layer": "app",
"container":xxxxxxxxx
"app": "our-app",
"host": xxxxxxx,
"app_id": xxxxxxx,
"log": "172.17.0.1 - - [22/Mar/2022:23:37:56 -0500] \"GET /healthcheck HTTP/1.0\" 400 26 \"-\" \"Aptible Health Check\"\n",
"service": "xxxxxxx",
"type": "json",
"@version": "3",
"file": "/tmp/dockerlogs/f76cd328d5710e817702c5b7c15d37797828a1308f8b0e17d039a86813237f73/f76cd328d5710e817702c5b7c15d37797828a1308f8b0e17d039a86813237f73-json.log",
"offset": 46118576,
"stream": "stdout",
"time": "2022-03-23T04:37:56.192544365Z"
},
"fields": {
"@timestamp": [
"2022-03-23T04:38:04.706Z"
],
"time": [
"2022-03-23T04:37:56.192Z"
]
},
"sort": [
1648010284706
]
}
This isn't super helpful, since the data I actually find useful is all in the "log" field.
Here is that "log" value from above: "172.17.0.1 - - [22/Mar/2022:23:37:56 -0500] \"GET /healthcheck HTTP/1.0\" 400 26 \"-\" \"Aptible Health Check\"\n"
I'd like to parse that, or better yet, have Kibana (or whatever component is responsible) parse it automatically.
I hate to admit I've spent DAYS on this problem, and I just can't figure out how to do it, either automatically when the data first comes in, or afterward on the already-indexed documents; that is, take the "log" string out of this JSON and parse it so it ends up as structured fields of its own.
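To show what I'm after, here is roughly what I'd want the parsed result to look like. This is just my own sketch of the goal, not real output; the field names are borrowed from the stock COMBINEDAPACHELOG grok pattern, and I gather newer Logstash versions running in ECS mode would emit different names:

{
  "clientip": "172.17.0.1",
  "ident": "-",
  "auth": "-",
  "timestamp": "22/Mar/2022:23:37:56 -0500",
  "verb": "GET",
  "request": "/healthcheck",
  "httpversion": "1.0",
  "response": "400",
  "bytes": "26",
  "referrer": "-",
  "agent": "Aptible Health Check"
}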
So any advice on how to parse this would be great. For reference, below are the kinds of configs I've been poking at, in case they help clarify what I mean.
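For the ingest-time route, I believe something like this Logstash filter is the usual approach (an untested sketch on my part, assuming the field really is called "log" and that the stock COMBINEDAPACHELOG pattern matches the line):

filter {
  # Parse the Apache-style access line in the "log" field into separate fields
  grok {
    match => { "log" => "%{COMBINEDAPACHELOG}" }
  }
  # Optionally overwrite @timestamp with the time from the log line itself
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

For the data that's already indexed, my understanding from the docs is that an Elasticsearch ingest pipeline with a grok processor could do the same parsing (again, just a sketch; "parse-app-log" is a name I made up):

PUT _ingest/pipeline/parse-app-log
{
  "description": "Parse the Apache-style access line in the log field",
  "processors": [
    {
      "grok": {
        "field": "log",
        "patterns": ["%{COMBINEDAPACHELOG}"]
      }
    }
  ]
}

and then, if I'm reading the docs right, something like POST logstash-2022.03.23/_update_by_query?pipeline=parse-app-log should run it over the existing documents. But I haven't been able to make any of this actually work, so corrections are welcome.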