Duplicate data when syncing MySQL to Elasticsearch with Logstash

Posted 2021-12-04 06:03:43, 2785 characters, 847 views, 1 comment

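When I use Logstash to sync MySQL data into Elasticsearch, the indexed data contains duplicates, and I don't know how to deduplicate it. The three fields attachments, tag, and types are all arrays.
Could someone take a look?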

The Logstash pipeline configuration is as follows:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash-5.6.2/config/mysql-connector-java-5.1.45.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false&autoReconnect=true&createDatabaseIfNotExist=true"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_default_timezone => "Asia/Shanghai"
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
    jdbc_fetch_size => 10000
    connection_retry_attempts => 3
    connection_retry_attempts_wait_time => 1
    jdbc_pool_timeout => 5
    lowercase_column_names => true
    record_last_run => true
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    statement_filepath => "/usr/share/logstash-5.6.2/config/knowledge_all.sql"
  }
}
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      # Copy the parent fields, then append one child entry per joined row
      map['id'] = event.get('id')
      map['title'] = event.get('title')
      map['attachments'] ||= []
      map['attachments'] << {
        'id' => event.get('attachment_id'),
        'filename' => event.get('attachment_filename'),
        'path' => event.get('attachment_path')
      }
      map['types'] ||= []
      map['types'] << {
        'value' => event.get('type_value'),
        'label' => event.get('type_label')
      }
      map['tag'] ||= []
      map['tag'] << {
        'id' => event.get('tag_id'),
        'title' => event.get('tag_title')
      }
      # Drop the flat row; only the aggregated map is emitted as an event
      event.cancel()
    "
    push_previous_map_as_event => true
  }

 }

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test"
    document_type => "knowledge"
    document_id => "%{id}"
  }
}
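
One likely source of the duplicates is that the aggregate code appends a child entry for every joined row, even when that attachment, type, or tag has already been collected for the same id. The sketch below reproduces the aggregation in plain Ruby, outside Logstash, and skips entries that are already present. The helper names `aggregate_rows` and `add_unique` are hypothetical, and the rows are assumed to be plain hashes with the same column names as the query in knowledge_all.sql.

```ruby
# Hypothetical helper: append entry to list unless an equal entry exists.
# Entries whose first value is nil come from a LEFT JOIN with no match
# and are skipped.
def add_unique(list, entry)
  return if entry.values.first.nil?
  list << entry unless list.include?(entry)
end

# Hypothetical helper: fold flat JOIN rows into one document per id.
def aggregate_rows(rows)
  docs = {}
  rows.each do |row|
    doc = docs[row['id']] ||= {
      'id' => row['id'],
      'title' => row['title'],
      'attachments' => [],
      'types' => [],
      'tag' => []
    }
    add_unique(doc['attachments'], {
      'id' => row['attachment_id'],
      'filename' => row['attachment_filename'],
      'path' => row['attachment_path']
    })
    add_unique(doc['types'], {
      'value' => row['type_value'],
      'label' => row['type_label']
    })
    add_unique(doc['tag'], {
      'id' => row['tag_id'],
      'title' => row['tag_title']
    })
  end
  docs.values
end

# Made-up sample: one knowledge item with 2 attachments, 2 types, 1 tag,
# which the JOIN returns as 2 * 2 * 1 = 4 rows.
sample_rows = [[10, 'a.pdf'], [11, 'b.pdf']].product([[1, 'howto'], [2, 'faq']]).map do |att, type|
  {
    'id' => 1, 'title' => 'demo',
    'attachment_id' => att[0], 'attachment_filename' => att[1], 'attachment_path' => '/files',
    'type_value' => type[0], 'type_label' => type[1],
    'tag_id' => 7, 'tag_title' => 'logstash'
  }
end

doc = aggregate_rows(sample_rows).first
puts doc['attachments'].length
puts doc['types'].length
puts doc['tag'].length
```

Inside the aggregate filter the same idea would amount to guarding each `<<` with an `include?` check on the corresponding array in `map`. This has not been tested against Logstash 5.6.2. It also assumes that all rows for one id reach the filter contiguously and through a single pipeline worker, which the configuration above does not guarantee on its own.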

 

knowledge_all.sql:

 

SELECT DISTINCT
  k.id,
  k.title,
  a.id          AS attachment_id,
  a.filename    AS attachment_filename,
  a.path        AS attachment_path,
  t.id          AS type_value,
  t.title       AS type_label,
  ta.id         AS tag_id,
  ta.title      AS tag_title
FROM t_knowledge k
  LEFT JOIN t_knowledge_attachment ka ON ka.knowledge_id = k.id
  LEFT JOIN t_attachment a ON ka.attachment_id = a.id
  LEFT JOIN t_knowledge_type_relate tr ON tr.knowledge_id = k.id
  LEFT JOIN t_knowledge_type t ON t.id = tr.type_id
  LEFT JOIN t_knowledge_tag_relate tar ON tar.knowledge_id = k.id
  LEFT JOIN t_knowledge_tag ta ON ta.id = tar.tag_id
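
The query joins three independent one-to-many relations (attachments, types, tags) onto the same knowledge row, so the result holds one row per combination of the three, and DISTINCT cannot collapse them because each combination is a different row. The sketch below counts those rows in plain Ruby; `join_rows` is a hypothetical helper and the child values are made-up labels, not data from the original tables.

```ruby
# Hypothetical helper: mimic the row set that chained LEFT JOINs produce
# for one knowledge item. An empty child list still yields one row with nil,
# as a LEFT JOIN would.
def join_rows(attachments, types, tags)
  pad = ->(list) { list.empty? ? [nil] : list }
  pad.call(attachments).product(pad.call(types), pad.call(tags))
end

rows = join_rows(%w[a1 a2], %w[t1 t2], %w[g1 g2 g3])
puts rows.length                                  # 2 * 2 * 3 combinations
puts rows.count { |attachment, _, _| attachment == 'a1' }
```

With 2 attachments, 2 types, and 3 tags, each attachment shows up in 6 of the 12 rows, so appending one array entry per row repeats it 6 times in the indexed document.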

 

Comments (1)

自此以后,行同陌路 2021-12-08 15:11:40

In the end I switched to syncing with Java bulk requests instead.
