Filtering a JSON-LD file for keys with a specific JSON value

Posted on 2025-01-11 10:45:35


I have a gzip-compressed file (.gz) which, when decompressed, contains one JSON object per line. Below is one sample JSON line. I am trying to extract only specific fields to a CSV file using jq. I want to extract these fields only for records whose type key has the value dissertation.

{
  "id": "https://openalex.org/W2777209504",
  "doi": "https://doi.org/10.24026/1818-1384.1(42).2013.77470",
  "display_name": "Hyperandrogenism as a factor of reproductive losses",
  "title": "Hyperandrogenism as a factor of reproductive losses",
  "publication_year": 2013, 
  "publication_date": "2013-03-27",
  "ids": {
    "openalex": "https://openalex.org/W2777209504",
    "doi": "https://doi.org/10.24026/1818-1384.1(42).2013.77470",
    "mag": 2777209504
  },
  "type": "journal-article",
  "counts_by_year": [
    {
      "year": 2019,
      "cited_by_count": 1
    }
  ],
  "cited_by_api_url": "https://api.openalex.org/works?filter=cites:W2777209504",
  "updated_date": "2021-11-03",
  "created_date": "2018-01-05",
  "abstract_inverted_index": {}
}

I tried the two commands below, and neither of them worked:

gzcat -c sample.gz | jq -rc '[.doi,.title, .publication_year, .publication_date, .type] | select(.type |contains("dissertation")) | @csv'>target.csv
gzcat -c sample.gz | jq -rc '[.doi,.title, .publication_year, .publication_date, .type] | select(.type=="dissertation") | @csv'>target.csv

The output received for both of them is:
jq: error (at <stdin>:108753): Cannot index string with string "title"

I tried every way I could think of to filter my JSON-LD file, but I have not succeeded. Any pointers would be of great help.


甜味拾荒者 2025-01-18 10:45:35

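In both of your attempts, the select is incorrectly formulated (or in the wrong place, depending on your point of view): by the time it runs, its input is the array you have just built, not the original object, so there is no .type key left to test. Selecting first and building the array afterwards would work: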

select(.type == "dissertation")
| [.doi, .title, .publication_year, .publication_date, .type]
| @csv
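
As a minimal end-to-end sketch of the filter above, assuming the archive holds one JSON value per line: the sample records, the temporary file names, and the type == "object" guard are assumptions added for illustration and are not part of the original answer, and gzip -dc stands in for gzcat -c. The guard is included because the reported error ("Cannot index string") suggests that at least one input line is a JSON string rather than an object, in which case .type would fail on that line no matter where the select is placed.

```shell
# Build a tiny gzip-compressed sample with one JSON value per line.
# File names and records are made up for illustration; the third line is a
# bare JSON string, the kind of value that triggers "Cannot index string".
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.jsonl" <<'EOF'
{"doi":"https://doi.org/10.0000/a","title":"A thesis","publication_year":2013,"publication_date":"2013-03-27","type":"dissertation"}
{"doi":"https://doi.org/10.0000/b","title":"An article","publication_year":2014,"publication_date":"2014-01-01","type":"journal-article"}
"not an object"
EOF
gzip -c "$tmpdir/sample.jsonl" > "$tmpdir/sample.gz"

# Select first, while "." is still the whole object, then build the CSV row.
# The type == "object" guard skips lines that are not JSON objects.
gzip -dc "$tmpdir/sample.gz" \
  | jq -r 'select(type == "object" and .type == "dissertation")
           | [.doi, .title, .publication_year, .publication_date, .type]
           | @csv' > "$tmpdir/target.csv"

cat "$tmpdir/target.csv"
```

If every line of the real file is known to be an object, the guard can be dropped and the filter reduces to the one shown in the answer.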


About the author: 吖咩
