Using Bash to get a list of all branches in a GitHub organisation without triggering the rate limit?

Posted on 2025-01-10 13:58:42

While trying to build a list of incoming GitHub commits, I stumbled across the GitHub API rate limit of 60 calls per hour (the limit for unauthenticated requests). As explained in this answer, one can get the list of branches with an API call using:

https://api.github.com/repos/{username}/{repo-name}/branches
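
As a minimal sketch of that call, the snippet below wraps the URL in two small helpers. The names `branches_url` and `list_branches_rest` are made up for illustration, and the `.[].name` filter assumes the usual shape of the REST response (a JSON array of objects that each carry a `name` field). Note that every repository needs its own request.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of any GitHub tooling.
# Assumes curl and jq are installed.

# Build the REST URL that lists the branches of one repository.
branches_url() {
    local owner="$1" repo="$2"
    printf 'https://api.github.com/repos/%s/%s/branches\n' "$owner" "$repo"
}

# Print one branch name per line for a single repository.
# This costs one API call per repository (more if it has over 100 branches).
list_branches_rest() {
    curl -s "$(branches_url "$1" "$2")?per_page=100" | jq -r '.[].name'
}
```

After sourcing the file, `list_branches_rest somegithubuser somerepo` would print the branch names of that one repository, where both arguments are placeholders.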

However, that triggers the rate limit for the average GitHub organisation/user, since it takes at least one call per repository. So I thought I'd try a different approach, using the RSS/Atom format. However, as that same answer explains, the Atom/RSS feed seems to depend on the user already having a list of all branches in a repository. This question asks for an overview of all commits in a repository, yet the answer it received only covers the commits in the default branch of the repository. And this question received a working answer that still triggers the rate limit, as it relies on at least 1 API call per repository.

Hence, I would like to ask: How could one get a list of all branches of a GitHub user, using at most 1 GitHub API call?

Note that using Atom views would be perfectly fine; however, I have not found an Atom view like https://github.com/:owner/:repo/commits.atom or https://github.com/:owner/:repo/branches.atom that displays all branches in a repository. I would strongly prefer a solution that does not rely on a third party like https://rsshub.app/github/repos/yanglr, as I imagine they too will at some point start rate-limiting.

My current approach is to scrape the source code of https://github.com/:user/:repo/branches using bash. However, I imagine there might exist a more efficient solution to this.

MWE

Thanks to the comments, I was able to find a Bash MWE that performs a GraphQL query from the terminal. It is given in this answer, where bearer is not a variable but the means of identification, and the ...... should be your personal GitHub access token. I am currently looking into how to get the repositories beyond the first hundred. Then I'll look at how to get the branches of those repositories.

Attempt I

The following query yields JSON containing the repositories of a user and the first 4 branches in each repository.

File name: examplequery1.gql

query {
  repositoryOwner(login: "somegithubuser") {
    repositories(first: 40) {
      edges {
        node {
          nameWithOwner
          refs(
            refPrefix: "refs/heads/"
            orderBy: { direction: DESC, field: TAG_COMMIT_DATE }
            first: 4
          ) {
            edges {
              node {
                ... on Ref {
                  name
                }
              }
            }
          }
        }
      }
    }
  }
}

Next, a bash script is made that runs the query:

#!/usr/bin/env bash
# Runs a GraphQL query on GitHub. Execute with:
# ./run_graphql_query.sh examplequery1.gql

# Replace with your personal GitHub access token.
GITHUB_PERSONAL_ACCESS_TOKEN_GLOBAL="your_github_personal_access_token"

# Require exactly one argument: the query file.
if [ $# -ne 1 ]; then
    echo "Usage: $0 <query-file.gql>" >&2
    exit 1
fi

if [ ! -f "$1" ]; then
    echo "Query file not found: $1" >&2
    exit 1
fi

# Wrap the query text in a JSON object; jq takes care of the escaping.
# Newlines become spaces so that adjacent tokens cannot run together.
QUERY=$(jq -n \
           --arg q "$(tr '\n' ' ' < "$1")" \
           '{ query: $q }')

# Send the query to the GitHub GraphQL endpoint.
curl -s -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: bearer $GITHUB_PERSONAL_ACCESS_TOKEN_GLOBAL" \
  --data "$QUERY" \
  https://api.github.com/graphql

It can be run with:

./run_graphql_query.sh examplequery1.gql
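
Since the whole point is to stay under the rate limit, it may help to ask the API how much budget is left. GitHub's GraphQL schema exposes a `rateLimit` object for this; the sketch below requests only that object. The helper names `rate_limit_payload` and `show_rate_limit` and the `GITHUB_TOKEN` variable are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: rate_limit_payload and show_rate_limit are hypothetical helper names.
# Assumes GITHUB_TOKEN holds a personal access token and that curl and jq are installed.

# Build the JSON request body for a query that only asks for rate-limit data.
rate_limit_payload() {
    jq -n --arg q 'query { rateLimit { limit cost remaining resetAt } }' '{ query: $q }'
}

# Send the query and print the rateLimit object from the response.
show_rate_limit() {
    curl -s -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: bearer $GITHUB_TOKEN" \
      --data "$(rate_limit_payload)" \
      https://api.github.com/graphql | jq '.data.rateLimit'
}
```

The same `rateLimit { cost remaining }` selection can also be placed at the top level of the query file, next to `repositoryOwner`, so that the response reports what that particular query cost.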

There are two more issues to resolve before I can answer the question: how to iterate over all repositories instead of only the first 100, and how to parse the JSON into a list of branches per repository.
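
One possible way to approach both issues is sketched below, under a few assumptions: the query is rewritten to take a `$cursor` variable and to return `pageInfo`, it uses the `nodes` shorthand instead of `edges`, and the token is read from a `GITHUB_TOKEN` environment variable. All function and variable names are invented for this example. It needs one request per 100 repositories rather than a single request overall, and it only returns the first 100 branches of each repository.

```shell
#!/usr/bin/env bash
# Sketch only: all function and variable names below are hypothetical.
# Assumes GITHUB_TOKEN holds a personal access token and that curl and jq are installed.

# GraphQL query with a cursor variable so that later pages can be requested.
PAGED_QUERY='
query($login: String!, $cursor: String) {
  repositoryOwner(login: $login) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        nameWithOwner
        refs(refPrefix: "refs/heads/", first: 100) {
          nodes { name }
        }
      }
    }
  }
}'

# Build the JSON request body; an empty cursor is sent as null (first page).
build_payload() {
    local login="$1" cursor="${2:-}"
    jq -n --arg q "$PAGED_QUERY" --arg login "$login" --arg cursor "$cursor" \
        '{ query: $q,
           variables: { login: $login,
                        cursor: (if $cursor == "" then null else $cursor end) } }'
}

# Read one response page on stdin and print "owner/repo: branch1 branch2 ...".
print_branches() {
    jq -r '.data.repositoryOwner.repositories.nodes[]
           | .nameWithOwner + ": " + ([.refs.nodes[].name] | join(" "))'
}

# Fetch page after page until hasNextPage is false.
list_all_branches() {
    local login="$1" cursor="" page has_next
    while :; do
        page=$(curl -s -X POST \
            -H "Content-Type: application/json" \
            -H "Authorization: bearer $GITHUB_TOKEN" \
            --data "$(build_payload "$login" "$cursor")" \
            https://api.github.com/graphql) || return 1
        printf '%s\n' "$page" | print_branches || return 1
        has_next=$(printf '%s\n' "$page" |
            jq -r '.data.repositoryOwner.repositories.pageInfo.hasNextPage')
        [ "$has_next" = "true" ] || break
        cursor=$(printf '%s\n' "$page" |
            jq -r '.data.repositoryOwner.repositories.pageInfo.endCursor')
    done
}
```

After sourcing the file, `list_all_branches somegithubuser` would print one line per repository. Repositories with more than 100 branches would need a second, inner pagination over `refs`, which this sketch leaves out.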
