How to read the total number of issues in GitLab and store them in pandas/Python?

Posted 2025-01-11 02:42:40, 484 characters, 0 views, 0 comments

I need to extract all the issues from a GitLab project through the API URL, but it returns at most 100 records per page. Is there any way to read all the records from all the pages? For example, a project may have 550 issues spread over 6 pages, but each request returns only one page (the 1st, 2nd, 3rd, and so on) with at most 100 records. I want to read all 550 issues using that single URL.

import io

import pandas as pd
import requests

# Fetches only page 1 (at most 100 issues) for the given label.
url = "https://gitlab.com/api/v4/projects/00000000/issues?page=1&per_page=100&labels=xyz"

payload = {}
headers = {'Authorization': 'Bearer XXXX-XXXXXXXXX'}

response = requests.request("GET", url, headers=headers, data=payload)
df = pd.read_json(io.StringIO(response.text))

Could anyone please help me out?

小巷里的女流氓 2025-01-18 02:42:40

You need to make multiple requests, but that is easily done with a while loop. Each iteration requests the "next" URL taken from the Link header of the previous response and appends the returned issues to everything collected so far.

According to the API docs, the issues are returned as a list, but that might not always be the case, so you will probably need some extra logic and checks in there. Still, this should work as is:

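import pandas as pd
import requests

url = "https://gitlab.com/api/v4/projects/4339844/issues"
payload = {}
headers = {}

content = []
link = url

while link:
    r = requests.get(link, headers=headers, data=payload)
    if r.status_code == 200:
        # Each page is a JSON list of issues; extend the running list.
        content += r.json()
    # requests parses the Link response header into r.links;
    # the last page has no 'next' entry, which ends the loop.
    link = r.links.get('next', {}).get('url', '')
    if link:
        print(link)

df = pd.DataFrame(content)
print(df)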
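
As a rough sketch of how the same loop might be applied to the request from the question (an Authorization header, a label filter, and 100 records per page), it can be wrapped in a function. The name fetch_all_issues, its parameters, and the optional session argument are illustrative assumptions, not part of the answer above or of any GitLab client library. The sketch also assumes that each "next" link returned by the server already carries the query string, so the parameters are sent only with the first request.

```python
import requests


def fetch_all_issues(project_id, token, labels=None, session=None):
    """Collect every page of a project's issues by following 'next' links.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    http = session if session is not None else requests.Session()
    url = f"https://gitlab.com/api/v4/projects/{project_id}/issues"
    headers = {"Authorization": f"Bearer {token}"}
    # Query parameters for the first request only (assumption: the 'next'
    # URL from the server repeats them in its own query string).
    params = {"per_page": 100}
    if labels:
        params["labels"] = labels

    issues = []
    while url:
        r = http.get(url, headers=headers, params=params, timeout=30)
        # Raise on HTTP errors instead of silently skipping a page.
        r.raise_for_status()
        issues.extend(r.json())
        # No 'next' relation on the last page, so url becomes None.
        url = r.links.get("next", {}).get("url")
        params = None
    return issues
```

With the placeholders from the question, pd.DataFrame(fetch_all_issues("00000000", "XXXX-XXXXXXXXX", labels="xyz")) would then build the DataFrame from all pages at once. Unlike the loop above, this variant stops with an exception on a non-200 response, which is a design choice rather than a requirement.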
