How to generate these URL REST parameters

Posted on 2025-02-12 04:53:35

I am working with a site from which I can download publicly available free data, based on a REST parameter in the URL, as shown below.

  1. URL for Online DCR [a type of receipt for land mutation fee payment] No: http://XXX/pages/doc?data=21192
  2. URL for khatian [a type of document that establishes right and title to land]: https://XXX/qr-vk/b59e7
  3. URL for dakhila [a type of receipt for payment of land tax]: https://YYYY/MVBDSzI4UTYzdUsydFJLZGJwT0x2

Now my question is how to interpret the codes 21192, b59e78 and MVBDSzI4UTYzdUsydFJLZGJwT0x2Z9. I am mostly interested in the first two, the 7- and 8-digit codes, i.e. 21192 and b59e78.

I am planning to collect all the data from the site using the Python requests module. At the moment I am brute-forcing GET requests over every permutation of lowercase letters and digits of length 7 and 8, but that takes millions of requests, which is time-consuming and inefficient. So I need to find a way to decode the 7- and 8-digit codes.

The brute-force code I am using now:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import itertools
import random
import string

import requests

lowercases = list(string.ascii_lowercase)
numbers = list(string.digits)

for nmbr in numbers:
    prmlist = numbers + lowercases
    # Shuffle so that each run walks the candidate space in a different order.
    random.shuffle(prmlist)
    # product() instead of permutations(): permutations() never repeats a
    # character, so any code containing the same character twice was skipped.
    idlist = itertools.product(prmlist, repeat=6)

    for id_ in idlist:
        # 8-character candidate: fixed 'b', one digit, then six characters.
        code = 'b' + nmbr + ''.join(id_)
        url = 'https://XXX/qr-vk/' + code
        pdfname = code + '.pdf'
        response = requests.get(url)

        # "@@@@@@" is the marker text that identifies a valid page.
        if "@@@@@@" in response.text:
            pdf_response = requests.get('https://XXX/qr-print/' + code)
            with open(pdfname, 'wb') as f:
                f.write(pdf_response.content)

        print(f"Completed {code}")
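
For a sense of scale, here is a rough sketch of how large the candidate space walked by the loop above is. It counts the pattern the script uses (a fixed 'b', one digit, then six characters drawn from 36 symbols), both without repeated characters, which is what itertools.permutations yields, and with repeats allowed, which is what itertools.product yields. The request rate in `days_needed` is a made-up figure used only for illustration.

```python
import math
import string

ALPHABET_SIZE = len(string.digits + string.ascii_lowercase)  # 36 symbols
VARIABLE_LENGTH = 6    # characters after the fixed 'b' and the leading digit
LEADING_DIGITS = 10    # the outer loop over '0'..'9'

# Candidates when no character may repeat (itertools.permutations).
without_repeats = LEADING_DIGITS * math.perm(ALPHABET_SIZE, VARIABLE_LENGTH)

# Candidates when characters may repeat (itertools.product).
with_repeats = LEADING_DIGITS * ALPHABET_SIZE ** VARIABLE_LENGTH


def days_needed(requests_total: int, requests_per_second: float = 10.0) -> float:
    """Estimate wall-clock days at a hypothetical, assumed request rate."""
    return requests_total / requests_per_second / 86_400


if __name__ == "__main__":
    print(f"without repeats: {without_repeats:,}")
    print(f"with repeats:    {with_repeats:,}")
    print(f"days at 10 req/s (assumed): {days_needed(with_repeats):,.0f}")
```

Either way the count is in the tens of billions rather than millions, which is why narrowing the space by decoding the codes matters.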

Unfortunately, all the answers posted here say, in one way or another, that these codes [21192, b59e78] tell you nothing about the response returned by the link: the codes are merely random unique identifiers, so it is not feasible to relate a response to its code. But from the start I have maintained that there is a link between the response content and the code.

What I have found so far:


  1. The URL http://XXX/pages/doc?data=21192 returns a PDF containing the value ৪৭০৬, whose English-digit equivalent is 4706167. The decimal value of hex 21192 is 347061. It is just prepended with a single digit 3, which is the division code.
  2. The URL https://XXX/qr-vk/b59e7 returns a PDF containing the value ৪৭০৬১, whose English-digit equivalent is 47061. The decimal value of hex b59e7 is 304706. It is just prepended with the two digits 30 and appended with a single digit 1, which I am still investigating.
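
The check described in the list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `bengali_to_int` and `locate_value` are hypothetical helper names, and the sample inputs pair the 8-character code b59e78a7 quoted in the reply in this thread with the value 4706167 mentioned above. The sketch decodes the code from hex to decimal and reports which digits surround the PDF value.

```python
def bengali_to_int(text: str) -> int:
    """Convert a string of Bengali digits to an int.

    Python's int() accepts any Unicode decimal digit, so no lookup
    table is needed.
    """
    return int(text)


def locate_value(code_hex: str, value: int):
    """Decode a hex code to decimal and look for `value` in its digits.

    Returns a (prefix, suffix) tuple with the digits before and after
    the first match, or None when the value does not occur at all.
    """
    decimal_digits = str(int(code_hex, 16))
    needle = str(value)
    start = decimal_digits.find(needle)
    if start == -1:
        return None
    return decimal_digits[:start], decimal_digits[start + len(needle):]


if __name__ == "__main__":
    print(bengali_to_int("৪৭০৬১"))
    print(locate_value("b59e78a7", 4706167))
```

Running the same check over several known code/PDF pairs would show whether the prefix and suffix follow a stable pattern or only happen to line up for one document.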

I held off on posting this update, thinking I might come up with a full answer later, since only 21192 has been fully understood so far.
I expect there is more to observe, which will take more time, at least for a noob like me.

Comments (1)

南渊 2025-02-19 04:53:35

b59e78a7 is hex for 3047061671, so you can just iterate over the integers up to 5B or so and convert each number to hex.
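
A minimal sketch of that suggestion follows, with no network calls. `code_for` and `candidate_codes` are hypothetical names, and the layout assumed by `candidate_codes` (a fixed decimal prefix, a zero-padded 7-digit serial, then one trailing digit) is extrapolated from the single example above. It is an assumption, not a documented scheme.

```python
from itertools import product


def code_for(number: int) -> str:
    """Render a decimal number as the lowercase hex code used in the URL."""
    return format(number, "x")


def candidate_codes(prefix: str, serials, suffix_digits: str = "0123456789",
                    width: int = 7):
    """Yield hex codes for prefix + zero-padded serial + one trailing digit.

    The prefix/serial/suffix layout is an assumption based on one example.
    """
    for serial, suffix in product(serials, suffix_digits):
        decimal = int(f"{prefix}{serial:0{width}d}{suffix}")
        yield code_for(decimal)


if __name__ == "__main__":
    print(code_for(3047061671))
    # Ten serials around the known one, ten trailing digits each.
    for code in candidate_codes("30", range(4706165, 4706168)):
        print(code)
```

Iterating over decimal serials this way replaces the character-by-character search with a walk over plausible numbers, so the number of requests scales with the number of documents rather than with the size of the alphabet.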
