How to generate these URL REST parameters

Posted on 2025-02-12 04:53:35

I am working with a site from which I can download free public data by changing a REST parameter in the URL, as shown below.

- URL for an Online DCR (a type of receipt for land mutation fee payment) No.: http://XXX/pages/doc?data=21192
- URL for a khatian (a type of document establishing right and title to land): https://XXX/qr-vk/b59e7
- URL for a dakhila (a type of receipt for payment of land tax): https://YYYY/MVBDSzI4UTYzdUsydFJLZGJwT0x2

Now my question is how to interpret the codes 21192, b59e78 and MVBDSzI4UTYzdUsydFJLZGJwT0x2Z9, though I am more interested in the first two, the 7- and 8-character codes such as 21192 and b59e78.
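One quick observation about the dakhila code: the string in the URL (MVBDSzI4UTYzdUsydFJLZGJwT0x2) uses only Base64 characters, its length is a multiple of 4, and it decodes cleanly with Python's standard `base64` module into a 21-character ASCII string. The minimal check below shows this; `try_base64` is a hypothetical helper name, and what the decoded string itself means is not known. The longer variant ending in Z9 is not valid Base64 (30 characters, so the padding is wrong), which is why only the URL form decodes.

```python
import base64
import binascii
from typing import Optional


def try_base64(code: str) -> Optional[str]:
    """Return the decoded text if `code` is valid Base64 ASCII, else None."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        raw = base64.b64decode(code, validate=True)
        return raw.decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        return None


print(try_base64("MVBDSzI4UTYzdUsydFJLZGJwT0x2"))    # 1PCK28Q63uK2tRKdbpOLv
print(try_base64("MVBDSzI4UTYzdUsydFJLZGJwT0x2Z9"))  # None (bad padding)
```

Because the decoded value looks like another opaque token, the dakhila code may be a Base64 wrapper around a random identifier rather than something derived from the record number.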

I am planning to collect all the data from the site with the Python requests module. Right now I brute-force GET requests over every permutation of lowercase letters and digits of length 7 and 8, but that takes millions of requests, which is slow and inefficient. So I need a way to decode the 7- and 8-character codes.
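To put the number of requests in perspective, the size of the search space for the brute-force scheme used below ('b' + one digit + six characters from the 36 lowercase letters and digits) can be computed directly. `itertools.permutations` never repeats a character, whereas covering every possible code needs `itertools.product`; both counts are shown, plus the count if the codes used only hexadecimal characters (0-9, a-f), as b59e78 does. The constant names are illustrative.

```python
import math

ALPHABET_SIZE = 36  # 26 lowercase letters + 10 digits
TAIL_LEN = 6        # characters after the fixed 'b' + one digit
FIRST_DIGITS = 10   # choices for the digit after 'b'

# itertools.permutations(alphabet, 6): no character may repeat
perm_count = FIRST_DIGITS * math.perm(ALPHABET_SIZE, TAIL_LEN)
# itertools.product(alphabet, repeat=6): every possible 6-character tail
product_count = FIRST_DIGITS * ALPHABET_SIZE ** TAIL_LEN
# Same shape, restricted to hexadecimal characters 0-9a-f
hex_count = FIRST_DIGITS * 16 ** TAIL_LEN

print(f"permutations: {perm_count:,}")     # 14,024,102,400
print(f"product:      {product_count:,}")  # 21,767,823,360
print(f"hex only:     {hex_count:,}")      # 167,772,160
```

Even the hex-only case is well over a hundred million requests, so the count is in the billions for the current alphabet rather than the millions.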

The brute-force code I am using now:

```python
#!/usr/bin/env python3
# Brute-force probe of https://XXX/qr-vk/<code> for codes of the form
# 'b' + one digit + six characters from [0-9a-z].

import itertools
import random
import string

import requests

alphabet = list(string.digits + string.ascii_lowercase)  # 36 characters
session = requests.Session()  # reuse one connection for all requests

for first_digit in string.digits:
    random.shuffle(alphabet)  # probe the tails in a random order
    # itertools.product also yields tails with repeated characters,
    # which itertools.permutations would silently skip.
    for tail in itertools.product(alphabet, repeat=6):
        code = "b" + first_digit + "".join(tail)
        response = session.get("https://XXX/qr-vk/" + code, timeout=30)

        # A valid code returns a page containing the marker text "@@@@@@"
        if "@@@@@@" in response.text:
            pdf_response = session.get("https://XXX/qr-print/" + code, timeout=30)
            with open(code + ".pdf", "wb") as f:
                f.write(pdf_response.content)

        print(f"Completed {code}")
```

Unfortunately, all the answers posted here say, in one way or another, that these codes (21192, b59e78) tell you nothing about the response returned by the link: each code is merely a random unique identifier, so it is not feasible to relate a link's response to its code. But from the start I have maintained that there is a link between the response content and the code.

What I have found:

The URL http://XXX/pages/doc?data=21192 returns a PDF containing the value ৪৭০৬, whose English equivalent is 4706167. The decimal value of hex 21192 is 347061, i.e. the value prepended by a single digit 3, which is the division code.
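If this DCR observation holds in general (decimal value of the hex code = division code followed by the record value), codes would not need to be guessed at all: they could be generated from a division code and a range of record numbers, and any fetched code could be decoded back. The sketch below assumes exactly that scheme; `encode_dcr` and `decode_dcr` are hypothetical helper names. Note that the string 21192 as shown in this post does not itself give 347061 (`int("21192", 16)` is 135570, while 347061 is hex 54bb5), so the scheme is worth re-checking against the exact code from the live URL.

```python
def encode_dcr(division: str, record: int) -> str:
    """Hypothetical scheme: code = hex of the decimal number division + record."""
    return format(int(division + str(record)), "x")


def decode_dcr(code: str, division: str) -> int:
    """Hypothetical inverse: hex -> decimal string, then drop the division prefix."""
    decimal = str(int(code, 16))
    if not decimal.startswith(division):
        raise ValueError(f"{code!r} -> {decimal} does not start with {division!r}")
    return int(decimal[len(division):])


print(encode_dcr("3", 47061))    # 54bb5
print(decode_dcr("54bb5", "3"))  # 47061

# Candidate DCR URLs for a small range of record numbers in division 3
for record in range(47060, 47063):
    print(f"http://XXX/pages/doc?data={encode_dcr('3', record)}")
```

If the scheme is confirmed, the scraper only needs one request per plausible record number in each division instead of enumerating every character combination.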
The URL https://XXX/qr-vk/b59e7 returns a PDF containing the value ৪৭০৬১, whose English equivalent is 47061. The decimal value of hex b59e7 is 304706; dropping the prepended two digits 30 and appending a single digit 1 gives the value I am searching for.
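The khatian observation can also be written as a checker: convert the Bengali digits printed in the PDF to an integer (Python's `int()` accepts any Unicode decimal digits, including Bengali ০-৯), then compare that with the hex code's decimal value after removing the 30 prefix and appending a 1. This is only a sketch of the hypothesis above; `khatian_value`, `matches` and the fixed `PREFIX`/`SUFFIX` are assumptions generalised from a single example. As with the DCR code, the posted string b59e7 does not itself give 304706 (`int("b59e7", 16)` is 743911; 304706 is hex 4a642), so the example uses the hex form of 304706.

```python
PREFIX = "30"  # assumption: leading digits seen in the single khatian example
SUFFIX = "1"   # assumption: trailing digit missing from the decimal value


def bengali_to_int(text: str) -> int:
    """int() accepts Unicode decimal digits, so Bengali numerals work directly."""
    return int(text.strip())


def khatian_value(code: str) -> int:
    """Hypothetical decoding: hex -> decimal, strip PREFIX, append SUFFIX."""
    decimal = str(int(code, 16))
    if not decimal.startswith(PREFIX):
        raise ValueError(f"{code!r} -> {decimal} does not start with {PREFIX}")
    return int(decimal[len(PREFIX):] + SUFFIX)


def matches(code: str, pdf_text: str) -> bool:
    """True if the value printed in the PDF agrees with the decoded code."""
    return khatian_value(code) == bengali_to_int(pdf_text)


print(bengali_to_int("৪৭০৬১"))    # 47061
print(khatian_value("4a642"))     # 47061
print(matches("4a642", "৪৭০৬১"))  # True
```

Running `matches` over several downloaded khatian PDFs and their codes would show quickly whether the 30 prefix and the trailing 1 are constant or vary per record.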

I held off posting this update, thinking I might come up with a full answer later, since so far only 21192 has been fully understood. I think there is more to observe, which will take more time, at least for a noob like me.
