How are these URL REST parameters generated?
I am working with a site from which I can download freely available public data, based on a REST parameter in the URL, as shown below.
- URL for an Online DCR [a type of receipt for land mutation fee payment] No: http://XXX/pages/doc?data=21192
- URL for a khatian [a type of document that establishes right and title to land]: https://XXX/qr-vk/b59e7
- URL for a dakhila [a type of receipt for payment of land tax]: https://YYYY/MVBDSzI4UTYzdUsydFJLZGJwT0x2
Now my question is how to understand the codes 21192, b59e78 and MVBDSzI4UTYzdUsydFJLZGJwT0x2Z9, though I am more interested in the first two, the 7- and 8-digit codes, i.e. 21192 and b59e78.
I am planning to collect all the data from the site using the Python requests module. At the moment I am brute-forcing GET requests over every permutation of lowercase letters and digits at 7- and 8-digit length, but that takes millions of requests, which is time-consuming and inefficient. So I need to find a way to decode the 7- and 8-digit codes.
The brute-force code I am using now:
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import itertools
import random
import string

import requests

lowercases = list(string.ascii_lowercase)
numbers = [str(n) for n in range(10)]

# Candidate codes have the form 'b' + one digit + six more characters.
for nmbr in numbers:
    prmlist = numbers + lowercases
    random.shuffle(prmlist)  # visit the candidates in a different order on each run
    # Note: permutations() never repeats a character within the six positions.
    idlist = itertools.permutations(prmlist, 6)
    for id_ in idlist:
        code = 'b' + nmbr + ''.join(id_)
        # code = 'b59e78'
        url = 'https://XXX/qr-vk/' + code
        pdfname = code + '.pdf'
        response = requests.get(url)
        # A page containing this marker is treated as a hit; download its PDF.
        if "@@@@@@" in response.text:
            pdf_response = requests.get('https://XXX/qr-print/' + code)
            with open(pdfname, 'wb') as f:
                f.write(pdf_response.content)
            print(f"Completed {code}")
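To put a number on "millions of requests", here is a rough sketch that counts the candidates the loop above would generate. It also checks whether a given code could ever come out of that loop. The helper name `reachable` is made up for illustration, and the full code `b59e78a7` is taken from the comment at the end of this page.

```python
import math

# The same 36 symbols the loop draws from: digits plus lowercase letters.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

# permutations(prmlist, 6) yields 36*35*34*33*32*31 tuples per leading digit.
per_digit = math.perm(len(ALPHABET), 6)
total = 10 * per_digit
print(f"{per_digit:,} candidates per digit, {total:,} in total")


def reachable(code: str) -> bool:
    """True if the brute-force loop could ever produce `code` (hypothetical helper)."""
    tail = code[2:]
    return (
        len(code) == 8
        and code[0] == "b"
        and code[1].isdigit()
        and all(c in ALPHABET for c in tail)
        and len(set(tail)) == len(tail)  # permutations() never repeats a symbol
    )


print(reachable("b59e78a7"))  # False: '7' occurs twice in the last six characters
```

So besides being slow (about 14 billion requests), the permutation-based loop silently skips every code whose last six characters contain a repeat; `itertools.product` would cover those, at the cost of an even larger search space.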
Unfortunately, all the answers posted here say, in one way or another, that these codes [21192, b59e78] tell you nothing about the response returned by the link: they are merely random unique identifiers, so it is not feasible to relate a response to its code. But from the start I have maintained that there is a link between the response content and the code.
What I have found so far:
- The URL http://XXX/pages/doc?data=21192 returns a PDF containing the value ৪৭০৬, which in English digits is 4706167. The decimal value of hex 21192 is 347061; that is, the value simply has a single digit 3 prepended, which is the division code.
- The URL https://XXX/qr-vk/b59e7 returns a PDF containing the value ৪৭০৬১, which in English digits is 47061. The decimal value of hex b59e7 is 304706; that is, the value has the two digits 30 prepended and a single digit 1 appended, which is the part I am still looking into.
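The pattern in the two observations above can be sketched in a few lines of Python. This assumes the shortened codes shown in the post are truncated forms of longer hex strings, and it uses only values quoted here: the document value 4706167, the prefixes 3 and 30, and the trailing 1. The function names `build_code` and `split_code` are made up for illustration.

```python
def build_code(prefix: str, value: str, suffix: str = "") -> str:
    """Join the decimal digit strings, then render the number as lowercase hex."""
    return format(int(prefix + value + suffix), "x")


def split_code(code: str, prefix_len: int, suffix_len: int = 0):
    """Inverse direction: hex code -> decimal digits -> (prefix, value, suffix)."""
    digits = str(int(code, 16))
    end = len(digits) - suffix_len
    return digits[:prefix_len], digits[prefix_len:end], digits[end:]


dcr_code = build_code("3", "4706167")            # "21192f7", which starts with 21192
khatian_code = build_code("30", "4706167", "1")  # "b59e78a7"
print(dcr_code, khatian_code)
print(split_code(khatian_code, prefix_len=2, suffix_len=1))  # ('30', '4706167', '1')
```

Under these assumptions the second result matches the code b59e78a7 quoted in the comment on this page, which suggests the codes are plain hexadecimal renderings of a structured decimal number rather than random identifiers.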
I held off posting this update, thinking I might come up with a full answer later, since only 21192 has been fully understood so far.
I think there is more to observe, which will take more time, at least for a noob like me.
Comments (1)
b59e78a7 is hex for 3047061671, so you just iterate up to 5B or so and hex the numbers.
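A minimal sketch of that suggestion, with no network calls: walk a range of decimal numbers and convert each one to its hex code. Reading "5B" as roughly five billion is an assumption, and the name `hex_codes` is made up for illustration. Each yielded code would then be appended to the URL in place of the brute-force candidates.

```python
def hex_codes(start: int, stop: int):
    """Yield the lowercase hex code for each decimal number in [start, stop)."""
    for n in range(start, stop):
        yield format(n, "x")


UPPER_BOUND = 5_000_000_000  # assumption: "5B" read as about five billion

# The conversion quoted in the comment, in both directions.
print(int("b59e78a7", 16))         # 3047061671
print(format(3047061671, "x"))     # b59e78a7

# A small window around the known number, to show what the iteration produces.
window = list(hex_codes(3047061670, 3047061673))
print(window)                      # ['b59e78a6', 'b59e78a7', 'b59e78a8']
```

Because `range` is lazy, `hex_codes(0, UPPER_BOUND)` can be iterated without holding the candidates in memory. Knowing the prefix and the length of the decimal number should narrow the range much further.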