How to generate these URL REST parameters
I am working with a site from which I can download publicly available data, selected by a REST parameter in the URL, as shown below:
- URL for Online DCR [a type of receipt for land mutation fee payment] No: http://XXX/pages/doc?data=21192
- URL for khatian [a type of document establishing right and title to land]: https://XXX/qr-vk/b59e7
- URL for dakhila [a type of receipt for payment of land tax]: https://YYYY/MVBDSzI4UTYzdUsydFJLZGJwT0x2
Now my question is how to interpret the codes 21192, b59e78 and MVBDSzI4UTYzdUsydFJLZGJwT0x2Z9, though I am more interested in the first two, the 7- and 8-digit codes, i.e. 21192 and b59e78.
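The dakhila code differs from the other two. It mixes upper- and lowercase letters with digits, and the version in the URL is 28 characters long, which is a multiple of 4. Both traits are consistent with Base64, but that is only a hypothesis. Below is a minimal sketch that tests it with Python's standard `base64` module. `try_base64` is a hypothetical helper name. The variant quoted in the question, which ends in `Z9`, is 30 characters long and so is not valid unpadded Base64.

```python
import base64
import binascii
from typing import Optional

# Dakhila code exactly as it appears in the URL above.
DAKHILA_CODE = "MVBDSzI4UTYzdUsydFJLZGJwT0x2"


def try_base64(token: str) -> Optional[str]:
    """Return the Base64-decoded token as ASCII text, or None if it is not valid Base64/ASCII."""
    try:
        # validate=True rejects characters outside the standard Base64 alphabet.
        raw = base64.b64decode(token, validate=True)
        return raw.decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    print(try_base64(DAKHILA_CODE))  # 1PCK28Q63uK2tRKdbpOLv
```

Even if the hypothesis holds, this only peels off one layer. The decoded text, `1PCK28Q63uK2tRKdbpOLv`, still looks like an opaque 21-character token.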
I am planning to collect all the data from the site using the Python requests module. At the moment I am brute-forcing GET requests over all permutations of lowercase letters and digits of length 7 and 8. That takes millions of requests, which is slow and inefficient, so I need to find a way to decode the 7- and 8-digit codes.
The brute-force code I am using now:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import itertools
import random
import string

import requests

lowercases = list(string.ascii_lowercase)
numbers = [str(n) for n in range(10)]

for nmbr in numbers:
    # Candidate characters for the six free positions, in random order.
    prmlist = numbers + lowercases
    random.shuffle(prmlist)
    # Each character of prmlist appears once, so permutations() never
    # produces a candidate with a repeated character.
    idlist = itertools.permutations(prmlist, 6)
    for id_ in idlist:
        # 8-character code: 'b' + one digit + six characters.
        code = 'b' + nmbr + ''.join(id_)
        url = 'https://XXX/qr-vk/' + code
        pdfname = code + '.pdf'
        response = requests.get(url, timeout=30)
        # Only pages containing the marker text (shown as "@@@@@@" in the post) are saved.
        if "@@@@@@" in response.text:
            pdf_response = requests.get('https://XXX/qr-print/' + code, timeout=30)
            with open(pdfname, 'wb') as f:
                f.write(pdf_response.content)
            print(f"Completed {code}")
```
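The loop above can miss valid codes. Because `prmlist` contains each character once, `itertools.permutations` never yields a candidate with a repeated character. The comment at the end of this thread gives the full khatian code as `b59e78a7`, and its last six characters, `9e78a7`, contain `7` twice, so that code would never be requested. Also, if the codes are hex numbers, only the digits `0-9a-f` are needed. The sketch below compares the candidate counts per leading digit. It uses `math.perm`, which requires Python 3.8 or later. `has_repeat` is a hypothetical helper name.

```python
import math
import string

ALPHABET_36 = string.digits + string.ascii_lowercase  # alphabet used by the brute-force loop
HEX_DIGITS = string.digits + "abcdef"                   # alphabet if the codes are hex numbers

# Candidates per leading digit, for the six free positions after 'b' + digit.
no_repeats = math.perm(len(ALPHABET_36), 6)   # what itertools.permutations yields
with_repeats = len(ALPHABET_36) ** 6          # what itertools.product(..., repeat=6) yields
hex_only = len(HEX_DIGITS) ** 6               # hex digits only, repeats allowed


def has_repeat(tail: str) -> bool:
    """True if some character occurs more than once, so permutations() can never produce it."""
    return len(set(tail)) < len(tail)


if __name__ == "__main__":
    print(f"permutations of 36 chars: {no_repeats:,}")
    print(f"product of 36 chars:      {with_repeats:,}")
    print(f"product of 16 hex digits: {hex_only:,}")
    print("'9e78a7' reachable by permutations():", not has_repeat("9e78a7"))
```

Switching to `itertools.product(HEX_DIGITS, repeat=6)` would cover repeated characters and shrink each pass from about 1.4 billion to about 16.8 million candidates. This holds only if the codes really are hex.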
Unfortunately, all the answers posted here say, in one way or another, that these codes [21192, b59e78] tell us nothing about the response from the link. They are supposedly just random unique identifiers, so it is not feasible to relate a link's response to its code. But from the start I have maintained that there is a link between the response content and the code.
What I have found:
- The URL http://XXX/pages/doc?data=21192 returns a PDF containing the value ৪৭০৬, which is 4706167 in English digits. The decimal value of hex 21192 is 347061, i.e. the same value prepended by a single digit, 3, which is the division code.
- The URL https://XXX/qr-vk/b59e7 returns a PDF containing the value ৪৭০৬১, which is 47061 in English digits. The decimal value of hex b59e7 is 304706, i.e. the value prepended by two digits, 30, and appended by a single digit, 1. These extra digits are what I am still trying to explain.
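Here is a minimal sketch of the decoding described above, under two assumptions. First, the codes as posted appear truncated, so the sketch uses the full khatian code `b59e78a7` given in the comment at the end of this thread. Second, it uses `21192f7` for the DCR code: that is the hex form of 34706167 (the division code 3 followed by 4706167) and matches the `21192` prefix shown. The sketch converts Bengali digits read from a PDF into English digits, turns a hex code into its decimal string, and splits off a prefix and suffix whose lengths the caller supplies. `bengali_to_latin` and `decode_code` are hypothetical helper names.

```python
from typing import Tuple

# Map the Bengali digits U+09E6..U+09EF to ASCII '0'..'9'.
BENGALI_TO_LATIN = str.maketrans({chr(0x09E6 + i): str(i) for i in range(10)})


def bengali_to_latin(text: str) -> str:
    """Replace Bengali digits with ASCII digits, leaving other characters untouched."""
    return text.translate(BENGALI_TO_LATIN)


def decode_code(code: str, prefix_len: int, suffix_len: int) -> Tuple[str, str, str]:
    """Split the decimal form of a hex code into (prefix, value, suffix)."""
    digits = str(int(code, 16))
    end = len(digits) - suffix_len
    return digits[:prefix_len], digits[prefix_len:end], digits[end:]


if __name__ == "__main__":
    # "\u09ea\u09ed\u09e6\u09ec\u09e7\u09ec\u09ed" is the Bengali numeral ৪৭০৬১৬৭.
    print(bengali_to_latin("\u09ea\u09ed\u09e6\u09ec\u09e7\u09ec\u09ed"))  # 4706167
    print(decode_code("21192f7", 1, 0))   # ('3', '4706167', '')
    print(decode_code("b59e78a7", 2, 1))  # ('30', '4706167', '1')
```

Under these assumptions, both documents carry the same value, 4706167. Only the digits around it differ.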
I held off on posting this update, thinking I might come up with a full answer later, since so far only 21192 has been fully understood.
I think there is more to observe, which will take more time, at least for a noob like me.
Comments (1)
b59e78a7 is hex for 3047061671, so you just iterate up to 5B or so and hex the numbers.
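Going the other way, as the comment suggests, a candidate code is just the hex form of a decimal number. Here is a minimal sketch, assuming the layouts inferred in the question: for the DCR, a division digit followed by the value; for the khatian, a two-digit prefix, the value, and a one-digit suffix. The full codes `21192f7` and `b59e78a7` used below are assumptions about the truncated codes in the question. `make_code` and `candidate_codes` are hypothetical helper names. The real prefixes, suffixes and value ranges would have to come from observed documents.

```python
from typing import Iterator


def make_code(prefix: str, value: int, suffix: str = "") -> str:
    """Concatenate the decimal parts and return the lowercase hex code."""
    return format(int(f"{prefix}{value}{suffix}"), "x")


def candidate_codes(prefix: str, start: int, stop: int, suffix: str = "") -> Iterator[str]:
    """Yield one hex code for every value in range(start, stop)."""
    for value in range(start, stop):
        yield make_code(prefix, value, suffix)


if __name__ == "__main__":
    print(make_code("3", 4706167))        # 21192f7
    print(make_code("30", 4706167, "1"))  # b59e78a7
    for code in candidate_codes("30", 4706167, 4706170, "1"):
        print(code)
```

This turns the search from guessing character strings into walking a numeric range of values, with one request per value instead of one per permutation.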