Looking for jobs online is tedious. Help me automate it
Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, the results are usually wrong. You end up wading through hundreds of postings that you can't apply for before finding a relevant one, which is quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program to look through a large number of postings, and save the URLs of just those jobs that don't require years of experience.
I don't require help writing the scraper to get the html bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult as job posts are usually very explicit about this ("must have 5 years experience in..."), but there may be some issues with overly simple solutions.
In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but inclusion of the words probably means the job should be saved.
Next, I can safely exclude a job that says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable for excluding jobs. But then I realized some jobs say they'll take 0-2 years of experience, which matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". I can handle that too, but it makes me wonder what other patterns I'm not thinking of, and whether I might be excluding many jobs because of them. That's what brings me here, to find a better way to do this than regexes, if there is one.
I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a Bayesian filter maybe?
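To make the patterns above concrete, here is a rough Python sketch of the regex approach. Instead of one exclusion regex, it extracts every stated number of years, handles ranges ("0-2 years") and upper bounds ("less than 2 years") first, and keeps the posting if the lowest requirement found is small. The function names, the `max_years=2` threshold, and the choice to take the minimum are my own assumptions for illustration, not a tested solution.

```python
import re

YEARS = r"\s*(?:years?|yrs?)\b"
# "0-2 years", "3 to 5 years": capture the lower bound of the range.
RANGE_RE = re.compile(r"\b(\d{1,2})\s*(?:-|to)\s*\d{1,2}\+?" + YEARS, re.I)
# "less than 2 years", "fewer than 2 years": treated as zero years required.
UPPER_RE = re.compile(r"\b(?:less|fewer)\s+than\s+\d{1,2}" + YEARS, re.I)
# "5 years", "5+ years".
SINGLE_RE = re.compile(r"\b(\d{1,2})\+?" + YEARS, re.I)
ENTRY_RE = re.compile(r"\bentry[\s-]level\b", re.I)


def min_years_required(text):
    """Return the lowest number of years mentioned, or None if none is stated."""
    found = [int(m.group(1)) for m in RANGE_RE.finditer(text)]
    # Blank out ranges so their upper bound is not re-read as a single value.
    text = RANGE_RE.sub(" ", text)
    if UPPER_RE.search(text):
        found.append(0)
    text = UPPER_RE.sub(" ", text)
    found.extend(int(m.group(1)) for m in SINGLE_RE.finditer(text))
    return min(found) if found else None


def keep_posting(text, max_years=2):
    """Keep when in doubt: only drop postings whose lowest stated need is high."""
    if ENTRY_RE.search(text):
        return True
    years = min_years_required(text)
    return years is None or years <= max_years


print(keep_posting("Must have 5 years experience in C++"))
print(keep_posting("We accept 0-2 years of experience"))
```

Taking the minimum is deliberately lenient: a posting that mentions both "5 years" and "1 year" is kept, which trades extra reading for fewer missed jobs.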
Edit: There's a similar, but harder problem, which would probably be more useful to solve. There are lots of jobs that just require an "engineering degree", as you just have to understand a few technical things. But searching for "engineering" gives you thousands of jobs, mostly irrelevant.
How do I narrow this down to just those jobs that require any engineering degree, rather than particular degrees, without looking at each myself?
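One crude starting point, sketched in Python: find each mention of an engineering degree and check whether the word right before "engineering" names a specific discipline. The discipline list and the function name are assumptions for illustration only. The list is far from complete, and postings phrase this in many ways the pattern will miss.

```python
import re

# Assumed, non-exhaustive set of disciplines; extend it for your field.
DISCIPLINES = {
    "mechanical", "electrical", "civil", "chemical", "software",
    "computer", "aerospace", "industrial", "biomedical",
}

# Matches "<word> engineering degree" or "degree in <word> engineering",
# capturing the optional word that precedes "engineering".
MENTION_RE = re.compile(
    r"\b(?:(\w+)\s+)?engineering\s+degree\b"
    r"|\bdegree\s+in\s+(?:(\w+)\s+)?engineering\b",
    re.I,
)


def accepts_any_engineering_degree(text):
    """True if at least one degree mention is not tied to a listed discipline."""
    for m in MENTION_RE.finditer(text):
        qualifier = (m.group(1) or m.group(2) or "").lower()
        if qualifier not in DISCIPLINES:
            return True
    return False


print(accepts_any_engineering_degree("Requires an engineering degree."))
print(accepts_any_engineering_degree("Degree in mechanical engineering required."))
```

A posting with no degree mention at all returns False here, so this only narrows a search for "engineering"; it says nothing about jobs that never state a degree.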
Comments (3)
Ok, this answer is probably not going to be helpful -- I will say that up front. But, in my opinion, merely thinking about the problem in this way is enough to get you hired at most places I've worked. My suggestion? Contact the hiring manager at any of the postings in which you have interest, tell them this is what you are doing. Tell them generically what you have coded so far, and ask for assistance in learning the patterns they use when writing their adverts.
If I were on the receiving end of this letter, I think I would invite the person in for an interview.
I developed a good parse and email routine for a couple of job websites when I was looking for work for myself and a couple of friends. I agree with the other posts: this is a great way to look at the problem. Just to drop a little info, I did it mostly in Ruby, and used Tor proxies and some other methods to make sure that I wouldn't be iced out of the job site. This sort of project is unlike usual scraping, as you really can't afford to get kicked off a job board. In any case, I just have one piece of advice: forget about sorting and fine-tuning this too intensely. Let the HR department do that for you and get your resume and credentials out everywhere. It's a statistical game, and you want to broadcast yourself and throw that net as widely as possible.
Here's some sample code if you're interested. It's for looking for a flat, not a job, but the concepts should be similar enough. http://github.com/agrimm/Easy-Roommate-parser