Automatically remove contact information from a document
Does anybody know of a good solution, usable from PHP, that effectively removes contact information such as phone numbers, email addresses, and maybe even postal addresses from a document?
Update
Hey guys, here is what I have come up with so far. It works pretty well.
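function sanitizeContent($content)
{
    // Emails, including ones padded with whitespace such as 't e s t @ ba d . co m'.
    // Because \s is allowed, this also swallows up to 50 characters of ordinary words on either side of the '@'.
    $content = preg_replace('/[A-Za-z0-9\s_.-]{1,50}@[A-Za-z0-9\s_.-]{1,50}/', '[email removed]', $content);
    // URLs: optional scheme, a host label, one or more dots, then the rest of the host/path/query.
    $content = preg_replace('/(?:[a-z]+:\/\/)?[a-z0-9_-]+\.+[a-z0-9.\/%&=?_-]+/i', '[link removed]', $content);
    // Phone numbers shaped like 1 (555) 555-5555; separators are whitespace, '-', '.' or '/'.
    $content = preg_replace('/\d?[\s.\/-]?\(?\d{3}\)?[\s.\/-]\d{3}[\s.\/-]\d{4}/', '[phone removed]', $content);
    // Any other run of 5 to 50 digits and separators, with an optional x/ext extension.
    // This is aggressive on purpose and will also hit dates and large numbers.
    $content = preg_replace('/\d[\d\s.,\/()-]{3,48}\d(?:\s*(?:x|ext)\.?\s*\d+)?/i', '[phone removed]', $content);
    // addresses????
    return $content;
}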
Does anybody have ideas for addresses? I am thinking of detecting the city, state, and ZIP code, and then also stripping a fixed number of characters before that match. It could clobber some data accidentally, but that might be better than disclosure. I would be really interested to hear whether anybody else has run into this.
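A rough sketch of that address idea, in case it helps the discussion. Everything here is an assumption rather than a tested solution: the helper name stripAddresses and the $lookBehind parameter are made up for illustration, and the pattern only knows US-style "City, ST 12345" endings.

```php
<?php
// Hypothetical helper, not part of sanitizeContent() above.
// $lookBehind is the "x chars" to strip in front of the city/state/ZIP match.
function stripAddresses(string $content, int $lookBehind = 60): string
{
    // Lead:   up to $lookBehind characters on the same line, assumed to hold the street part.
    // Anchor: "City, ST 12345" or "City, ST 12345-6789" (capitalised city, two-letter state).
    $pattern = '/[^\r\n]{0,' . $lookBehind . '}?'
             . '\b[A-Z][A-Za-z.\' -]{1,40},\s*[A-Z]{2}\s+\d{5}(?:-\d{4})?\b/';

    // preg_replace() returns null on a PCRE error; fall back to the untouched input.
    return preg_replace($pattern, '[address removed]', $content) ?? $content;
}
```

Calling stripAddresses("Ship to: 742 Evergreen Terrace, Springfield, IL 62704. Thanks!") removes everything from the start of the line through the ZIP code, so the "Ship to:" label is lost as well. That is the accidental clobbering mentioned above; a smaller $lookBehind trades it for a higher chance of leaving part of the street behind.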
Comments (1)
Use regular expressions.
You can use preg_replace to do it.
For emails:
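A minimal sketch of what such a call might look like. The pattern and the function name removeEmails are assumptions for illustration, not the answerer's own code, and the pattern is deliberately conservative: it does not try to catch addresses obfuscated with spaces.

```php
<?php
// Hypothetical helper: replaces plainly written email addresses.
function removeEmails(string $content): string
{
    // local part, '@', domain labels, then a dot and a TLD of at least two letters
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

    // Fall back to the input if PCRE reports an error (preg_replace returns null).
    return preg_replace($pattern, '[email removed]', $content) ?? $content;
}
```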
For URLs:
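Again only a sketch under assumptions: the pattern and the name removeUrls are mine, not the answerer's. It only treats text starting with http://, https:// or www. as a link, so bare domains like example.com pass through, and punctuation directly after a link is swallowed with it.

```php
<?php
// Hypothetical helper: replaces links that start with a scheme or "www.".
function removeUrls(string $content): string
{
    // '~' is used as the delimiter so the slashes in "://" need no escaping.
    $pattern = '~\b(?:https?://|www\.)[^\s<>"\']+~i';

    // Fall back to the input if PCRE reports an error (preg_replace returns null).
    return preg_replace($pattern, '[link removed]', $content) ?? $content;
}
```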