Best way to "filter" user-entered usernames

Posted 2024-12-03 13:22:22

I have a site which allows users to create a 'unique URL' that they can pass along to colleagues in the form of www.site.com/customurl.

I, of course, run a check to make sure the input is actually unique, but I also want to filter out things like large company names (copyrighted names, etc.) and curse words. To do this, my thought was to build a txt file with a list of every name/word that came to mind. The size of the test txt file we have is not a concern, but I am curious whether this is the best way to go about it. I do not think a DB call is as efficient as reading in the text file.

My code is:

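$filename = 'badurls.txt';
$path = $_SERVER['DOCUMENT_ROOT'] . '/' . $filename;

// Load the list, one entry per line; stay with an empty list if the file cannot be opened.
$array = [];
$fp = fopen($path, 'r');
if ($fp) {
  $array = explode("\n", fread($fp, filesize($path)));
  fclose($fp);
}

// Exact, case-sensitive match against the list.
if (in_array($url, $array)) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}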

NOTE

I am talking about a list of possibly the top 100-200 companies and maybe 100 curse words. I could be wrong, but I do not anticipate this list ever growing beyond 500 words total, let alone 1000.

澜川若宁 2024-12-10 13:22:22

You may not think that a DB call is as efficient, but it is much more efficient. With an index on the column, the database doesn't have to iterate through each item (as in_array does internally) to see whether it exists. Your code will be O(n) and the DB lookup will be O(log n), not to mention the memory saved by not loading the entire file on each page load (see B-Tree indexes).

Sure, 500 elements isn't a whole lot. It wouldn't be a huge deal to just stick that in a file, would it? Actually, it would. It's not so much a performance issue (the overhead of the DB call will cancel out the efficiency loss of the file, so they should be roughly even in terms of time), but it is an issue of maintainability. You say today that 500 words is the maximum. What happens when you realize that you need to provide duplicate detection, that is, check whether a URL already exists on your site? That will require a DB query anyway, so why not just take care of it all in one place?

Just create a table with names, index it, and then do a simple SELECT. It will be faster. And more efficient. And more scalable... Imagine if you reach 1 GB of data. A database can handle that fine. A file read into memory cannot (you'll run out of RAM)...
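As a rough sketch of the table, index, and SELECT described above: the snippet below uses PDO with an in-memory SQLite database purely so that it is self-contained. The helper name isBlockedName, the sample rows, and the choice of SQLite with COLLATE NOCASE are assumptions for illustration, not something from the question; a real site would connect to its existing database and use that engine's collation settings.

```php
<?php
// In-memory SQLite stands in for the site's real database (assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// COLLATE NOCASE makes comparisons on this column case-insensitive in SQLite
// (ASCII letters only). The UNIQUE constraint creates the index the lookup uses.
$pdo->exec('CREATE TABLE badnametable (
    id INTEGER PRIMARY KEY,
    badname TEXT NOT NULL COLLATE NOCASE UNIQUE
)');

// Sample rows, made up for the example.
$insert = $pdo->prepare('INSERT INTO badnametable (badname) VALUES (?)');
foreach (['foobar', 'acme'] as $name) {
    $insert->execute([$name]);
}

// Hypothetical helper: true if the requested URL is on the blocklist.
function isBlockedName(PDO $pdo, string $url): bool
{
    // A bound parameter keeps user input out of the SQL string. "=" is used
    // instead of LIKE so "%" and "_" in the input are not treated as wildcards.
    $stmt = $pdo->prepare('SELECT id FROM badnametable WHERE badname = ? LIMIT 1');
    $stmt->execute([$url]);

    // fetchColumn() returns false when no row matched.
    return $stmt->fetchColumn() !== false;
}

var_dump(isBlockedName($pdo, 'FooBar'));  // bool(true)
var_dump(isBlockedName($pdo, 'my-team')); // bool(false)
```

The same helper could later be pointed at the table of existing custom URLs, so the uniqueness check and the blocklist check live in one place.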

Don't try to optimize like this; premature optimization should be avoided. Instead, implement the clean and good solution, and then optimize only if necessary after the application is finished (and you can identify the slow parts)...

One other point worth considering: the code as written will not catch $url = 'FooBar' when foobar is in the file. Sure, you could simply do strtolower on the URL, but why bother? That's another advantage of the database. It can do case-insensitive matching (given a case-insensitive collation, which is the default in MySQL, for example). So you can do:

SELECT id FROM badnametable WHERE badname LIKE 'entry' LIMIT 1

And just check that there are no matching rows. There's no need to do a COUNT(*), or anything else. All you care about is whether any row matched (no row is good, any row is not good).
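For comparison, if the text file stays around for a while, the strtolower route mentioned above might look roughly like the sketch below. The function names buildBlocklist and isBlockedWord are hypothetical, and the sample array stands in for the lines of badurls.txt. Entries are normalized once and stored as array keys, so each check is an isset lookup rather than an in_array scan.

```php
<?php
// Hypothetical helper: turn raw lines into a lowercase lookup set.
function buildBlocklist(array $lines): array
{
    $set = [];
    foreach ($lines as $line) {
        // trim() also drops a stray "\r" left by Windows line endings.
        $word = strtolower(trim($line));
        if ($word !== '') {
            $set[$word] = true;
        }
    }
    return $set;
}

// Hypothetical helper: case-insensitive membership test.
function isBlockedWord(array $blocklist, string $url): bool
{
    return isset($blocklist[strtolower(trim($url))]);
}

// Sample lines, made up for the example.
$blocklist = buildBlocklist(["foobar\r", "Acme", ""]);

var_dump(isBlockedWord($blocklist, 'FooBar'));  // bool(true)
var_dump(isBlockedWord($blocklist, 'my-team')); // bool(false)
```

In practice the lines could come from file($path, FILE_IGNORE_NEW_LINES). This only handles exact matches after lowercasing, and it still reads the whole file on every request, which is the cost the database approach avoids.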
