Database vs. flat file: which is the faster structure for "regex" matching with many simultaneous requests?
Which structure returns faster results and/or is less taxing on the host server: a flat file or a database (MySQL)?
Assume many users (100 users) are simultaneously querying the file/db.
Searches involve pattern matching against a static file/db.
File has 50,000 unique lines (same data type).
There could be many matches.
There is no writing to the file/db, just reads.
Is it possible to duplicate the file/db and write a logic switch to use the backup file/db if the main file is in use?
Which language is best for this type of structure? Perl for the flat file and PHP for the db?
Additional info:
If I want to find all the cities that have the pattern "cis" in their names,
which is better/faster: using regex or string functions?
Please recommend a strategy.
TIA
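To make the "cis" question concrete, here is a minimal shell sketch (the file path and city names are made up for illustration) contrasting a fixed-string search with a regex search over a small sample file:

```shell
# Build a tiny sample file of city names (hypothetical data)
printf 'paris\nfrancisco\nberlin\ncisterna\n' > /tmp/cities_sample.txt

# String-function style: grep -F treats "cis" as a literal substring
grep -cF 'cis' /tmp/cities_sample.txt    # prints 2

# Regex style: the same pattern through the regex engine
grep -cE 'cis' /tmp/cities_sample.txt    # prints 2
```

For a plain literal like "cis" both approaches return the same lines; the fixed-string path simply skips the regex machinery, which is why string functions are usually at least as fast as regex for literal substrings.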
Comments (2)
I am a huge fan of simple solutions, and thus prefer -- for simple tasks -- flat file storage. A relational DB with its indexing capabilities won't help you at all with arbitrary regex patterns, and the filesystem's caching ensures that this rather small file is in memory anyway. I would go the flat file + perl route.
Edit: (taking your new information into account)
If it's really just about finding a substring in one known attribute, then a fulltext index (which a DB provides) will help you somewhat (depending on the type of index applied) and might provide an easy and reasonably fast solution that fits your requirements. Of course, you could implement an index yourself on the file system, e.g. using a variation of a suffix tree, which is hard to beat speed-wise. Still, I would go the flat file route (and if it fits your purpose, have a look at awk), because if you had started implementing it, you'd be finished already ;) Further, I suspect that the number of users you're talking about won't make the system feel the difference (your CPU will be bored most of the time anyway). If you are uncertain, just try it! Implement the regex+perl solution (it takes a few minutes if you know perl), loop 100 times, and measure with time. If it is sufficiently fast, use it; if not, consider another solution. Keep in mind that 50,000 unique lines is really a small number in terms of modern computing. (Compare with this: Optimizing Mysql Table Indexing for Substring Queries.) HTH,
alexander
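The "loop 100 times and measure with time" experiment suggested above might look like the following shell sketch. The 50,000-line file is generated synthetically here, and the pattern and file paths are assumptions standing in for the real data:

```shell
# Generate a synthetic 50,000-line file, plus two lines that match /cis/
seq 1 50000 | awk '{print "city_" $1}' > /tmp/cities.txt
printf 'francisco\ncisterna\n' >> /tmp/cities.txt

# Scan the whole file with perl's regex engine, 100 times, under time
time sh -c 'for i in $(seq 1 100); do
  perl -ne "print if /cis/" /tmp/cities.txt > /tmp/matches.txt
done'

# Each pass finds the same 2 matching lines
grep -c cis /tmp/matches.txt
```

If the wall-clock time divided by 100 is well under your latency budget, the flat file + perl route is good enough and there is no need for a database.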
Depending on what your queries and your data look like, a full-text search engine such as Lucene or Sphinx could be a good idea.