Is there a good open-source library for parsing search terms from a URL?
I'm looking for a library that parses log files (or incoming requests) and extracts the search terms when the request came from a search engine.
Are there any good libraries which provide this function?
Any language will do.
Comments (2)
Java has the Lucene framework, a high-performance text search engine. Log files could work with this, but incoming requests could be trickier. Do you need to parse them as they stream in?
There are many ways to get, parse and analyze the data you speak of.
Very simply, you could take the log file text and import it into a SQL database for analysis (which also lets you look at other requests, etc.).
You could use a software service such as Google Analytics.
Or my personal favorite:
Write a SQL INSERT into a tracking table. In so doing, you can parse the string into clauses, very simply separating by words. The downside to this is that you'll miss keyword phrases such as "New York" (being two words).

The person suggesting Lucene offered up a morsel of info that could lead you to dream up a pretty neat analyzer, but it would take a lot of work to get a complete solution. The neat thing about Lucene and Solr is that they can tokenize the keyword string using their standard libraries (chunking out two- to three-word clauses where you have CompoundWords or CamelCaseKeywords).
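To make the tracking-table idea concrete, here is a minimal sketch in Python using only the standard library. Everything named here is an assumption for illustration: the `SEARCH_PARAMS` mapping, the `search_terms` table, and the function names are hypothetical, and the sketch assumes the referrer still carries the query string, which many search engines no longer send. It stores the full phrase next to each word, so a phrase like "New York" is not lost when splitting on whitespace.

```python
import sqlite3
from urllib.parse import urlparse, parse_qs

# Assumption: hostname label -> query parameter holding the search terms.
# Illustrative only; this is not an exhaustive or guaranteed-current list.
SEARCH_PARAMS = {
    "google": "q",
    "bing": "q",
    "duckduckgo": "q",
    "yahoo": "p",
}


def extract_search_terms(referrer):
    """Return the raw search phrase from a referrer URL, or None."""
    parsed = urlparse(referrer)
    labels = (parsed.hostname or "").split(".")
    for engine, param in SEARCH_PARAMS.items():
        if engine in labels:
            values = parse_qs(parsed.query).get(param)
            return values[0].strip() if values else None
    return None


def record_search(conn, referrer):
    """Insert one row per word of the search phrase; return the words."""
    phrase = extract_search_terms(referrer)
    if not phrase:
        return []
    words = phrase.lower().split()
    # Keep the whole phrase alongside each word so multi-word
    # keywords can still be recovered during analysis.
    conn.executemany(
        "INSERT INTO search_terms (phrase, word) VALUES (?, ?)",
        [(phrase, word) for word in words],
    )
    conn.commit()
    return words


# Hypothetical tracking table, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_terms (phrase TEXT, word TEXT)")
record_search(conn, "https://www.google.com/search?q=New+York+hotels")
```

You would call `record_search` with the referrer of each incoming request, or with the referrer field pulled from each log line. Proper phrase detection (the kind of tokenizing Lucene or Solr does) is out of scope for a sketch like this.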
From a practical standpoint, I think you're best served by using something off the shelf, such as Google Analytics. If you have the time and skills, inserting a record into a database can turn into something very powerful as you add to it.