Web front end for an indexed document collection

Posted 2024-12-19 07:26:08

I have an XML document collection, an inverted file indexer, and a command-line tool for searching the index (or indices) produced by the indexer. Note that the search tool returns a list of document IDs and various statistics about each document (rankings according to various functions, term hits, etc.) rather than the actual document text. Both programs were written in plain C (by me).

  • The collection is not huge (~1GB).
  • The index is about 10-20% of the collection size.
  • This is not intended (and never will be) for public use (using it will require logging in).
  • It needs to run with client-side scripting totally disabled.

I'd like to whip up a simple web frontend that would allow me to query the index with a search term or terms and present the results appropriately, but it's been a while since I touched any web stuff.

I want to see more or less the same info a query returns at the moment, but I'm not sure whether to write something (e.g. PHP, Ruby - alternative suggestions are welcome) that calls my command-line query program and processes the output, or whether re-implementing the query program would be more appropriate.

Are there any distinct advantages one has over the other? Security risks?
And can anyone recommend a lightweight framework or library appropriate for any of this? (Like I said, I haven't touched web stuff in a while.)

Should I call the CLI query program? Why or why not?

(=/ I hope I'm not being too vague... do tell me if I should be asking this in a different manner.)

Comments (1)

屋顶上的小猫咪 2024-12-26 07:26:08

For something simple like this, I would use PHP and an Apache server. Why?

It doesn't require a web framework to interface with Apache; less complexity means less time spent configuring. You could just install Apache and the PHP module, drop this file into your web root, and point an HTML form (with method="post" and the text fields "name" and "author") at http://127.0.0.1/indexer.php:

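<?php
// Form fields that must be present in the POST request.
$required_terms = array("name", "author");

foreach ($required_terms as $value) {
    if (!isset($_POST[$value]) || !is_string($_POST[$value])) {
        printf("The search term \"%s\" was missing", $value);
        exit;
    }
}

// Quote each value with escapeshellarg() so user input cannot inject shell commands.
$terminal_command = sprintf(
    "/usr/bin/indexer -n %s -a %s",
    escapeshellarg($_POST["name"]),
    escapeshellarg($_POST["author"])
);

// exec() returns only the last output line, so collect all lines and escape them for HTML.
exec($terminal_command, $output_lines);
print "<pre>" . htmlspecialchars(implode("\n", $output_lines)) . "</pre>";

(Note: this only shows how simple it can be. escapeshellarg() guards against shell injection, but the POST values should still be validated, for example for length and allowed characters.)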

This runs your application with the two values as arguments and prints what the application sent to stdout. There is nothing else to set up; it should take only a few minutes to get up and running.
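Since the question asks about security risks and welcomes alternative suggestions, here is a minimal Python sketch of the same idea that avoids the shell entirely: the arguments are passed as a list, so each user-supplied value stays a single argument no matter what characters it contains. The binary path and the -n / -a flags are simply copied from the PHP snippet above and are placeholders; build_command and run_query are hypothetical helper names.

```python
import subprocess

# Hypothetical path and flags, copied from the PHP snippet above.
QUERY_BINARY = "/usr/bin/indexer"


def build_command(name, author, binary=QUERY_BINARY):
    """Return the argument list; each value stays one argument, whatever it contains."""
    return [binary, "-n", name, "-a", author]


def run_query(command, timeout=10):
    """Run the command without a shell and return its full stdout as text."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode the output to str
        timeout=timeout,      # give up if the query program hangs
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Skipping the shell removes shell injection, but a value that starts with "-" could still be read as an option by the query program, so validating the input remains worthwhile.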

The main reason to choose PHP and Apache, then, is that they are simple and fast to set up for something as small and internal as this.
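For comparison, a front end that works with client-side scripting disabled can be just as small outside PHP. The sketch below is a minimal WSGI application using only the Python standard library: it serves a plain HTML form and shows the query output in a pre block, HTML-escaped. The field name q, the make_app helper, and the run_query callback (whatever function actually invokes the query program) are assumptions made for illustration, and the login requirement is left to the web server.

```python
import html
from urllib.parse import parse_qs

# Plain HTML form: no JavaScript, submits the search terms as ?q=...
FORM = (
    '<form method="get" action="/">'
    '<input type="text" name="q">'
    '<input type="submit" value="Search">'
    "</form>"
)


def make_app(run_query):
    """Build a WSGI app; run_query(terms) must return the query output as text."""

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        terms = params.get("q", [""])[0].split()
        body = "<!doctype html><title>Search</title>" + FORM
        if terms:
            # Escape the query output before placing it in the page.
            body += "<pre>" + html.escape(run_query(terms)) + "</pre>"
        payload = body.encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]

    return app


# To try it locally (blocks until interrupted), with my_backend standing in
# for a function that calls the query program:
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8000, make_app(my_backend)).serve_forever()
```

Keeping the backend behind a single callback means the page code stays the same whether the query program is called as a subprocess or later re-implemented.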
