使用纯 JavaScript 将页面上的术语链接到维基百科文章

发布于 2024-08-23 22:41:50 字数 2910 浏览 6 评论 0原文

在浏览时我遇到了 这个博客发布关于使用 Wikipedia API(来自 JavaScript,将单个搜索词链接到其定义。在博客文章的末尾,作者提到了可能的扩展,包括:

一个自动将术语链接到维基百科文章的插件。

这完全符合我正在处理的项目要求,但遗憾的是我缺乏扩展 原始源代码。我想要的是有一个可以添加到网页的纯 JavaScript 片段,它将该网页上包含内部 wiki 上的文章的所有术语链接到该 wiki。

我知道这可能要求很高,但代码看起来已经差不多了,如果有人愿意为该虚拟信用完成剩余的工作,我愿意添加赏金..;)我也怀疑这可能是对其他一些人来说很有价值,因为我见过类似的请求,但没有有效的实现(这只是一个 JavaScript(因此可移植)库/片段)。

这是原始源代码的示例,我希望任何人都能够添加到此示例,或者指出如果我自己实现这个示例,我需要添加什么(在这种情况下,如果我设法做到的话,我将分享代码把一些东西放在一起)。

<script type="text/javascript"><!--
var spellcheck = function (data) {
    var found = false; var url=''; var text = data [0];
    if (text != document.getElementById ('spellcheckinput').value)
        return;
    for (i=0; i<data [1].length; i++) {
        if (text.toLowerCase () == data [1] [i].toLowerCase ()) {
            found = true;
            url ='http://en.wikipedia.org/wiki/' + text;
            document.getElementById ('spellcheckresult').innerHTML = '<b style="color:green">Correct</b> - <a target="_top" href="' + url + '">link</a>';
        }
    }
    if (! found)
        document.getElementById ('spellcheckresult').innerHTML = '<b style="color:red">Incorrect</b>';
};

var getjs = function (value) {
    if (! value)
        return;
    url = 'http://en.wikipedia.org/w/api.php?action=opensearch&search='+value+'&format=json&callback=spellcheck';
    document.getElementById ('spellcheckresult').innerHTML = 'Checking ...';
    var elem = document.createElement ('script');
    elem.setAttribute ('src', url);
    elem.setAttribute ('type','text/javascript');
    document.getElementsByTagName ('head') [0].appendChild (elem);
};--></script>
<form action="#" method="get" onsubmit="return false"> 
<p>Enter a word - <input id="spellcheckinput" onkeyup="getjs (this.value);" type="text"> <span id="spellcheckresult"></span></p></form>

更新
正如评论中所指出的,链接所有单词所需的时间以及如何处理跨单词的文章名称也是我所关心的。

我认为从单个单词的文章开始已经涵盖了很大一部分用例,当跳过英语中的 500 个最常见单词时,可能会获得一些性能优势,但我仍然不确定这种方法的可行性。

但从好的方面来说,这将全部是客户端,并且会出现一些延迟在链接方面是完全可以接受的。

或者,搜索鼠标悬停/选择的术语也可能是可以接受的,但我不确定这是否会降低或增加复杂性。


更新 2

下面解释了此功能可能是“Pointy”从 api.php?action=query&list=allpages 获取文章主题列表后,通过更改一些相当标准的突出显示脚本来实现。
重申一下:我们使用的是内部 wiki,因此文章列表可能是有限的、明确的且特定于领域的,足以克服匹配单词中的一些预期问题。

由于到目前为止我们已经提出了一些好的建议和一些可行的想法,因此我开始悬赏,看看是否能得到一些答案。

While browsing I came across this blog post about using the Wikipedia API from JavaScript, to link a single search term to it's definition. At the end of the blog post the author mentions possible extensions including:

A plugin which auto links terms to Wikipedia articles.

This fits the bill perfectly for a project requirement I'm working on, but sadly I lack the programming skills to extend the original source code. What I'd like is to have a pure JavaScript snippet I can add to a webpage, that links all the terms on that webpage that have an article on an internal wiki to that wiki.

I know this might be asking for much, but the code looks like it's nearly there, and I'd be willing to add a bounty if anyone will do the remaining work for that virtual credit.. ;) I also suspect this might be of value to a few others, as I've seen similar requests but no working implementation (that's a mere JavaScript (and therefore portable) library/snippet include).

Here's a sample of the original source code, I hope anyone is able to add to this or point me to what I'd need to add if I were to implement this myself (in which case I'll share the code if I manage to put something together).

<script type="text/javascript"><!--
var spellcheck = function (data) {
    var found = false; var url=''; var text = data [0];
    if (text != document.getElementById ('spellcheckinput').value)
        return;
    for (i=0; i<data [1].length; i++) {
        if (text.toLowerCase () == data [1] [i].toLowerCase ()) {
            found = true;
            url ='http://en.wikipedia.org/wiki/' + text;
            document.getElementById ('spellcheckresult').innerHTML = '<b style="color:green">Correct</b> - <a target="_top" href="' + url + '">link</a>';
        }
    }
    if (! found)
        document.getElementById ('spellcheckresult').innerHTML = '<b style="color:red">Incorrect</b>';
};

var getjs = function (value) {
    if (! value)
        return;
    url = 'http://en.wikipedia.org/w/api.php?action=opensearch&search='+value+'&format=json&callback=spellcheck';
    document.getElementById ('spellcheckresult').innerHTML = 'Checking ...';
    var elem = document.createElement ('script');
    elem.setAttribute ('src', url);
    elem.setAttribute ('type','text/javascript');
    document.getElementsByTagName ('head') [0].appendChild (elem);
};--></script>
<form action="#" method="get" onsubmit="return false"> 
<p>Enter a word - <input id="spellcheckinput" onkeyup="getjs (this.value);" type="text"> <span id="spellcheckresult"></span></p></form>

Update
As pointed out in the comments, both the time it would take to link all words and how to handle multiple word spanning article names were concerns of mine as well..

I'd think starting with single word articles would already cover a large percentage of the use cases, with maybe some performance benefits gained when skipping the 500 most common words in the English language, but still I'm uncertain how feasible this approach will be..

On the upside however this would all be client side, and some delay in linking terms is fully acceptable.

Alternatively searching for terms the mouse is hovering over / selected might be acceptable as well, but I'm unsure if this would decrease or increase complexity..


Update 2

'Pointy' explained below that this functionality could be achieved by altering some fairly standard highlighting scripts, after having obtained a list of article topics from api.php?action=query&list=allpages.
To reinterate: we're using an internal wiki, so the list of articles is likely limited, non ambiguous and domain specific enough to overcome some of the expected problems in matching words.

Since we've had some good suggestions so far, and a few workable ideas, I'm starting a bounty to see if I can get a few answers on this..

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

鹊巢 2024-08-30 22:41:51

也许这样的东西可能会有所帮助:

假设非常简单的 HTML/文本,如下所示:

<div id="theText">Testing the auto link system here...</div>

和两个非常小的脚本。

dictionary.js 设置您的术语列表。我的想法是,如果需要的话,可以通过查询文章数据库在 php 中生成。它还可以跨域加载(因为它设置了window.termsRE)。如果您不需要从数据库生成列表,您也可以使用 termlinker.js 手动添加。

生成 RegExp 的代码假定您的 terms 数组包含格式正确的字符串以使用正则表达式进行匹配,因此请务必使用 \\ 转义 []\ .?*+|(){}^&

// dictionary.js - define some terms
var terms = ['testing', 'auto link'];
window.termsRE = new RegExp("\\b("+terms.join("|")+")\\b",'gi');

termlinker.js 只是定义术语的简单正则表达式搜索替换。它也可以是内联

// termlinker.js - add some tags
var element = document.getElementById("theText");

element.innerHTML = element.innerHTML.replace(termsRE, function(term) {
  return "<a href='http://en.wikipedia.org/wiki/"+escape(term)+"'>"+term+"</a>";
}); 

这只是搜索术语数组中的任何单词并将其替换为该术语的链接。当然,它还会匹配 HTML 标记内的属性和值,这可能会稍微破坏您的标记。

将所有内容放在一起,您将得到此(jsbin 预览版)


API

使用基于之前的“最小情况”的 ,这里是使用 API 直接接收单词列表和jsbin 预览的代码示例

// Utility Function
RegExp.escape = function(text) {
  if (!arguments.callee.sRE) {
    var specials = [
      '/', '.', '*', '+', '?', '|',
      '(', ')', '[', ']', '{', '}', '\\'
    ];
    arguments.callee.sRE = new RegExp(
      '(\\' + specials.join('|\\') + ')', 'g'
    );
  }
  return text.replace(arguments.callee.sRE, '\\$1');
};

// JSONP Callback for receiving the API
function receiveAPI(data) {
  var terms = [];
  if (!data || !data['query'] || !data['query']['allpages']) return false;  
  var pages = data.query.allpages
  for (var x in pages) {
    terms.push(RegExp.escape(pages[x].title));
  }
  window.termsRE = new RegExp("\\b("+terms.reverse().join("|")+")\\b",'gi');
  linkterms();
}  

function linkterms() {
  var element = document.getElementById("theText");

  element.innerHTML = element.innerHTML.replace(termsRE, function(term) {
    return "<a href='http://en.wikipedia.org/wiki/"+escape(term)+"'>"+term+"</a>";
  });
}


// the apfrom=testing can be removed, it is only there so that
// we can get some useful terms near "testing" to work with.
// we are limited to 500 terms for the purpose of this demo:
url = 'http://en.wikipedia.org/w/api.php?action=query&list=allpages&aplimit=500&format=json&callback=receiveAPI' + '&apfrom=testing';
var elem = document.createElement('script');
elem.setAttribute('src', url);
elem.setAttribute('type','text/javascript');
document.getElementsByTagName('head')[0].appendChild (elem);

Perhaps something like this might help:

Assuming very simple HTML/Text like so:

<div id="theText">Testing the auto link system here...</div>

And two very small scripts.

dictionary.js sets up your list of your terms. My thought was that this could be generated in php by querying the articles database if you wanted. It also can be loaded cross domain (as it sets window.termsRE). If you don't need to generate the list from the database, you could also manually put it with termlinker.js.

This code that generates the RegExp assumes that your terms array contains properly formatted strings to match using Regular Expressions, so be sure to use \\ to escape []\.?*+|(){}^&

// dictionary.js - define some terms
var terms = ['testing', 'auto link'];
window.termsRE = new RegExp("\\b("+terms.join("|")+")\\b",'gi');

termlinker.js is just a simple regexp search replace on the defined terms. It could be an inline <script> too. requires that the dictionary.js has been loaded before you run it.

// termlinker.js - add some tags
var element = document.getElementById("theText");

element.innerHTML = element.innerHTML.replace(termsRE, function(term) {
  return "<a href='http://en.wikipedia.org/wiki/"+escape(term)+"'>"+term+"</a>";
}); 

This simply searches for any words in the terms array and replaces them with a link to the term. Of course, it will also match properties and values inside HTML tags, which could break your markup a little.

All thrown together you get this (jsbin preview)


Using the API

Based off of the "minimum case" from before, here is the code sample for using the API to receive the list of words directly and the jsbin preview

// Utility Function
RegExp.escape = function(text) {
  if (!arguments.callee.sRE) {
    var specials = [
      '/', '.', '*', '+', '?', '|',
      '(', ')', '[', ']', '{', '}', '\\'
    ];
    arguments.callee.sRE = new RegExp(
      '(\\' + specials.join('|\\') + ')', 'g'
    );
  }
  return text.replace(arguments.callee.sRE, '\\$1');
};

// JSONP Callback for receiving the API
function receiveAPI(data) {
  var terms = [];
  if (!data || !data['query'] || !data['query']['allpages']) return false;  
  var pages = data.query.allpages
  for (var x in pages) {
    terms.push(RegExp.escape(pages[x].title));
  }
  window.termsRE = new RegExp("\\b("+terms.reverse().join("|")+")\\b",'gi');
  linkterms();
}  

function linkterms() {
  var element = document.getElementById("theText");

  element.innerHTML = element.innerHTML.replace(termsRE, function(term) {
    return "<a href='http://en.wikipedia.org/wiki/"+escape(term)+"'>"+term+"</a>";
  });
}


// the apfrom=testing can be removed, it is only there so that
// we can get some useful terms near "testing" to work with.
// we are limited to 500 terms for the purpose of this demo:
url = 'http://en.wikipedia.org/w/api.php?action=query&list=allpages&aplimit=500&format=json&callback=receiveAPI' + '&apfrom=testing';
var elem = document.createElement('script');
elem.setAttribute('src', url);
elem.setAttribute('type','text/javascript');
document.getElementsByTagName('head')[0].appendChild (elem);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文