SQL to retrieve text data from MySQL with every word in the field repeated

Posted 2024-12-21 23:19:40, 1038 words, 0 views, 0 comments


I would like SQL that retrieves text data from a simple, single MySQL table, where every (space-delimited) word in a field is doubled, i.e. immediately repeated, in the result.

Sample table

+--------------------+
| text_data          |
+--------------------+
| The man walks.     |
| Apple pie.         |
| This is a sentence |
+--------------------+

Desired output.

+---------------------------------------+
| output                                |
+---------------------------------------+
| The The man man walks. walks.         |
| Apple Apple pie. pie.                 |
| This This is is a a sentence sentence |
+---------------------------------------+

The order of the words is important.

Edit 1: This does have to be done dynamically in SQL (no temp tables, no PHP) because of the application layer (the context is index generation in Sphinx) and the ever-changing nature of the source data.

Edit 2 (more detailed context): I want the wordforms functionality in Sphinx to handle certain pairs of words non-destructively. That is, if I turn "playstation 3" into "playstation playstation 3 3", I can still apply the word form "playstation 3 > ps3", and effectively the article would be searchable by both the queries "playstation" and "ps3". If I don't double up, but still apply the wordform, then I lose the ability to retrieve the article by searching for "playstation".


Comments (3)

日记撕了你也走了 2024-12-28 23:19:40


A few options...

  1. You could use the xmlpipe2 indexing method to feed the data to Sphinx. Then you can run arbitrary code (PHP, etc.) during indexing, so you could use PHP to solve your particular problem.
    http://sphinxsearch.com/docs/current.html#xmlpipe2

  2. If you have a small wordforms file, you could just hard-code the specific cases into the MySQL query:

    sql_query = SELECT id, REPLACE(text_data, 'playstation 3', 'playstation playstation 3 3') AS text_data, ....

    (perhaps using nested REPLACE(...) calls, one per phrase)

    (REPLACE is case-sensitive; if case is an issue, a PREG_REPLACE UDF could perhaps be used instead)
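Rather than writing the nested REPLACE(...) expression by hand, it could be generated from the wordforms file. The following is a minimal Python sketch. It assumes a wordforms file made of "source > destination" lines like the one in the question, and that only multi-word source phrases need doubling. The helper names are hypothetical:

```python
def sql_quote(s: str) -> str:
    """Quote a string as a MySQL single-quoted literal (backslashes and quotes escaped)."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def double_phrase(phrase: str) -> str:
    """'playstation 3' -> 'playstation playstation 3 3'."""
    return " ".join(f"{w} {w}" for w in phrase.split())


def phrases_from_wordforms(lines: list[str]) -> list[str]:
    """Collect the multi-word source side of each 'source > destination' line (assumed format)."""
    phrases = []
    for line in lines:
        if ">" not in line:
            continue
        source = line.split(">", 1)[0].strip()
        if len(source.split()) > 1:  # single words need no doubling
            phrases.append(source)
    return phrases


def nested_replace(column: str, phrases: list[str]) -> str:
    """Wrap the column in one REPLACE(...) call per phrase; the first phrase is innermost."""
    expr = column
    for phrase in phrases:
        expr = f"REPLACE({expr}, {sql_quote(phrase)}, {sql_quote(double_phrase(phrase))})"
    return expr


if __name__ == "__main__":
    rules = ["playstation 3 > ps3"]  # hypothetical wordforms content
    expr = nested_replace("text_data", phrases_from_wordforms(rules))
    # Append the rest of your existing sql_query (FROM clause, other columns) yourself.
    print(f"sql_query = SELECT id, {expr} AS text_data")
```

The generated expression has the same limits as the hand-written one. REPLACE matches case-sensitively and on plain substrings, so 'playstation 3' would also match inside 'playstation 30'.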

Doubling all words seems like a bad idea anyway, as it will break Sphinx ranking and the ability to search for phrases of three or more terms (without some hacks). So use xmlpipe2 instead, so that the PHP code can read the wordforms file directly and double up only the required terms.
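The answer suggests PHP for the xmlpipe2 route. The same idea is sketched below in Python, under a few assumptions:

- rows arrive as (id, text) pairs from whatever database driver you use;
- the phrases to double are the multi-word wordform sources;
- one full-text field named text_data is enough.

The document layout follows the xmlpipe2 format described in the Sphinx documentation linked above, but check it against your Sphinx version:

```python
import re
from xml.sax.saxutils import escape


def double_words(s: str) -> str:
    """'playstation 3' -> 'playstation playstation 3 3', keeping the original casing."""
    return " ".join(f"{w} {w}" for w in s.split())


def double_phrases(text: str, phrases: list[str]) -> str:
    """Double only whole-word, case-insensitive occurrences of the given phrases."""
    for phrase in phrases:
        # Any whitespace run may separate the phrase words; the lookarounds stop
        # 'playstation 3' from matching inside 'playstation 30'.
        body = r"\s+".join(re.escape(w) for w in phrase.split())
        pattern = re.compile(rf"(?<!\w){body}(?!\w)", re.IGNORECASE)
        text = pattern.sub(lambda m: double_words(m.group(0)), text)
    return text


def xmlpipe2(docs: list[tuple[int, str]], phrases: list[str]) -> str:
    """Build an xmlpipe2 document stream with a single full-text field, text_data."""
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="text_data"/>',
        "</sphinx:schema>",
    ]
    for doc_id, text in docs:
        body = escape(double_phrases(text, phrases))  # double first, then XML-escape
        out.append(f'<sphinx:document id="{doc_id}"><text_data>{body}</text_data></sphinx:document>')
    out.append("</sphinx:docset>")
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical rows; in practice these would be read from the MySQL table.
    rows = [(1, "Cheap PlayStation 3 games"), (2, "Apple pie.")]
    print(xmlpipe2(rows, ["playstation 3"]))
```

Sphinx would run such a script through the source's xmlpipe_command setting. Only the listed phrases are doubled, so every other word keeps its normal position. That limits the ranking and phrase-search side effects described above.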

薄暮涼年 2024-12-28 23:19:40


I'm not sure I would choose SQL as my first language for this, but if I had to tackle it, I would:

  1. Create a split function that returns a table, if I didn't already have one handy.
  2. Update every record in that table to the word followed by itself.
  3. Select all records from the table into a single concatenated result.

Let me know if any of these steps isn't explicit enough.
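The three steps amount to split, map and join. Below is a minimal sketch of that pipeline in Python. It is not SQL, and the function names are hypothetical. It also pins down the expected output for the sample rows from the question:

```python
def split_words(text: str) -> list[str]:
    """Step 1: split on whitespace into an ordered list of words (punctuation stays attached)."""
    return text.split()


def double_each(words: list[str]) -> list[str]:
    """Step 2: replace every word with the word followed by itself."""
    return [f"{w} {w}" for w in words]


def concatenate(words: list[str]) -> str:
    """Step 3: join the words back together in their original order."""
    return " ".join(words)


def repeat_words(text: str) -> str:
    return concatenate(double_each(split_words(text)))


if __name__ == "__main__":
    for row in ["The man walks.", "Apple pie.", "This is a sentence"]:
        print(repeat_words(row))
```

A Python list keeps word order implicitly. A table returned by an SQL split function does not, so it would need a position column. Step 3 could then concatenate in order, for example with GROUP_CONCAT(... ORDER BY position SEPARATOR ' ').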

青萝楚歌 2024-12-28 23:19:40


You can write your own stored function to do this. Here is a first approach; adapt and test it against your own data before relying on it:

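DROP FUNCTION IF EXISTS REPEAT_WORD;
DELIMITER //
CREATE FUNCTION REPEAT_WORD(
   X TEXT
)
RETURNS TEXT
DETERMINISTIC
BEGIN
   -- Y must start empty, not NULL, because CONCAT with a NULL argument returns NULL
   DECLARE Y TEXT DEFAULT '';
   DECLARE PAROLE TEXT;
   IF X IS NULL THEN
      RETURN NULL;
   END IF;
   SET X = TRIM(X);
   -- Consume one space-delimited word per iteration
   WHILE INSTR(X, ' ') > 0 DO
      SET PAROLE = SUBSTRING_INDEX(X, ' ', 1);                -- first word of X
      SET Y = CONCAT(Y, ' ', PAROLE, ' ', PAROLE);            -- append it twice (in MySQL, + is numeric addition)
      SET X = LTRIM(SUBSTRING(X, CHAR_LENGTH(PAROLE) + 2));   -- drop the word, its space and any extra spaces
   END WHILE;
   -- The last word has no trailing space, so the loop leaves it in X
   IF X <> '' THEN
      SET Y = CONCAT(Y, ' ', X, ' ', X);
   END IF;
   RETURN TRIM(LEADING ' ' FROM Y);
END
//
DELIMITER ;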

Then use the function in a SELECT:

SELECT REPEAT_WORD(text_data) AS `output`
FROM your_table;  -- your_table is a placeholder for the real table name