How do I select noncontiguous characters from a text string in MySQL?

Posted on 2024-11-17 04:43:40

I have a table with millions of rows and a single column of text that is exactly 11,159 characters long. It looks like this:

1202012101...(to 11,159 characters)
1202020120...
0121210212...
...
(to millions of rows)

I realize that I can use

SELECT SUBSTR(column,2,4) FROM table;

...if I wanted to pull out characters 2, 3, 4, and 5:

1202012101...
1202020120...
0121210212...
 ^^^^

But I need to extract noncontiguous characters, e.g. characters 1,5,7:

1202012101...
1202020120...
0121210212...
^   ^ ^

I realize this can be done with a query like:

SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,7,1)) FROM table;

But this query gets very unwieldy to build for thousands of characters that I need to select. So for the first part of the question - how do I build a query that does something like this:

SELECT CHARACTERS(string,1,5,7) FROM table;

Furthermore, the indices of the characters I want to select are from a different table that looks something like this:

char_index   keep_or_discard
1            keep
2            discard
3            discard
4            discard
5            keep
7            discard
8            keep
9            discard
10           discard

So for the second part of the question, how could I build a query to select specific characters from the first table based on whether keep_or_discard="keep" for that character's index in the second table?

夏九 2024-11-24 04:43:40

This function does what you want:

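CREATE DEFINER = `root`@`localhost` FUNCTION `test`.`getsubset`(selection mediumtext, longstring mediumtext)
RETURNS mediumtext  -- varchar(200) would cut off results longer than 200 characters
LANGUAGE SQL
DETERMINISTIC       -- the result depends only on the two arguments
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'This function returns a subset of characters.'
BEGIN
  -- Local variables, so the function does not overwrite session-level @variables.
  DECLARE res mediumtext DEFAULT '';
  DECLARE remaining mediumtext DEFAULT selection;
  DECLARE pos int;
  WHILE remaining <> '' DO
    -- CONVERT reads the leading number and stops at the first comma.
    SET pos = CONVERT(remaining, signed);
    SET res = CONCAT_WS('', res, SUBSTRING(longstring, pos, 1));
    IF LOCATE(',', remaining) = 0 THEN
      -- No comma left: this was the last position.
      SET remaining = '';
    ELSE
      SET remaining = SUBSTRING(remaining, LOCATE(',', remaining) + 1);
    END IF;
  END WHILE;
  RETURN res;
END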

Note: CONVERT('1,2,3,4', signed) yields 1, but it also raises a warning.

I defined the function in the test database.

The function takes two parameters: a string (!) containing a comma-separated list of positions, and the long string from which the characters are taken.

An example of using this:

mysql> select * from keepdiscard;
+---------+------------+
| charind | keepordisc |
+---------+------------+
|       1 | keep       |
|       2 | discard    |
|       3 | keep       |
|       4 | discard    |
|       5 | keep       |
|       6 | keep       |
+---------+------------+
6 rows in set (0.00 sec)

mysql> select * from test;
+-------------------+
| longstring        |
+-------------------+
| abcdefghijklmnopq |
| 123456789         |
+-------------------+
2 rows in set (0.00 sec)

mysql> select getsubset(group_concat(charind ORDER BY charind),longstring) as result from keepdiscard, test  where keepordisc='keep' group by longstring;
+--------+
| result |
+--------+
| 1356   |
| acef   |
+--------+
2 rows in set, 6 warnings (0.00 sec)

The warnings come from the quick conversion to integer that is done inside the function (see the note above).
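
To check the function's output outside the database, the same parsing can be mirrored in a few lines of Python. This is only a sketch: the name get_subset is hypothetical, and it assumes positive 1-based positions as in MySQL's SUBSTRING, with out-of-range positions contributing nothing.

```python
def get_subset(selection: str, longstring: str) -> str:
    """Return the characters of longstring at the 1-based positions
    listed in the comma-separated string selection."""
    picked = []
    for token in selection.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        pos = int(token)
        # Positions outside 1..len(longstring) contribute nothing,
        # like SUBSTRING(str, pos, 1) returning '' in MySQL.
        if 1 <= pos <= len(longstring):
            picked.append(longstring[pos - 1])
    return "".join(picked)


print(get_subset("1,3,5,6", "abcdefghijklmnopq"))  # acef
print(get_subset("1,3,5,6", "123456789"))          # 1356
```

The two calls reproduce the result rows of the example query above.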


画▽骨i 2024-11-24 04:43:40

How about dynamic SQL? (You will need to build the SELECT part of the query.)

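CREATE PROCEDURE example_procedure()
BEGIN
  -- Build 'SUBSTR(colm,1,1),SUBSTR(colm,5,1),...' from the kept indices.
  -- char_index_table is an assumed name for the question's second table;
  -- `table` stands in for the real data table.
  -- Raise group_concat_max_len (default 1024) so a long list is not truncated.
  SET SESSION group_concat_max_len = 1000000;
  SELECT GROUP_CONCAT(CONCAT('SUBSTR(colm,', char_index, ',1)') ORDER BY char_index)
    INTO @parts
    FROM char_index_table
   WHERE keep_or_discard = 'keep';
  SET @S = CONCAT('SELECT CONCAT(', @parts, ') AS result FROM `table`');
  PREPARE n_StrSQL FROM @S;
  EXECUTE n_StrSQL;
  DEALLOCATE PREPARE n_StrSQL;
END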


巡山小妖精 2024-11-24 04:43:40

You can write a PHP script to do this for you:

<?php
    // Connect to the server and select the database.
    // (mysqli replaces the mysql_* functions, which were removed in PHP 7.)
    $conn = mysqli_connect('localhost', 'mysql_user', 'mysql_password', 'mydb');

    if (!$conn) {
        echo 'Unable to connect to DB: ' . mysqli_connect_error();
        exit;
    }

    // Get the indices to keep and turn each one into a SUBSTR call,
    // so that, for example, 5 becomes 'SUBSTR(colm,5,1)'.
    $result = mysqli_query($conn, "SELECT number FROM number_table WHERE keep_or_discard='keep' ORDER BY number");

    if (!$result) {
        echo 'Unable to read number_table: ' . mysqli_error($conn);
        exit;
    }

    $numbers = array();
    while ($row = mysqli_fetch_assoc($result)) {
        // Append to the array; the cast guards against non-numeric values.
        $numbers[] = 'SUBSTR(colm,' . (int) $row['number'] . ',1)';
    }

    // Implode the array so you get one long string with all the substrings,
    // e.g. 'SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,12,1)'.
    $numbers = implode(",", $numbers);

    // Pull the selected characters of every row and save them to an array.
    $result = mysqli_query($conn, "SELECT CONCAT(" . $numbers . ") AS picked FROM `table`");
    $concat = array();
    while ($row = mysqli_fetch_assoc($result)) {
        $concat[] = $row['picked'];
    }

And there you have an array with the correct numbers.

I'm sorry if you can't/don't want to use PHP for this, I just don't really know how to do this without PHP, Perl, Python or some other similar language. Hopefully this solution will help somehow...
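
For readers who prefer one of the other languages mentioned, the query-building step might look like this in Python. It is a sketch under assumptions: build_select is a hypothetical helper, the (index, flag) pairs are assumed to have already been fetched from the keep/discard table, and the column and table names are the placeholders used in the question.

```python
def build_select(keep_rows, column="colm", table="table"):
    """Build the CONCAT(SUBSTR(...)) statement from (index, flag) pairs."""
    # Keep only the indices flagged 'keep', in ascending order.
    indices = sorted(int(idx) for idx, flag in keep_rows if flag == "keep")
    if not indices:
        raise ValueError("no indices are marked 'keep'")
    parts = ",".join(f"SUBSTR({column},{i},1)" for i in indices)
    # Backticks are needed because `table` is a reserved word in MySQL.
    return f"SELECT CONCAT({parts}) FROM `{table}`"


rows = [(1, "keep"), (2, "discard"), (5, "keep"), (7, "discard"), (8, "keep")]
print(build_select(rows))
# SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,8,1)) FROM `table`
```

The resulting string can then be sent to the server with whichever MySQL client library is in use. Casting each index to int keeps stray text out of the generated SQL.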


无所的.畏惧 2024-11-24 04:43:40

The source of your difficulty is that your schema does not represent the true relationships between the data elements. If you wanted to achieve this with "pure" SQL, you would need a schema more like:

table
ID    Index    Char
1     0        1
1     1        2
1     2        0

charsToKeep
ID    Index    Keep
1     0        false
1     1        true
1     2        true

Then, you could perform a query like:

SELECT t.`Char` FROM `table` t JOIN charsToKeep c ON t.ID = c.ID AND t.`Index` = c.`Index` WHERE c.Keep = true ORDER BY t.ID, t.`Index`

However, you probably have good reasons for structuring your data the way you have (my schema requires much more storage space per character, and the processing time is probably also much longer than with what I am about to suggest).
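
The normalized layout can be tried out on a small scale with Python's built-in sqlite3 module. This is an illustrative sketch only: it uses SQLite rather than MySQL, the table and column names (chars, chars_to_keep, idx, ch) are hypothetical, and it assumes the keep flags are shared by all strings, so the keep table is keyed by index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chars (id INTEGER, idx INTEGER, ch TEXT);
    CREATE TABLE chars_to_keep (idx INTEGER PRIMARY KEY, keep INTEGER);
""")

# One row per character of each original string (0-based index).
strings = {1: "120", 2: "012"}
conn.executemany(
    "INSERT INTO chars VALUES (?, ?, ?)",
    [(rid, i, c) for rid, s in strings.items() for i, c in enumerate(s)],
)
# Keep indices 1 and 2, discard index 0.
conn.executemany("INSERT INTO chars_to_keep VALUES (?, ?)", [(0, 0), (1, 1), (2, 1)])

rows = conn.execute("""
    SELECT t.id, t.ch
      FROM chars t
      JOIN chars_to_keep c ON t.idx = c.idx
     WHERE c.keep = 1
     ORDER BY t.id, t.idx
""").fetchall()

# Reassemble the kept characters per original string.
result = {}
for rid, ch in rows:
    result[rid] = result.get(rid, "") + ch
print(result)  # {1: '20', 2: '12'}
```

With millions of strings of 11,159 characters each, this layout would hold on the order of ten billion rows, which is the storage cost mentioned above.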

Since SQL does not have the tools to understand the schema you have embedded into your table, you will need to add them with a user-defined function. Kevin's example of dynamic SQL may also work, but in my experience this is not as fast as a user-defined function.

I have done this in MS SQL many times, but never in MySQL. You basically need a function, written in C or C++, that takes a comma-delimited list of the indexes you want to extract and the string from which you want to extract them. The function then returns a comma-delimited list of the extracted values. See these links for a good starting point:

http://dev.mysql.com/doc/refman/5.1/en/adding-functions.html

http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html

To build the concatenated list of indexes you want to extract from the char_index table, try the group_concat function:

http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

Hope this helps!
