How do I select noncontiguous characters from a text string in MySQL?
I have a table with millions of rows and a single column of text that is exactly 11,159 characters long. It looks like this:
1202012101...(to 11,159 characters)
1202020120...
0121210212...
...
(to millions of rows)
I realize that I can use
SELECT SUBSTR(column,2,4) FROM table;
...if I wanted to pull out characters 2, 3, 4, and 5:
1202012101...
1202020120...
0121210212...
 ^^^^
But I need to extract noncontiguous characters, e.g. characters 1,5,7:
1202012101...
1202020120...
0121210212...
^   ^ ^
I realize this can be done with a query like:
SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,7,1)) FROM table;
But this query gets very unwieldy to build for the thousands of characters that I need to select. So, for the first part of the question: how do I build a query that does something like this?
SELECT CHARACTERS(string,1,5,7) FROM table;
Furthermore, the indices of the characters I want to select are from a different table that looks something like this:
char_index  keep_or_discard
1           keep
2           discard
3           discard
4           discard
5           keep
7           discard
8           keep
9           discard
10          discard
So for the second part of the question, how could I build a query to select specific characters from the first table based on whether keep_or_discard="keep" for that character's index in the second table?
Comments (4)
This function does what you want:
Note: CONVERT('1,2,3,4', SIGNED) will yield 1, but it will give a warning.
I have it defined to be available in the database test.
The function takes two parameters; a string(!) with a list of positions, and a long string from where you want the characters taken.
An example of using this:
The warnings stem from the fast conversion to integer that is done in the function. (See comment above)
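As a minimal sketch of the idea (not the answerer's original code), a stored function along these lines takes a comma-separated list of positions and the source string, and concatenates the character found at each position. The name `pick_chars` and its parameter names are hypothetical. It splits the list with `SUBSTRING_INDEX` and converts each piece with an explicit `CAST`, so a well-formed list should not raise the conversion warning mentioned above. `DELIMITER` is a `mysql` command-line client directive, so other clients may need a different way to define the routine.

```sql
DELIMITER //

-- Hypothetical helper: returns the characters of src found at the
-- 1-based positions listed in the comma-separated string positions.
CREATE FUNCTION pick_chars(positions MEDIUMTEXT, src TEXT)
RETURNS TEXT
DETERMINISTIC
NO SQL
BEGIN
  DECLARE result TEXT DEFAULT '';
  DECLARE rest MEDIUMTEXT;
  DECLARE pos INT;

  SET rest = positions;

  WHILE CHAR_LENGTH(rest) > 0 DO
    -- Convert only the text before the first comma.
    SET pos = CAST(SUBSTRING_INDEX(rest, ',', 1) AS SIGNED);
    SET result = CONCAT(result, SUBSTRING(src, pos, 1));

    -- Drop the position that was just consumed.
    IF LOCATE(',', rest) > 0 THEN
      SET rest = SUBSTRING(rest, LOCATE(',', rest) + 1);
    ELSE
      SET rest = '';
    END IF;
  END WHILE;

  RETURN result;
END//

DELIMITER ;

-- Characters 1, 5 and 7 of the first sample row.
SELECT pick_chars('1,5,7', '1202012101');  -- returns '102'
```

Because the loop re-trims the position list on every iteration, this is likely to be slow for thousands of positions across millions of rows; it is meant to show the shape of the approach rather than a tuned implementation.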
How about dynamic SQL? (You will need to build the SELECT part of the query.)
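One possible way to build the select part is to let MySQL generate it from the index table and then run it as a prepared statement. This is a sketch under assumptions: the index table is named `char_index` with columns `char_index` and `keep_or_discard`, and the data lives in a table called `data_table` (a hypothetical name) with the text in column `colm`.

```sql
-- GROUP_CONCAT truncates its result at group_concat_max_len
-- (1024 by default), which is far too short for thousands of indexes.
SET SESSION group_concat_max_len = 1000000;

-- Generate: SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),...) AS kept FROM data_table
SET @sql = (
  SELECT CONCAT(
           'SELECT CONCAT(',
           GROUP_CONCAT(CONCAT('SUBSTR(colm,', char_index, ',1)')
                        ORDER BY char_index SEPARATOR ','),
           ') AS kept FROM data_table'
         )
  FROM char_index
  WHERE keep_or_discard = 'keep'
);

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The generated statement is the same long `CONCAT(SUBSTR(...), ...)` query from the question; the difference is that it is assembled by the server instead of by hand.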
You can write a PHP script to do this for you:
And there you have an array with the correct numbers.
I'm sorry if you can't/don't want to use PHP for this, I just don't really know how to do this without PHP, Perl, Python or some other similar language. Hopefully this solution will help somehow...
The source of your difficulty is that your schema does not represent the true relationships between the data elements. If you wanted to achieve this with "pure" SQL, you would need a schema more like:
Then, you could perform a query like:
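A minimal sketch of what such a schema and query could look like follows. It is not necessarily the answerer's original listing: the table `data_chars` and its columns are hypothetical names, and the index table is assumed to be `char_index` with columns `char_index` and `keep_or_discard`. Each character of each original row becomes its own row, so the selection turns into an ordinary join.

```sql
-- One row per (original row, character position).
CREATE TABLE data_chars (
  row_id     INT UNSIGNED      NOT NULL,
  char_index SMALLINT UNSIGNED NOT NULL,
  char_value CHAR(1)           NOT NULL,
  PRIMARY KEY (row_id, char_index)
);

-- Raise the GROUP_CONCAT limit (1024 by default) so long results
-- are not silently truncated.
SET SESSION group_concat_max_len = 1000000;

-- Reassemble only the kept characters for every original row.
SELECT d.row_id,
       GROUP_CONCAT(d.char_value ORDER BY d.char_index SEPARATOR '') AS kept
FROM data_chars AS d
JOIN char_index AS k
  ON k.char_index = d.char_index
WHERE k.keep_or_discard = 'keep'
GROUP BY d.row_id;
```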
However, you probably have good reasons for structuring your data the way you have (my schema requires much more storage space per character, and its processing time is probably also much longer than that of what I am about to suggest).
Since SQL does not have the tools to understand the schema you have embedded into your table, you will need to add them with a user-defined function. Kevin's example of dynamic SQL may also work, but in my experience this is not as fast as a user-defined function.
I have done this in MS SQL many times, but never in MySQL. You basically need a function, written in C or C++, that takes a comma-delimited list of the indexes you want to extract and the string from which you want to extract them. Then, the function will return a comma-delimited list of those extracted values. See these links for a good starting point:
http://dev.mysql.com/doc/refman/5.1/en/adding-functions.html
http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html
To build the concatenated list of indexes you want to extract from the char_index table, try the group_concat function:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
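For illustration, a query along these lines would produce the list of kept indexes as one comma-delimited string. It assumes the index table is named `char_index` and has the columns `char_index` and `keep_or_discard` shown in the question.

```sql
-- GROUP_CONCAT output is truncated at group_concat_max_len
-- (1024 by default), so raise it before building a long list.
SET SESSION group_concat_max_len = 1000000;

SELECT GROUP_CONCAT(char_index ORDER BY char_index SEPARATOR ',') AS keep_list
FROM char_index
WHERE keep_or_discard = 'keep';
```

With the sample rows from the question, `keep_list` would be `1,5,8`.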
Hope this helps!