Some database cleanup questions

Posted on 2024-12-04 18:09:19 · 432 characters · 6 views · 0 comments

We have a database of roughly 250k records that we want to sanitize, and there are some queries I just don't know how to write:

* Remove words containing a substring. For example, if a word contains the substring "cache", delete the entire word:

"cachelkjdlkjalkjs here happened something" => "here happened something"

* Delete rows that contain a number with more than 2 digits, with a few exceptions; for example, the 3-digit number 365 is permitted.

So:

"365 days a year, we do that" => Do nothing
"798 is a random number" => DELETE

* Check the number of words, and delete records with fewer than X words.

Any help would be appreciated.

Comments (1)

幽梦紫曦~ 2024-12-11 18:09:19

First back up the database!

I would first draw up a list of valid words (along with the numbers 0...99, 365, and any others you can think of). I would then write a script (in the language of your choosing) to go through the rows. For each row, retrieve the words, punctuation, and numbers, and check that each one is valid. Rebuild the entry from the valid ones and spit out the bits that do not match. I would then look over the bits that did not match to make sure you have not missed anything.
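As a rough illustration of that whitelist idea, here is a minimal Python sketch. The names `VALID_WORDS`, `VALID_NUMBERS`, and `split_entry` are made up for this example, the word list is a tiny placeholder, and splitting on whitespace is an assumption about how your rows are tokenized.

```python
import string

# Hypothetical whitelist; a real one would be far larger.
VALID_WORDS = {"here", "happened", "something", "days", "a", "year", "we", "do", "that"}
# The numbers 0..99 plus the permitted exception 365.
VALID_NUMBERS = {str(n) for n in range(100)} | {"365"}


def split_entry(text):
    """Return (rebuilt_text, rejected_tokens) for one row.

    Tokens are whitespace-separated. Surrounding punctuation is ignored
    for the validity check but kept in the rebuilt text.
    """
    kept, rejected = [], []
    for raw in text.split():
        core = raw.strip(string.punctuation).lower()
        # A token made only of punctuation is kept as-is in this sketch.
        if core == "" or core in VALID_WORDS or core in VALID_NUMBERS:
            kept.append(raw)
        else:
            rejected.append(raw)
    return " ".join(kept), rejected


if __name__ == "__main__":
    for row in ["cachelkjdlkjalkjs here happened something",
                "365 days a year, we do that"]:
        print(split_entry(row))
```

The rejected list is the part worth reading by eye: anything that shows up there and looks legitimate belongs in the whitelist.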

I would first run this in a passive mode (i.e. without changing the database) until you are happy that things are OK.
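A dry run could look something like the sketch below. It uses Python's built-in `sqlite3` module purely for illustration, and it applies the three rules from the question directly rather than a word whitelist. The table `records` with columns `id` and `body`, the value of `MIN_WORDS`, and the order in which the rules are applied are all assumptions, so adjust them to your schema and requirements.

```python
import re
import sqlite3

ALLOWED_NUMBERS = {"365"}  # exceptions to the "more than 2 digits" rule
MIN_WORDS = 3              # hypothetical value for X


def clean_row(text, substring="cache"):
    """Return the cleaned text, or None when the row should be deleted.

    Assumed order: number check on the original text, then word removal,
    then the word-count check on what is left.
    """
    if any(n not in ALLOWED_NUMBERS for n in re.findall(r"\d{3,}", text)):
        return None
    words = [w for w in text.split() if substring not in w.lower()]
    return " ".join(words) if len(words) >= MIN_WORDS else None


def sanitize(conn, dry_run=True):
    """Collect planned updates and deletes; modify the table only if dry_run is False."""
    updates, deletes = [], []
    rows = conn.execute("SELECT id, body FROM records ORDER BY id").fetchall()
    for row_id, body in rows:
        cleaned = clean_row(body)
        if cleaned is None:
            deletes.append(row_id)
        elif cleaned != body:
            updates.append((cleaned, row_id))
    if not dry_run:
        with conn:  # single transaction: commits on success, rolls back on error
            conn.executemany("UPDATE records SET body = ? WHERE id = ?", updates)
            conn.executemany("DELETE FROM records WHERE id = ?",
                             [(i,) for i in deletes])
    return updates, deletes


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO records (id, body) VALUES (?, ?)", [
        (1, "cachelkjdlkjalkjs here happened something"),
        (2, "365 days a year, we do that"),
        (3, "798 is a random number"),
    ])
    print(sanitize(conn))  # dry run: reports the plan, changes nothing
```

Only call `sanitize(conn, dry_run=False)` once the reported updates and deletes look right, and only on a database you have backed up.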

Hope that helps.
