Making a MySQL table unique

Posted 2024-08-06 20:26:08

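Hey, I created a spider that crawls through PDF documents and records every word in each document in a table in a MySQL database.

Obviously, words like "the", "and", "or" etc. appear many, many times in a book.

I'm just wondering: what's the fastest way to remove duplicate values from the table?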

对不⑦ 2024-08-13 20:26:08

Create a table with no index on the word column and load all the words from the book with bulk inserts (you could also use LOAD DATA). When you're done inserting, add an index on the word field.

Then create a second table using:

CREATE TABLE newTable SELECT DISTINCT word FROM oldTable;
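The load-then-deduplicate flow can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the sample words and the index name idx_word are made up for illustration.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oldTable (word TEXT)")

# Bulk-insert every word first, duplicates included, with no index in the way.
words = ["the", "cat", "and", "the", "dog", "and", "the"]
conn.executemany("INSERT INTO oldTable (word) VALUES (?)", [(w,) for w in words])

# Index the column only after loading (idx_word is a made-up name).
conn.execute("CREATE INDEX idx_word ON oldTable (word)")

# Copy each distinct word once. SQLite requires the AS keyword here;
# MySQL accepts the statement with or without it.
conn.execute("CREATE TABLE newTable AS SELECT DISTINCT word FROM oldTable")

unique_words = sorted(row[0] for row in conn.execute("SELECT word FROM newTable"))
print(unique_words)  # ['and', 'cat', 'dog', 'the']
```

Once newTable has been checked, the old table can be dropped and the new one renamed in its place.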

甜妞爱困 2024-08-13 20:26:08

Instead of removing duplicates, you could make sure that no duplicates ever make it into the table.

Presuming your table has only 2 fields, id and word:

INSERT INTO `table` (id, word) SELECT NULL, 'word' FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM `table` WHERE word = 'word');

This inserts the word only if it's not already in the table. Selecting FROM DUAL rather than from the table itself means the statement also works while the table is still empty, and no LIMIT 1 is needed.
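A minimal sketch of the insert-only-if-absent idea, again assuming Python's sqlite3 module as a stand-in for MySQL. The table name words and the helper insert_if_absent are hypothetical. SQLite allows a SELECT with a WHERE clause and no FROM; in MySQL the same SELECT would typically be written with FROM DUAL.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; `words` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY AUTOINCREMENT, word TEXT)")

def insert_if_absent(conn, word):
    """Insert `word` only when no row with the same word exists yet."""
    conn.execute(
        "INSERT INTO words (word) "
        "SELECT ? WHERE NOT EXISTS (SELECT 1 FROM words WHERE word = ?)",
        (word, word),
    )

# The spider would call this once per word it reads.
for w in ["the", "cat", "the", "and", "the"]:
    insert_if_absent(conn, w)

stored = [row[0] for row in conn.execute("SELECT word FROM words ORDER BY id")]
print(stored)  # ['the', 'cat', 'and']
```

Without an index on the word column, the NOT EXISTS check has to scan the table on every insert, so this approach probably wants that index in place from the start.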

背叛残局 2024-08-13 20:26:08

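If you can rerun the script that populates the database, you could add a unique key on the "word" field and use REPLACE INTO instead of INSERT INTO. REPLACE deletes the previous instance of the record before adding the duplicate one. It may not be the most efficient approach, but it's fairly simple. For more details, see:

http://dev.mysql.com/doc/refman/5.0/en/replace.html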
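To see what REPLACE INTO does against a unique key, here is a small sketch that assumes Python's sqlite3 module in place of MySQL (SQLite also accepts REPLACE INTO). The table layout is an assumption. In MySQL the unique key could be added to an existing table with ALTER TABLE words ADD UNIQUE (word).

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (id INTEGER PRIMARY KEY AUTOINCREMENT, word TEXT UNIQUE)"
)

# REPLACE removes any existing row that collides on the unique key,
# then inserts the new row.
for w in ["the", "cat", "the"]:
    conn.execute("REPLACE INTO words (word) VALUES (?)", (w,))

rows = conn.execute("SELECT id, word FROM words ORDER BY word").fetchall()
the_id = dict((word, row_id) for row_id, word in rows)["the"]
print(rows)
```

The table ends up with one row per word, but the row for "the" no longer has its original id, because the first copy was deleted and a new one inserted. That delete-and-reinsert on every repeated word is presumably why the answer calls the approach simple rather than efficient.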

窝囊感情。 2024-08-13 20:26:08

Select distinct on the word field, and then delete all rows that have a different id? I'm no master of subqueries, so no example atm :)

别忘他 2024-08-13 20:26:08

delete w1 from words w1
  join words w2
    on w2.plain = w1.plain and w2.idcolumn < w1.idcolumn

This works if you added (idcolumn, plain) for every word you found. It is written as a self-join because MySQL does not let a DELETE select from the same table in a subquery. Every row that has a lower idcolumn for the same word is removed, so only the lowest one survives.

If you do not have an id column (pk) then you can use Anax's solution.

In addition to not inserting duplicates (codeburger comment), you can just set a unique index on your plain column.
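As a runnable illustration of keeping only the lowest idcolumn per word and then locking the column with a unique index, here is a sketch that assumes Python's sqlite3 module as a stand-in for MySQL. The sample rows and the index name uq_plain are made up. SQLite accepts the NOT IN subquery on the table being deleted from; in MySQL the subquery would need to be wrapped in a derived table or replaced by a self-join.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (idcolumn INTEGER PRIMARY KEY, plain TEXT)")
conn.executemany(
    "INSERT INTO words (idcolumn, plain) VALUES (?, ?)",
    [(1, "the"), (2, "cat"), (3, "the"), (4, "and"), (5, "the"), (6, "cat")],
)

# Keep the row with the smallest idcolumn for each word and delete the rest.
conn.execute(
    "DELETE FROM words WHERE idcolumn NOT IN "
    "(SELECT MIN(idcolumn) FROM words GROUP BY plain)"
)
remaining = conn.execute(
    "SELECT idcolumn, plain FROM words ORDER BY idcolumn"
).fetchall()
print(remaining)  # [(1, 'the'), (2, 'cat'), (4, 'and')]

# With the duplicates gone, a unique index (uq_plain is a made-up name)
# makes the database reject any later duplicate.
conn.execute("CREATE UNIQUE INDEX uq_plain ON words (plain)")
try:
    conn.execute("INSERT INTO words (plain) VALUES ('the')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

The unique index can only be created after the cleanup, since creating it while duplicates still exist fails.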
