How to extend an SQL table with new primary keys and add up values for existing keys?

Posted on 2025-01-25 15:23:47, 1297 characters, 3 views, 0 comments

I want to join or update the following two tables and also add up df for existing words. If a word such as endeavor does not exist in the first table, it should be added with its df value; if a word such as hallo exists in both tables, its df values should be summed.

FYI I'm using MariaDB and PySpark to do word counts on documents and calculate tf, df, and tfidf values.

Table name: df

+--------+----+
|    word|  df|
+--------+----+
|vicinity|   5|
|   hallo|   2|
|  admire|   3|
| settled|   1|
+--------+----+

Table name: word_list

+----------+---+
|      word| df|
+----------+---+
|     hallo|  1|
|   settled|  1|
|  endeavor|  1|
+----------+---+

So in the end the updated/combined table should look like this:

+----------+---+
|      word| df|
+----------+---+
|  vicinity|  5|
|     hallo|  3|
|    admire|  3|
|   settled|  2|
|  endeavor|  1|
+----------+---+

What I've tried to do so far is the following:

SELECT df.word, df.df + word_list.df FROM df FULL OUTER JOIN word_list ON df.word=word_list.word
SELECT df.word FROM df JOIN word_list ON df.word=word_list.word
SELECT df.word FROM df FULL OUTER JOIN word_list ON df.word=word_list.word

None of them worked: I get either a table with only null values, a table with some null values, or an exception. I'm sure there must be an easy SQL statement to achieve this, but I've been stuck on it for hours and haven't found anything relevant on Stack Overflow.

谁与争疯 2025-02-01 15:23:47

You just need to UNION the two tables first, then aggregate on word. Since the tables are identically structured, this is straightforward. See the fiddle linked below. I used MariaDB 10.3 since you didn't specify a version, but these queries should work in just about any DBMS.

https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=c6d86af77f19fc1f337ad1140ef07cd2

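-- Stack the rows of both tables, then sum df per word.
SELECT word, SUM(df) AS df
FROM (
  SELECT word, df FROM df
  UNION ALL  -- ALL keeps identical rows such as (settled, 1)
  SELECT word, df FROM word_list
) z
GROUP BY word
ORDER BY SUM(df) DESC;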

UNION is the vertical cousin of JOIN: UNION stacks two datasets vertically (row-wise), whereas JOIN combines them horizontally by adding columns to the output. Both datasets need the same number of columns for the UNION to work. You need UNION ALL here so that every row is returned, because plain UNION returns only distinct rows. In this dataset, settled has a df of 1 in both tables, so without the ALL keyword it would appear only once in the union, and the sum for settled would be 1 instead of the 2 you expect.
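
To make the UNION versus UNION ALL difference concrete, here is a minimal sketch that runs the same aggregation both ways on the sample data from the question. It uses Python's built-in sqlite3 module purely as a stand-in for MariaDB (an assumption made so the example is self-contained), and the helper name combined_df is hypothetical.

```python
import sqlite3


def combined_df(union_keyword):
    """Sum df per word after stacking both tables with the given set operator."""
    # In-memory SQLite database used as a stand-in for MariaDB (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE df (word TEXT PRIMARY KEY, df INTEGER);
        CREATE TABLE word_list (word TEXT PRIMARY KEY, df INTEGER);
        INSERT INTO df VALUES
            ('vicinity', 5), ('hallo', 2), ('admire', 3), ('settled', 1);
        INSERT INTO word_list VALUES
            ('hallo', 1), ('settled', 1), ('endeavor', 1);
    """)
    # union_keyword is either "UNION ALL" or "UNION"; it is interpolated only
    # because it is a fixed keyword chosen by this script, not user input.
    rows = conn.execute(f"""
        SELECT word, SUM(df) AS df
        FROM (
            SELECT word, df FROM df
            {union_keyword}
            SELECT word, df FROM word_list
        ) z
        GROUP BY word
    """).fetchall()
    conn.close()
    return dict(rows)


with_all = combined_df("UNION ALL")  # keeps both (settled, 1) rows
without_all = combined_df("UNION")   # collapses the two (settled, 1) rows
print(with_all["settled"], without_all["settled"])
```

With UNION ALL, settled sums to 2; with plain UNION the two identical (settled, 1) rows are merged before the sum, so settled comes out as 1. Words whose rows differ between the tables, such as hallo, are unaffected.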

The ORDER BY isn't necessary if you are just transferring to a new table. I just added it to get my results in the same order as your sample output.
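
If the goal is to store the combined counts in a new table, one possible approach is CREATE TABLE ... AS SELECT around the same query. The sketch below again uses Python's sqlite3 module as a stand-in for MariaDB (an assumption), and the table name df_combined is hypothetical, not something from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MariaDB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE df (word TEXT PRIMARY KEY, df INTEGER);
    CREATE TABLE word_list (word TEXT PRIMARY KEY, df INTEGER);
    INSERT INTO df VALUES
        ('vicinity', 5), ('hallo', 2), ('admire', 3), ('settled', 1);
    INSERT INTO word_list VALUES
        ('hallo', 1), ('settled', 1), ('endeavor', 1);

    -- df_combined is a hypothetical name for the merged table.
    -- No ORDER BY: row order is irrelevant when writing to a table.
    CREATE TABLE df_combined AS
    SELECT word, SUM(df) AS df
    FROM (
        SELECT word, df FROM df
        UNION ALL
        SELECT word, df FROM word_list
    ) z
    GROUP BY word;
""")

# Read the merged table back as a {word: df} mapping.
combined = dict(conn.execute("SELECT word, df FROM df_combined"))
conn.close()
print(combined)
```

A table created this way does not inherit the primary key on word, so the key would need to be added separately if the new table is meant to replace df.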

Let me know if this worked for you.
