为一对多表实现查找或插入

发布于 2024-10-18 17:26:21 字数 1953 浏览 0 评论 0原文

我有 2 个表，tracklist 和 track，其中 tracklist 有许多 track。在某些时候，我会收到引用曲目列表的用户输入，并且我需要创建该曲目列表，或者返回现有的曲目列表（这是因为曲目列表对用户完全透明）。

我的天真的解决方案是找到所有包含 n 曲目的曲目列表，然后加入 track 与 tracklist n 次，根据用户输入数据检查每个连接。例如，有 2 个曲目：

SELECT tracklist.id FROM tracklist
  JOIN track t1 ON tracklist.id = t1.tracklist
  JOIN track_name tn1 ON t1.name = tn1.id
  JOIN track t2 ON tracklist.id = t2.tracklist
  JOIN track_name tn2 ON t2.name = tn2.id
 WHERE tracklist.track_count = '20'
   AND (t1.position = 1 AND tn1.name = 'Pancakes' AND t1.artist_credit = '42' AND t1.recording = 1)
   AND (t2.position = 2 AND tn2.name = 'Waffles' AND t2.artist_credit = '9001' AND t2.recording = 2)

但是，这确实不能很好地扩展到大型曲目列表。我的非常基本的计时表明，对于 10 个曲目的曲目列表，这可能需要 >500 毫秒，对于具有 100 个曲目的曲目列表，大约需要 7 秒。虽然后者是一种边缘情况，但我使用的任何算法都需要能够至少扩展到这一点。

然而我仍然坚持其他解决方案。我唯一能想到的另一件事是选择所有具有 n 曲目的曲目列表及其所有曲目，然后在应用程序代码中进行比较。但是，如果可以的话，我真的很想将其保留在数据库服务器上。

这是我正在使用的架构：

CREATE TABLE track
(
    id                  SERIAL,
    recording           INTEGER NOT NULL, -- references recording.id
    tracklist           INTEGER NOT NULL, -- references tracklist.id
    position            INTEGER NOT NULL,
    name                INTEGER NOT NULL, -- references track_name.id
    artist_credit       INTEGER NOT NULL, -- references artist_credit.id
    length              INTEGER CHECK (length IS NULL OR length > 0),
    edits_pending       INTEGER NOT NULL DEFAULT 0,
    last_updated        TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE track_name (
    id                  SERIAL,
    name                VARCHAR NOT NULL
);

CREATE TABLE tracklist
(
    id                  SERIAL,
    track_count         INTEGER NOT NULL DEFAULT 0,
    last_updated        TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

有什么建议吗？

原文

I have 2 tables, tracklist and track, where tracklist has many tracks. At some points, I will receive user input which refers to a list of tracks, and I need to either create that tracklist, or return an existing tracklist (this is because tracklists are meant to be entirely transparent to users).

My naive solution to this was to find all tracklists with n tracks, and join track against tracklist n times, checking each join against the user input data. For example, with 2 tracks:

SELECT tracklist.id FROM tracklist
  JOIN track t1 ON tracklist.id = t1.tracklist
  JOIN track_name tn1 ON t1.name = tn1.id
  JOIN track t2 ON tracklist.id = t2.tracklist
  JOIN track_name tn2 ON t2.name = tn2.id
 WHERE tracklist.track_count = '20'
   AND (t1.position = 1 AND tn1.name = 'Pancakes' AND t1.artist_credit = '42' AND t1.recording = 1)
   AND (t2.position = 2 AND tn2.name = 'Waffles' AND t2.artist_credit = '9001' AND t2.recording = 2)

However, this really doesn't scale well to large tracklists. My very rudimentary timing shows this can take >500ms for 10 track tracklists, and ~7s for tracklists with 100 tracks. While the latter is an edge case, whatever algorithm I use needs to be able to scale at least up to this.

I'm stuck on other solutions however. The only other thing I can think of is to select all tracklists with n tracks, and all their tracks, and then do the comparison in application code. However, I'd really like to keep this on the database server if I can.

Here is the schema I am working with:

CREATE TABLE track
(
    id                  SERIAL,
    recording           INTEGER NOT NULL, -- references recording.id
    tracklist           INTEGER NOT NULL, -- references tracklist.id
    position            INTEGER NOT NULL,
    name                INTEGER NOT NULL, -- references track_name.id
    artist_credit       INTEGER NOT NULL, -- references artist_credit.id
    length              INTEGER CHECK (length IS NULL OR length > 0),
    edits_pending       INTEGER NOT NULL DEFAULT 0,
    last_updated        TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE track_name (
    id                  SERIAL,
    name                VARCHAR NOT NULL
);

CREATE TABLE tracklist
(
    id                  SERIAL,
    track_count         INTEGER NOT NULL DEFAULT 0,
    last_updated        TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Any suggestions?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

凶凌 2024-10-25 17:26:21

  SELECT DISTINCT tracklist
    FROM track t0
    WHERE 
      (SELECT COUNT(DISTINCT tracklist)
        FROM track t1
          WHERE 
            (
              (t1.id='test1.id')
              OR 
              (t1.id='test2.id')

              ......
              OR 
              (t1.id='testn.id')
            )
      = 1);

  -- This is OK if you have the track ids for this query.
  -- If you do not then you need to replace each of the t1.id='testm.id' statements
  -- with:
  --      t1.recording='testm.recording' AND
  --      t1.tracklist='testm.tracklist' AND
  --      t1.position='testm.position' AND
  --      t1.name='testm.name' AND
  --      t1.artist_credit='testm.artist_credit' AND
  --      t1.length='testm.length' AND
  --      t1.edits_pending='testm.edits_pending' AND
  --      t1.last_updated='testm.last_updated'

由于我的语法可能不完全正确，并且没有机会对其进行测试，因此接下来是我要实现的目标的书面描述：

我建立一个查询，返回您拥有的曲目列表。一旦我构建了这个查询，我就会检查这些曲目的曲目列表是否都相同。如果是，即查询中只有一个曲目列表，那么这就是您需要的曲目列表。如果查询中没有曲目列表，或者有多个曲目列表，则您拥有的曲目集不对应于任何单个现有曲目列表，因此您需要创建一个新的曲目列表。如果有必要，此查询不会处理实际的创建。我不确定它将如何处理退化情况 - 查询中根本没有轨道；或者没有列出任何曲目的曲目列表。

  SELECT DISTINCT tracklist
    FROM track t0
    WHERE 
      (SELECT COUNT(DISTINCT tracklist)
        FROM track t1
          WHERE 
            (
              (t1.id='test1.id')
              OR 
              (t1.id='test2.id')

              ......
              OR 
              (t1.id='testn.id')
            )
      = 1);

  -- This is OK if you have the track ids for this query.
  -- If you do not then you need to replace each of the t1.id='testm.id' statements
  -- with:
  --      t1.recording='testm.recording' AND
  --      t1.tracklist='testm.tracklist' AND
  --      t1.position='testm.position' AND
  --      t1.name='testm.name' AND
  --      t1.artist_credit='testm.artist_credit' AND
  --      t1.length='testm.length' AND
  --      t1.edits_pending='testm.edits_pending' AND
  --      t1.last_updated='testm.last_updated'

As I may not have the syntax exactly correct, and have had no opportunity to test it, a written description of what I am trying to achieve is next:

I build up a query returning the list of tracks that you have. Once I have built this query I am checking whether the tracklists for these tracks are all the same. If they are, ie there is only one tracklist in the query, then this is the tracklist you require. If there are no tracklists in the query, or there is more than one, then the set of tracks you have do not correspond to any single existing tracklist, so you need to create a new tracklist. This query does not deal with the actual creation, if it proves necessary. I am not sure how it will deal with degenerate cases - there are no tracks at all in the query; or there are no tracklists listed for any of the tracks.

回复收藏 0 原文

~没有更多了~