标签云算法的建议

发布于 2024-09-14 07:06:27 字数 316 浏览 4 评论 0原文

我有一个 MSSQL 2005 表:

[Companies](
    [CompanyID] [int] IDENTITY(1,1) NOT NULL,
    [Title] [nvarchar](128),
    [Description] [nvarchar](256),
    [Keywords] [nvarchar](256)
)

我想为这家公司生成一个标签云。但我已将所有关键字保存在一列中,并用逗号分隔。有关如何通过最常用的关键字生成标签云的任何建议。可能有数百万家公司,每个公司大约有 10 个关键字。

谢谢。

I have a MSSQL 2005 table:

[Companies](
    [CompanyID] [int] IDENTITY(1,1) NOT NULL,
    [Title] [nvarchar](128),
    [Description] [nvarchar](256),
    [Keywords] [nvarchar](256)
)

I want to generate a tag cloud for this companies. But I've saved all keywords in one column separated by commas. Any suggestions for how to generate tag cloud by most used keywords. There could be millions of companies approx ten keywords per company.

Thank you.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

影子是时光的心 2024-09-21 07:06:27

第一步:将关键词拆分成适当的关系(表)。

CREATE TABLE Keywords (KeywordID int IDENTITY(1,1) NOT NULL
  , Keyword NVARCHAR(256)
  , constraint KeywordsPK primary key (KeywordID)
  , constraint KeywordsUnique unique (Keyword));

步骤 2:将公司和标签之间的多对多关系映射到单独的表中,就像所有多对多关系一样:

CREATE TABLE CompanyKeywords (
   CompanyID int not null
   , KeywordID int not null
   , constraint CompanyKeywords primary key (KeywordID, CompanyID)
   , constraint CompanyKeyword_FK_Companies
      foreign key (CompanyID)
      references Companies(CompanyID)
   , constraint CompanyKeyword_FK_Keywords
      foreign key (KeywordID)
      references Keywords (KeywordID));

步骤 3:使用简单的 GROUP BY 查询来生成“云”(以“云”为例) cloud' 表示最常见的 100 个标签):

with cte as (
SELECT TOP 100 KeywordID, count(*) as Count
FROM CompanyKeywords
group by KeywordID
order by count(*) desc)
select k.Keyword, c.Count
from cte c
join Keyword k on c.KeywordID = k.KeywordID;

第 4 步:缓存结果,因为它很少更改且计算成本很高。

Step 1: separate the keywords into a proper relation (table).

CREATE TABLE Keywords (KeywordID int IDENTITY(1,1) NOT NULL
  , Keyword NVARCHAR(256)
  , constraint KeywordsPK primary key (KeywordID)
  , constraint KeywordsUnique unique (Keyword));

Step 2: Map the many-to-many relation between companies and tags into a separate table, like all many-to-many relations:

CREATE TABLE CompanyKeywords (
   CompanyID int not null
   , KeywordID int not null
   , constraint CompanyKeywords primary key (KeywordID, CompanyID)
   , constraint CompanyKeyword_FK_Companies
      foreign key (CompanyID)
      references Companies(CompanyID)
   , constraint CompanyKeyword_FK_Keywords
      foreign key (KeywordID)
      references Keywords (KeywordID));

Step 3: Use a simple GROUP BY query to generate the 'cloud' (by example taking the 'cloud' to mean the most common 100 tags):

with cte as (
SELECT TOP 100 KeywordID, count(*) as Count
FROM CompanyKeywords
group by KeywordID
order by count(*) desc)
select k.Keyword, c.Count
from cte c
join Keyword k on c.KeywordID = k.KeywordID;

Step 4: cache the result as it changes seldom and it computes expensively.

彼岸花似海 2024-09-21 07:06:27

我更愿意看到您的设计按照Remus的建议进行标准化,但是如果您无法更改设计...

您可以使用解析函数(我将使用的示例取自 此处),解析您的关键字并对其进行计数。

CREATE FUNCTION [dbo].[fnParseStringTSQL] (@string NVARCHAR(MAX),@separator NCHAR(1))
RETURNS @parsedString TABLE (string NVARCHAR(MAX))
AS 
BEGIN
   DECLARE @position int
   SET @position = 1
   SET @string = @string + @separator
   WHILE charindex(@separator,@string,@position) <> 0
      BEGIN
         INSERT into @parsedString
         SELECT substring(@string, @position, charindex(@separator,@string,@position) - @position)
         SET @position = charindex(@separator,@string,@position) + 1
      END
     RETURN
END
go

create table MyTest (
    id int identity,
    keywords nvarchar(256)
)

insert into MyTest
    (keywords)
    select 'sql server,oracle,db2'
    union
    select 'sql server,oracle'
    union
    select 'sql server'

select k.string, COUNT(*) as count
    from MyTest mt
        cross apply dbo.fnParseStringTSQL(mt.keywords,',') k
    group by k.string
    order by count desc

drop function dbo.fnParseStringTSQL
drop table MyTest

I'd much rather see your design normalized as suggested by Remus, but if you're at a point where you can't change your design...

You can use a parsing function (the example I'll use is taken from here), to parse your keywords and count them.

CREATE FUNCTION [dbo].[fnParseStringTSQL] (@string NVARCHAR(MAX),@separator NCHAR(1))
RETURNS @parsedString TABLE (string NVARCHAR(MAX))
AS 
BEGIN
   DECLARE @position int
   SET @position = 1
   SET @string = @string + @separator
   WHILE charindex(@separator,@string,@position) <> 0
      BEGIN
         INSERT into @parsedString
         SELECT substring(@string, @position, charindex(@separator,@string,@position) - @position)
         SET @position = charindex(@separator,@string,@position) + 1
      END
     RETURN
END
go

create table MyTest (
    id int identity,
    keywords nvarchar(256)
)

insert into MyTest
    (keywords)
    select 'sql server,oracle,db2'
    union
    select 'sql server,oracle'
    union
    select 'sql server'

select k.string, COUNT(*) as count
    from MyTest mt
        cross apply dbo.fnParseStringTSQL(mt.keywords,',') k
    group by k.string
    order by count desc

drop function dbo.fnParseStringTSQL
drop table MyTest
乜一 2024-09-21 07:06:27

莱姆斯和乔都是正确的,但正如乔所说,如果你没有选择,那么你就必须忍受它。我想我可以通过使用 XML 数据类型为您提供一个简单的解决方案。您已经可以通过执行此查询轻松查看已解析的列

WITH myCommonTblExp AS (
    SELECT CompanyID,
    CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
    FROM Companies
)
SELECT CompanyID, RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)

,现在知道您可以做到这一点,您所要做的就是对它们进行分组并进行计数,但是您不能对 XML 方法进行分组,因此我的建议是创建上面查询的视图

CREATE VIEW [dbo].[DissectedKeywords]
AS
WITH myCommonTblExp AS (
    SELECT 
    CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
    FROM Companies
)
SELECT RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)
GO

并无论如何

SELECT Keywords, COUNT(*) AS KeyWordCount FROM DissectedKeywords
GROUP BY Keywords
ORDER BY Keywords

,这里是完整的文章 -->http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/

Both Remus and Joe are correct but yes as what Joe said if you dont have a choice then you have to live with it. I think I can offer you an easy solution by using an XML Data Type. You can already easily view the parsed column by doing this query

WITH myCommonTblExp AS (
    SELECT CompanyID,
    CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
    FROM Companies
)
SELECT CompanyID, RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)

now knowing that you can do that, all you have to do is to group them and count, but you cannot group XML methods so my suggestion is create a view of the query above

CREATE VIEW [dbo].[DissectedKeywords]
AS
WITH myCommonTblExp AS (
    SELECT 
    CAST('<I>' + REPLACE(Keywords, ',', '</I><I>') + '</I>' AS XML) AS Keywords
    FROM Companies
)
SELECT RTRIM(LTRIM(ExtractedCompanyCode.X.value('.', 'VARCHAR(256)'))) AS Keywords
FROM myCommonTblExp
CROSS APPLY Keywords.nodes('//I') ExtractedCompanyCode(X)
GO

and perform your count on that view

SELECT Keywords, COUNT(*) AS KeyWordCount FROM DissectedKeywords
GROUP BY Keywords
ORDER BY Keywords

Anyways here is the full article -->http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文