以编程方式提取 RDBMS 中表之间的关系(无需外键)?

发布于 2024-07-13 21:54:03 字数 389 浏览 6 评论 0原文

我正在对 Oracle 数据库中中等规模的表(50+)之间的关系进行逆向工程,其中表之间没有定义外键。 我(在某种程度上)可以指望能够跨表匹配列名。 例如,列名“SomeDescriptiveName”在整个表集中可能是相同的。

我希望能够做的是找到一种更好的方法来根据那些匹配的列名称提取一些关系集,而不是手动逐一浏览表。 我可以使用 Java DatabaseMetaData 方法做一些事情,但这似乎是以前有人可能必须编写脚本的任务之一。 也许使用 Perl 或其他脚本语言提取列名称,使用列名称作为散列键并将表添加到散列键指向的数组中?

有人有任何提示或建议可以使这变得更简单或提供一个良好的起点吗? 这是一个丑陋的需求,如果已经定义了外键,那么理解关系就会容易得多。

谢谢。

I'm reverse engineering the relationships between a medium-sized number of tables (50+) in an Oracle database where there are no foreign keys defined between the tables. I can count (somewhat) on being able to match column names across tables. For example, column name "SomeDescriptiveName" is probably the same across the set of tables.

What I would like to be able to do is to find a better way of extracting some set of relationships based on those matching column names than manually going through the tables one by one. I could do something with Java DatabaseMetaData methods but it seems like this is one of those tasks that someone has probably had to script before. Maybe extract the columns names with Perl or some other scripting lang, use the column names as a hash key and add tables to an array pointed to by the hash key?

Anyone have any tips or suggestions that might make this simpler or provide a good starting point? It's an ugly need, if foreign keys had already been defined, understanding the relationships would have been much easier.

Thanks.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

烟沫凡尘 2024-07-20 21:54:03

您几乎已经在问题中写下了答案。

my %column_tables;
foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        push @{$column_tables[$column]}, $table;
    }
}
print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
    my @tables = @{$column_tables[$column]};
    next
        if @tables < 2;
    print $column, ': ';
    foreach my $table (@tables) {
        print $table->name, ' ';
    }
    print "\n";
}

You pretty much wrote the answer in your question.

my %column_tables;
foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        push @{$column_tables[$column]}, $table;
    }
}
print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
    my @tables = @{$column_tables[$column]};
    next
        if @tables < 2;
    print $column, ': ';
    foreach my $table (@tables) {
        print $table->name, ' ';
    }
    print "\n";
}
燃情 2024-07-20 21:54:03

我的策略是使用 Oracle 系统目录查找列名称数据类型相同但表名称不同的列。 另外,哪一列是表的主键或唯一键的一部分。

这是一个可能接近执行此操作的查询,但我没有方便的 Oracle 实例来测试它:

SELECT col1.table_name || '.' || col1.column_name || ' -> ' 
    || col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1 
  JOIN all_tab_columns col2
    ON (col1.column_name = col2.column_name 
    AND col1.data_type = col2.data_type)
  JOIN all_cons_columns cc
    ON (col2.table_name = cc.table_name 
    AND col2.column_name = cc.column_name)
  JOIN all_constraints con
    ON (cc.constraint_name = con.constraint_name 
    AND cc.table_name = con.table_name 
    AND con.constraint_type IN ('P', 'U')
WHERE col1.table_name != col2.table_name;

当然,这不会得到任何相关但具有不同名称的列的情况。

My strategy would be to use the Oracle system catalog to find columns that are the same in column name and data type but different in table name. Also which one of the columns is part of a table's primary or unique key.

Here's a query that may be close to doing this, but I don't have an Oracle instance handy to test it:

SELECT col1.table_name || '.' || col1.column_name || ' -> ' 
    || col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1 
  JOIN all_tab_columns col2
    ON (col1.column_name = col2.column_name 
    AND col1.data_type = col2.data_type)
  JOIN all_cons_columns cc
    ON (col2.table_name = cc.table_name 
    AND col2.column_name = cc.column_name)
  JOIN all_constraints con
    ON (cc.constraint_name = con.constraint_name 
    AND cc.table_name = con.table_name 
    AND con.constraint_type IN ('P', 'U')
WHERE col1.table_name != col2.table_name;

Of course this won't get any case of columns that are related but have different names.

小ぇ时光︴ 2024-07-20 21:54:03

您可以结合使用三种(或四种)方法,具体取决于架构的混淆程度:

  • 动态方法
    • 观察
      • 在 RDBMS(或 ODBC 层)中启用跟踪,然后
      • 在应用程序中执行各种活动(最好是创建记录),然后
      • 确定哪些表按照严格的顺序进行了更改,以及哪些列值对
      • 在序列间隔期间出现在多个列中的值可能表示外键关系
  • 静态方法(只需分析现有数据,不需要运行应用程序)
    • 命名法:尝试从列名称推断关系
    • 统计:查看所有数字列中唯一值的最小值/最大值(可能还包括平均值),并尝试执行匹配
    • 代码逆向工程:最后的手段(除非处理脚本)- 不适合胆小的人:)

You can use a combination of three (or four) approaches, depending on how obfuscated the schema is:

  • dynamic methods
    • observation:
      • enable tracing in the RDBMS (or ODBC layer), then
      • perform various activities in the application (ideally record creation), then
      • identify which tables were altered in tight sequence, and with what column-value pairs
      • values occurring in more than one column during the sequence interval may indicate a foreign key relationship
  • static methods (just analyzing existing data, no need to have a running application)
    • nomenclature: try to infer relationships from column names
    • statistical: look at minimum/maximum (and possibly the average) of unique values in all numerical columns, and attempt to perform a match
    • code reverse engineering: your last resort (unless dealing with scripts) - not for the faint of heart :)
从此见与不见 2024-07-20 21:54:03

这是个有趣的问题。 我采取的方法是强力搜索与小样本集的类型和值匹配的列。 您可能需要调整启发式方法才能为您的模式提供良好的结果。 我在不使用自动递增键的模式上运行了它,并且运行良好。 该代码是为 MySQL 编写的,但很容易适应 Oracle。

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");

my @list;
foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.

            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();

exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name => $_->[0],
              type => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

sub samples_found {
    my ($table, $column, $samples) = @_;
    foreach my $v (@$samples) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}

This is an interesting question. The approach I took was a brute force search for columns that matched types and values for a small sample set. You'll probably have to tweak the heuristics to provide good results for your schema. I ran this on a schema that didn't use auto-incremented keys and it worked well. The code is written for MySQL, but it's very easy to adapt to Oracle.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");

my @list;
foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.

            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();

exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name => $_->[0],
              type => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

sub samples_found {
    my ($table, $column, $samples) = @_;
    foreach my $v (@$samples) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文