独立于 DBMS 的查询

发布于 2024-10-18 17:32:33 字数 1918 浏览 1 评论 0原文

我的硕士论文是关于通过分析元数据和存储的数据来发现糟糕的数据库设计。我们通过从给定的 DBMS 中提取元数据模型,然后对此元数据运行一组规则来实现此目的。

为了通过数据分析来扩展这个过程,我们需要允许规则直接查询数据库,但我们必须保留DBMS的独立性,这样查询就可以应用于PostgreSQL、MSSQL和MySQL。

我们已经讨论了一种查询的功能构造,例如:

new Query(new Select(columnID), new From(tableID), new Where(new Equality(columnID1, columnID2)))

然后使用特定于 DBMS 的序列化器。

另一种方法是让规则自己处理这一切:

public Query QueryDatabase(DBMS dbms)
{
 if (dbms == PostgreSQL) { return "select count(1) from Users"}
 if (dbms == MSSQL) {return ....}
}

我们是否遗漏了什么?所有这些实际上都存在于某个不错的图书馆中吗?是的,我们已经研究过实体框架,但它们似乎依赖于数据库的静态类型模型,而由于显而易见的原因无法创建该模型。

我应该提到的是,我们维护一个可扩展的规则架构,允许最终用户实现自己的规则。

为了阐明我们想要实现的目标,请看下面的查询(mssql),它需要两个参数,表的名称(@table)和列的名称(@column):

DECLARE @TotalCount FLOAT;
SELECT @TotalCount = COUNT(1) FROM [@table];
SELECT SUM(pcount * LOG10(@TotalCount / pcount)) / (LOG10(2) * @TotalCount)  
FROM (SELECT (Count([@column])) as pcount 
      FROM [@table]
      GROUP BY [@column])  as exp1 

该查询测量存储的信息量在给定的属性中,通过估计熵。它需要访问表中的所有行。为了避免从数据库中提取所有行并通过慢速网络连接传输它们,最好用 SQL 表达它们,并且仅传输单个数字。

注意:我们确实拥有我们需要的所有元数据。此问题仅用于访问数据!

我不太确定是否要将其添加到我已经很长的问题中,编辑现有答案或做什么。请随时提供建议。 ;)

基于 mrnye 答案:

new Query()
.Variable(varname => FLOAT)
.Set(varname => new Query().Count(1).From(table) )
.Select(new Aggregate().Sum(varname => "pcount * LOG10(varname / pcount)"))
.From(
  new Query()
  .Select(pcount => new Aggregate().Count(column)
  .From(table)
  .GroupBy(column)
)

除了语法错误和 lambda 语句的误用之外,我还考虑了使用一些扩展方法来构建查询的想法。这看起来确实是一个相当复杂的方法。您如何看待这种方法?

建立在 LINQ 答案的基础上:

let totalCount = Table.Count
from uv un from r in Table
           group r by r["attr"]
           select r.Count
select r.Count * Log2((totalCount / r.Count))

看起来相当不错,但要实现的工作量很大......

My masters thesis is about discovering bad database design by analyzing metadata and the data stored. We do this by extracting a metadata model from a given DBMS and then running a set of rules on this metadata.

To extend this process with data analysis, we need to allow rules to query the database directly, but we must retain DBMS independence, such that queries can be applied to PostgreSQL, MSSQL and MySQL.

We have discussed a sort of functional construction of queries such as:

new Query(new Select(columnID), new From(tableID), new Where(new Equality(columnID1, columnID2)))

And then using a DBMS-specific serializer.

Another approach is to let rules handle it all by themselves:

public Query QueryDatabase(DBMS dbms)
{
 if (dbms == PostgreSQL) { return "select count(1) from Users"}
 if (dbms == MSSQL) {return ....}
}

Are we missing something? Does all this in fact exist in a nice library somewhere? And yes, we have looked at Entity frameworks, but they seem to rely on a statically types model of the database, which for obvious reasons cannot be created.

I should mention that we maintain an extensible rule architecture, allowing end users to implement their own rules.

To clarify what we want to achieve, look at the following query (mssql), it needs two parameters, the name of the table (@table) and the name of the column (@column):

DECLARE @TotalCount FLOAT;
SELECT @TotalCount = COUNT(1) FROM [@table];
SELECT SUM(pcount * LOG10(@TotalCount / pcount)) / (LOG10(2) * @TotalCount)  
FROM (SELECT (Count([@column])) as pcount 
      FROM [@table]
      GROUP BY [@column])  as exp1 

The query measures the amount of information stored in a given attribute, by estimating the entropy. It needs to access all rows in the table. To avoid extracting all rows from the database and transferring them over a slow network connection it is better to express them in SQL an only transfer a single number.

NOTE: We DO have all the metadata we need. This question is only for accessing data!

I was not very sure of whether to add this to my already long question, edit an existing answer or what todo. Please feel free to advise. ;)

Building on mrnye answer:

new Query()
.Variable(varname => FLOAT)
.Set(varname => new Query().Count(1).From(table) )
.Select(new Aggregate().Sum(varname => "pcount * LOG10(varname / pcount)"))
.From(
  new Query()
  .Select(pcount => new Aggregate().Count(column)
  .From(table)
  .GroupBy(column)
)

Syntax errors and misuse of lambda statements aside, i played with the idea of using some extension methods for building queries. It does seem as a fairly complex approach. How would you think about such an approach?

Building on the LINQ answer:

let totalCount = Table.Count
from uv un from r in Table
           group r by r["attr"]
           select r.Count
select r.Count * Log2((totalCount / r.Count))

Seems fairly nice, but a helluva lot to implement...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

青春有你 2024-10-25 17:32:33

您可以通过实现自定义 LINQ 提供程序来实现相同的效果基础设施。查询是通用的,但是 AST 树访问者 生成 SQL 查询的工具可以是可插入的。您甚至可以使用内存数据存储来模拟数据库,并将自定义 LINQ 查询转换为 LINQ to 对象查询!

您需要创建一个知道如何从对象的索引器中提取列名的提供程序。这是一个可以扩展的基本框架:

// Runs in LinqPad!
public class TableQueryObject
{
    private readonly Dictionary<string, object> _data = new Dictionary<string, object>();
    public string TableName { get; set; }
    public object this[string column]
    {
        get { return _data.ContainsKey(column) ? _data[column] : null; }
        set { if (_data.ContainsKey(column)) _data[column] = value; else _data.Add(column, value); }
    }
}

public interface ITableQuery : IEnumerable<TableQueryObject>
{
    string TableName { get; }
    string ConnectionString { get; }
    Expression Expression { get; }
    ITableQueryProvider Provider { get; }
}

public interface ITableQueryProvider
{
    ITableQuery Query { get; }
    IEnumerable<TableQueryObject> Execute(Expression expression);
}

public interface ITableQueryFactory
{
    ITableQuery Query(string tableName);
}


public static class ExtensionMethods
{
    class TableQueryContext : ITableQuery
    {
        private readonly ITableQueryProvider _queryProvider;
        private readonly Expression _expression;

        public TableQueryContext(ITableQueryProvider queryProvider, Expression expression)
        {
            _queryProvider = queryProvider;
            _expression = expression;
        }

        public string TableName { get { return _queryProvider.Query.TableName; } }
        public string ConnectionString { get { return _queryProvider.Query.ConnectionString; } }
        public Expression Expression { get { return _expression; } }
        public ITableQueryProvider Provider { get { return _queryProvider; } }
        public IEnumerator<TableQueryObject> GetEnumerator() { return Provider.Execute(Expression).GetEnumerator(); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static MethodInfo MakeGeneric(MethodBase method, params Type[] parameters)
    {
        return ((MethodInfo)method).MakeGenericMethod(parameters);
    }

    public static Expression StaticCall(MethodInfo method, params Expression[] expressions)
    {
        return Expression.Call(null, method, expressions);
    }

    public static ITableQuery CreateQuery(this ITableQueryProvider source, Expression expression)
    {
        return new TableQueryContext(source, expression);
    }

    public static IEnumerable<TableQueryObject> Select<TSource>(this ITableQuery source, Expression<Func<TSource, TableQueryObject>> selector)
    {
        return source.Provider.CreateQuery(StaticCall(MakeGeneric(MethodBase.GetCurrentMethod(), typeof(TSource)), source.Expression, Expression.Quote(selector)));
    }

    public static ITableQuery Where(this ITableQuery source, Expression<Func<TableQueryObject, bool>> predicate)
    {
        return source.Provider.CreateQuery(StaticCall((MethodInfo)MethodBase.GetCurrentMethod(), source.Expression, Expression.Quote(predicate)));
    }
}

class SqlTableQueryFactory : ITableQueryFactory
{

    class SqlTableQuery : ITableQuery
    {
        private readonly string _tableName;
        private readonly string _connectionString;
        private readonly ITableQueryProvider _provider;
        private readonly Expression _expression;

        public SqlTableQuery(string tableName, string connectionString)
        {
            _connectionString = connectionString;
            _tableName = tableName;
            _provider = new SqlTableQueryProvider(this);
            _expression = Expression.Constant(this);
        }

        public IEnumerator<TableQueryObject> GetEnumerator() { return Provider.Execute(Expression).GetEnumerator(); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
        public string TableName { get { return _tableName; } }
        public string ConnectionString { get { return _connectionString; } }
        public Expression Expression { get { return _expression; } }
        public ITableQueryProvider Provider { get { return _provider; } }
    }

    class SqlTableQueryProvider : ITableQueryProvider
    {
        private readonly ITableQuery _query;
        public ITableQuery Query { get { return _query; } }
        public SqlTableQueryProvider(ITableQuery query) { _query = query; }

        public IEnumerable<TableQueryObject> Execute(Expression expression)
        {
            //var connecitonString = _query.ConnectionString;
            //var tableName = _query.TableName;
            // TODO visit expression AST (generate any sql dialect you want) and execute resulting sql
                    // NOTE of course the query can be easily parameterized!
            // NOTE here the fun begins, just return some dummy data for now :)
            for (int i = 0; i < 100; i++)
            {
                var obj = new TableQueryObject();
                obj["a"] = i;
                obj["b"] = "blah " + i;
                yield return obj;
            }
        }
    }

    private readonly string _connectionString;
    public SqlTableQueryFactory(string connectionString) { _connectionString = connectionString; }
    public ITableQuery Query(string tableName)
    {
        return new SqlTableQuery(tableName, _connectionString);
    }
}

static void Main()
{
    ITableQueryFactory database = new SqlTableQueryFactory("SomeConnectionString");
    var result = from row in database.Query("myTbl")
                 where row["someColumn"] == "1" && row["otherColumn"] == "2"
                 where row["thirdColumn"] == "2" && row["otherColumn"] == "4"
                 select row["a"]; // NOTE select executes as linq to objects! FTW
    foreach(var a in result) 
    {
        Console.WriteLine(a);
    }   
}

You could achieve the same by implementing a custom LINQ provider infrastructure. The queries are generic, but the AST tree visitors that generate the SQL queries can be made pluggable. You can even mock a database using a in memory data store and translating your custom LINQ query to a LINQ to objects query!

You would need to create a provider that would know how to extract the column name from the object's indexer. Here is a basic framework that you can extend:

// Runs in LinqPad!
public class TableQueryObject
{
    private readonly Dictionary<string, object> _data = new Dictionary<string, object>();
    public string TableName { get; set; }
    public object this[string column]
    {
        get { return _data.ContainsKey(column) ? _data[column] : null; }
        set { if (_data.ContainsKey(column)) _data[column] = value; else _data.Add(column, value); }
    }
}

public interface ITableQuery : IEnumerable<TableQueryObject>
{
    string TableName { get; }
    string ConnectionString { get; }
    Expression Expression { get; }
    ITableQueryProvider Provider { get; }
}

public interface ITableQueryProvider
{
    ITableQuery Query { get; }
    IEnumerable<TableQueryObject> Execute(Expression expression);
}

public interface ITableQueryFactory
{
    ITableQuery Query(string tableName);
}


public static class ExtensionMethods
{
    class TableQueryContext : ITableQuery
    {
        private readonly ITableQueryProvider _queryProvider;
        private readonly Expression _expression;

        public TableQueryContext(ITableQueryProvider queryProvider, Expression expression)
        {
            _queryProvider = queryProvider;
            _expression = expression;
        }

        public string TableName { get { return _queryProvider.Query.TableName; } }
        public string ConnectionString { get { return _queryProvider.Query.ConnectionString; } }
        public Expression Expression { get { return _expression; } }
        public ITableQueryProvider Provider { get { return _queryProvider; } }
        public IEnumerator<TableQueryObject> GetEnumerator() { return Provider.Execute(Expression).GetEnumerator(); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static MethodInfo MakeGeneric(MethodBase method, params Type[] parameters)
    {
        return ((MethodInfo)method).MakeGenericMethod(parameters);
    }

    public static Expression StaticCall(MethodInfo method, params Expression[] expressions)
    {
        return Expression.Call(null, method, expressions);
    }

    public static ITableQuery CreateQuery(this ITableQueryProvider source, Expression expression)
    {
        return new TableQueryContext(source, expression);
    }

    public static IEnumerable<TableQueryObject> Select<TSource>(this ITableQuery source, Expression<Func<TSource, TableQueryObject>> selector)
    {
        return source.Provider.CreateQuery(StaticCall(MakeGeneric(MethodBase.GetCurrentMethod(), typeof(TSource)), source.Expression, Expression.Quote(selector)));
    }

    public static ITableQuery Where(this ITableQuery source, Expression<Func<TableQueryObject, bool>> predicate)
    {
        return source.Provider.CreateQuery(StaticCall((MethodInfo)MethodBase.GetCurrentMethod(), source.Expression, Expression.Quote(predicate)));
    }
}

class SqlTableQueryFactory : ITableQueryFactory
{

    class SqlTableQuery : ITableQuery
    {
        private readonly string _tableName;
        private readonly string _connectionString;
        private readonly ITableQueryProvider _provider;
        private readonly Expression _expression;

        public SqlTableQuery(string tableName, string connectionString)
        {
            _connectionString = connectionString;
            _tableName = tableName;
            _provider = new SqlTableQueryProvider(this);
            _expression = Expression.Constant(this);
        }

        public IEnumerator<TableQueryObject> GetEnumerator() { return Provider.Execute(Expression).GetEnumerator(); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
        public string TableName { get { return _tableName; } }
        public string ConnectionString { get { return _connectionString; } }
        public Expression Expression { get { return _expression; } }
        public ITableQueryProvider Provider { get { return _provider; } }
    }

    class SqlTableQueryProvider : ITableQueryProvider
    {
        private readonly ITableQuery _query;
        public ITableQuery Query { get { return _query; } }
        public SqlTableQueryProvider(ITableQuery query) { _query = query; }

        public IEnumerable<TableQueryObject> Execute(Expression expression)
        {
            //var connecitonString = _query.ConnectionString;
            //var tableName = _query.TableName;
            // TODO visit expression AST (generate any sql dialect you want) and execute resulting sql
                    // NOTE of course the query can be easily parameterized!
            // NOTE here the fun begins, just return some dummy data for now :)
            for (int i = 0; i < 100; i++)
            {
                var obj = new TableQueryObject();
                obj["a"] = i;
                obj["b"] = "blah " + i;
                yield return obj;
            }
        }
    }

    private readonly string _connectionString;
    public SqlTableQueryFactory(string connectionString) { _connectionString = connectionString; }
    public ITableQuery Query(string tableName)
    {
        return new SqlTableQuery(tableName, _connectionString);
    }
}

static void Main()
{
    ITableQueryFactory database = new SqlTableQueryFactory("SomeConnectionString");
    var result = from row in database.Query("myTbl")
                 where row["someColumn"] == "1" && row["otherColumn"] == "2"
                 where row["thirdColumn"] == "2" && row["otherColumn"] == "4"
                 select row["a"]; // NOTE select executes as linq to objects! FTW
    foreach(var a in result) 
    {
        Console.WriteLine(a);
    }   
}
貪欢 2024-10-25 17:32:33

我认为 LINQ 路线是可行的方法,但为了好玩,我尝试想出一个解决方案。它需要一些工作,但总体思路是让查询接口流畅并将实现逻辑隐藏在接口后面。只是把它扔在那里作为思考的食物......

public interface IDBImplementation
{
  public void ProcessQuery(Select query);
}

public class SqlServerImplementation : IDBImplementation
{
  public void ProcessQuery(Select query)
  {
    string sqlQuery = "SELECT " + String.Join(", ", query.Columns)
      + " FROM " + query.TableName + " WHERE " + String.Join(" AND ", query.Conditions);
    // execute query...
  }
}

public class Select
{
  public Select(params string[] columns)
  {
    Columns = columns;
  }

  public string[] Columns { get; set; }
  public string TableName { get; set; }
  public string[] Conditions { get; set; }
}

public static class Extensions
{
  public static Select From(this Select select, string tableName)
  {
    select.TableName = tableName;
    return select;
  }

  public static Select Where(this Select select, params string[] conditions)
  {
    select.Conditions = conditions;
    return select;
  }
}

public static class Main
{
  public static void Example()
  {
    IDBImplementation database = new SqlServerImplementation();

    var query = new Select("a", "b", "c").From("test").Where("c>5", "b<10");

    database.ProcessQuery(query);
  }
}

I think the LINQ route is the way to go, but for fun I tried to think of a solution. It needs some work, but the general idea is to have the query interface fluent and hide the implementation logic behind interfaces. Just throwing it out there as food for thought...

public interface IDBImplementation
{
  public void ProcessQuery(Select query);
}

public class SqlServerImplementation : IDBImplementation
{
  public void ProcessQuery(Select query)
  {
    string sqlQuery = "SELECT " + String.Join(", ", query.Columns)
      + " FROM " + query.TableName + " WHERE " + String.Join(" AND ", query.Conditions);
    // execute query...
  }
}

public class Select
{
  public Select(params string[] columns)
  {
    Columns = columns;
  }

  public string[] Columns { get; set; }
  public string TableName { get; set; }
  public string[] Conditions { get; set; }
}

public static class Extensions
{
  public static Select From(this Select select, string tableName)
  {
    select.TableName = tableName;
    return select;
  }

  public static Select Where(this Select select, params string[] conditions)
  {
    select.Conditions = conditions;
    return select;
  }
}

public static class Main
{
  public static void Example()
  {
    IDBImplementation database = new SqlServerImplementation();

    var query = new Select("a", "b", "c").From("test").Where("c>5", "b<10");

    database.ProcessQuery(query);
  }
}
月隐月明月朦胧 2024-10-25 17:32:33

检索数据库信息的最独立于 DBMS 的方法是 iNFORMATION_SCHEMA。请参阅 MySQLSQL ServerPostgreSQL

我有点好奇你在想什么类型的规则。 :)

The most DBMS-independent way to retrieve information about a database is iNFORMATION_SCHEMA. See MySQL, SQL Server, PostgreSQL.

I'm kind of curious what type of rules you're thinking of. :)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文