How do you store data when the schema is not known in advance?

Posted on 2024-11-02 15:03:12

I'm trying to figure out what is the right choice of data storage in a project I'm starting up right now.

I want to store the output of PowerShell scripts. An administrator of my app will be able to write a PowerShell script that executes on a number of hosts, and those hosts will post the results back to a data store. I then want to query that store in a flexible manner.

Let me clarify. The data that comes back from the PowerShell job is not a proper object but a key/value collection of object properties. So there is no real object to serialize.

Let's say I tell 100 hosts over a WCF service to execute the two PowerShell commands Get-Service and Get-Process, and they then post the results back to my data store. I don't know the schema of this data beforehand.

The point is neither PowerShell nor WCF, but how to store data whose schema is not known at the time of storing. Queries will be created manually afterwards, via some GUI, based on the data that has been stored.

Afterwards I would like to be able to execute a query like "Get a list of all hosts that have service X running and process Y running".
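To make the target query concrete, here is a minimal sketch in plain Python that treats each posted result as a free-form key/value record and answers the "service X and process Y" question with a set intersection. The record layout (the `host` and `command` keys), the sample values, and both helper functions are assumptions made up for illustration; they are not part of any PowerShell or WCF API.

```python
# Hypothetical records as hosts might post them back: each one is a plain
# key/value bag, tagged with the host and the command that produced it.
records = [
    {"host": "srv01", "command": "Get-Service", "Name": "Spooler", "Status": "Running"},
    {"host": "srv01", "command": "Get-Process", "ProcessName": "notepad", "Id": 4242},
    {"host": "srv02", "command": "Get-Service", "Name": "Spooler", "Status": "Stopped"},
    {"host": "srv02", "command": "Get-Process", "ProcessName": "notepad", "Id": 1010},
    {"host": "srv03", "command": "Get-Service", "Name": "Spooler", "Status": "Running"},
]


def hosts_matching(records, **criteria):
    """Return the set of hosts that have at least one record matching all criteria."""
    return {
        record["host"]
        for record in records
        # .get() tolerates records that lack a key, since no schema is enforced.
        if all(record.get(key) == value for key, value in criteria.items())
    }


def hosts_with_service_and_process(records, service, process):
    """Hosts where the given service is running AND the given process exists."""
    with_service = hosts_matching(
        records, command="Get-Service", Name=service, Status="Running"
    )
    with_process = hosts_matching(
        records, command="Get-Process", ProcessName=process
    )
    return sorted(with_service & with_process)


print(hosts_with_service_and_process(records, "Spooler", "notepad"))
```

Whatever store ends up being chosen, the query has this shape: two independent filters over property bags, joined on the host.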

I'm looking into NoSQL databases as an alternative to relational DBs, but I'm not sure which is best.

Thankful for any input.
/Linus

Comments (2)

静谧幽蓝 2024-11-09 15:03:12

If storing the data as XML in an RDBMS doesn't make sense to you (btw, why doesn't it?), then there are several NoSQL DBs that would probably be good options because they're schema-less.

The ones I can recommend that you look at (based on personal experience; there are many others that could be relevant) are CouchDB and Riak. Both provide a disk-bound key-value datastore where you store your values as JSON, without pre-defining a schema. In both cases it is possible to query the data through a RESTful interface using JavaScript.

The choice should depend on the amount of data that you expect:

  • Riak is designed to run on multiple nodes, and queries are handled through MapReduce so that processing is distributed between those nodes, enabling relatively fast data retrieval for ad-hoc queries. If you have lots of data (millions of records against which you must run ad-hoc queries), choose this. You'll 'pay' with the added complexity of managing a cluster, though I can attest that Riak makes it relatively painless.
  • CouchDB is designed to run on a single node. Replication is possible (and easy), but queries run against a single server. It has materialized indices, so queries against existing indices run fast. Ad-hoc queries require a full "table scan", though, and could take minutes on large datasets. On the other hand, it has a nice browser-based user interface, which Riak lacks in the free version.
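The difference between querying a prebuilt index and doing a full scan can be sketched in a few lines. The following is plain Python that only imitates the idea of a map function feeding a materialized index; it does not use CouchDB's or Riak's actual API (there the map function would be written in JavaScript, as mentioned above), and the document layout and function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical schema-less documents, one per posted result.
docs = [
    {"host": "srv01", "command": "Get-Service", "Name": "Spooler", "Status": "Running"},
    {"host": "srv01", "command": "Get-Process", "ProcessName": "notepad"},
    {"host": "srv02", "command": "Get-Service", "Name": "Spooler", "Status": "Stopped"},
    {"host": "srv02", "command": "Get-Process", "ProcessName": "notepad"},
]


def map_running_services(doc):
    """Map function: emit (service name, host) for every running service."""
    if doc.get("command") == "Get-Service" and doc.get("Status") == "Running":
        yield doc["Name"], doc["host"]


def map_processes(doc):
    """Map function: emit (process name, host) for every process document."""
    if doc.get("command") == "Get-Process":
        yield doc["ProcessName"], doc["host"]


def build_index(docs, map_fn):
    """One full pass over all documents: this is the expensive 'table scan'."""
    index = defaultdict(set)
    for doc in docs:
        for key, value in map_fn(doc):
            index[key].add(value)
    return dict(index)


# Built once up front (and, in a real store, kept up to date on writes).
services_index = build_index(docs, map_running_services)
processes_index = build_index(docs, map_processes)

# Each query is now two key lookups and a set intersection, with no scan.
hosts = sorted(
    services_index.get("Spooler", set()) & processes_index.get("notepad", set())
)
print(hosts)
```

A query that no existing index covers would have to go through `build_index` again, which is why ad-hoc queries get slow as the data grows.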

I'd recommend trying Couch out first - it's very easy to set up and start playing with - and see whether it solves your problem. If it doesn't, then go for Riak.

想你只要分分秒秒 2024-11-09 15:03:12

If you want to store data of which you do not know the structure during design time, you have a few options.

Amongst the options are:

  • Store the data as XML (in a DB or in files).
  • Create the schema dynamically to match the structure of the dynamic data.
  • Create a generic structured schema, where all classes map to the same table and all properties are dynamically attached properties.

E.g. (Generic class structure)

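using System.Collections.Generic;

// One instance per stored object, whatever its original type was.
class GenericClass
{
    public GenericProperty[] SimpleProperties;                  // scalar values
    public Dictionary<string, GenericClass> ComplexProperties;  // nested objects, keyed by property name
}

// Base class: every property has a name; subclasses add a typed value.
abstract class GenericProperty
{
    public string Name;
}

class StringProperty : GenericProperty
{
    public string Value;
}

class IntegerProperty : GenericProperty
{
    public int Value;
}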

Using table-per-type on these classes should give you generic tables.
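As a rough illustration of what such generic tables could look like, here is a sketch using Python's built-in sqlite3 module. It is a simplification of the class structure above: all values go into a single `property` table as text rather than one table per property type, and the table names, column names, and the `store` helper are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id      INTEGER PRIMARY KEY,
    host    TEXT NOT NULL,
    command TEXT NOT NULL
);
CREATE TABLE property (
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    name      TEXT NOT NULL,
    value     TEXT
);
""")


def store(host, command, properties):
    """Insert one result object and its dynamically attached properties."""
    cur = conn.execute(
        "INSERT INTO entity (host, command) VALUES (?, ?)", (host, command)
    )
    conn.executemany(
        "INSERT INTO property (entity_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, str(value)) for name, value in properties.items()],
    )


store("srv01", "Get-Service", {"Name": "Spooler", "Status": "Running"})
store("srv01", "Get-Process", {"ProcessName": "notepad", "Id": 4242})
store("srv02", "Get-Service", {"Name": "Spooler", "Status": "Stopped"})
store("srv02", "Get-Process", {"ProcessName": "notepad", "Id": 1010})

# Hosts running the given service, intersected with hosts running the process.
QUERY = """
SELECT e.host FROM entity e
JOIN property n ON n.entity_id = e.id AND n.name = 'Name' AND n.value = ?
JOIN property s ON s.entity_id = e.id AND s.name = 'Status' AND s.value = 'Running'
WHERE e.command = 'Get-Service'
INTERSECT
SELECT e.host FROM entity e
JOIN property p ON p.entity_id = e.id AND p.name = 'ProcessName' AND p.value = ?
WHERE e.command = 'Get-Process'
ORDER BY 1
"""

hosts = [row[0] for row in conn.execute(QUERY, ("Spooler", "notepad"))]
print(hosts)
```

The trade-off is visible in the query: no schema changes are ever needed, but every property that takes part in a filter costs one more join against the `property` table.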
