Collecting audit and statistics data

Posted 2024-10-25 06:22:23

My problem is that I have a lot of events happening in a large web application and now and then I want to see what happened (for auditing purposes) or I want to aggregate the data for statistical reporting.

One solution would be to create a table in the DB for each type of event and log it there. e.g. a password is changed, log the date, user, ip etc. This will provide me with the audit information I need and also the ability to run reports against the table to see how often this functionality is used. The downside is that I would need to create a new table for each type of event that I want to capture.
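For comparison, here is a minimal sketch of the one-table-per-event-type approach, using Python's built-in `sqlite3` module with an in-memory database. The table name `password_changed_event`, its columns, and the `log_password_changed` helper are hypothetical. They were chosen to match the password-change example.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical table for ONE event type (password changes).
# Every additional event type would need its own table like this.
conn.execute(
    """
    CREATE TABLE password_changed_event (
        id          INTEGER PRIMARY KEY,
        occurred_at TEXT NOT NULL,   -- ISO-8601 timestamp (UTC)
        user_id     INTEGER NOT NULL,
        ip_address  TEXT NOT NULL
    )
    """
)

def log_password_changed(user_id: int, ip_address: str) -> None:
    """Record one password-change event for auditing."""
    conn.execute(
        "INSERT INTO password_changed_event (occurred_at, user_id, ip_address) "
        "VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_id, ip_address),
    )

log_password_changed(42, "203.0.113.7")
log_password_changed(42, "203.0.113.7")
log_password_changed(7, "198.51.100.23")

# Statistical report: how often is this feature used, per user?
usage = conn.execute(
    "SELECT user_id, COUNT(*) FROM password_changed_event "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(usage)  # [(7, 1), (42, 2)]
```

Both needs are met by the same table: it serves as the audit trail, and the `GROUP BY` produces the usage statistics. The cost is the schema change each time a new event type appears.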

My ideal solution would be to have a single table with a more flexible structure, perhaps an XML field, but I'm not crazy about having an XML field in the table.

So my question: Is there a well used (popular) pattern that addresses my problem?

Answers (2)

揽月 2024-11-01 06:22:23

How large is your large web application?

Logging events as XML blobs should work, and some databases (e.g. SQL Server) let you query that XML directly. However, the performance of these queries is terrible.
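The sketch below shows the XML-blob approach in SQLite, which has no built-in XML query support, unlike SQL Server. The `event_log` table and the `log_event`/`events_for_user` helpers are hypothetical. Because the database cannot look inside the blob, filtering on a field in the payload requires reading and parsing every row in application code. That is the kind of cost the answer is warning about.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical single event table: a type column plus an XML payload.
conn.execute(
    "CREATE TABLE event_log ("
    " id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, payload TEXT NOT NULL)"
)

def log_event(event_type: str, **details: str) -> None:
    """Serialize details as <event><name>value</name>...</event> and store them."""
    root = ET.Element("event")
    for name, value in details.items():
        ET.SubElement(root, name).text = value
    conn.execute(
        "INSERT INTO event_log (event_type, payload) VALUES (?, ?)",
        (event_type, ET.tostring(root, encoding="unicode")),
    )

def events_for_user(user: str) -> list[str]:
    """Full scan: SQLite cannot index into the XML, so parse every payload."""
    matches = []
    for event_type, payload in conn.execute(
        "SELECT event_type, payload FROM event_log ORDER BY id"
    ):
        if ET.fromstring(payload).findtext("user") == user:
            matches.append(event_type)
    return matches

log_event("password_changed", user="42", ip="203.0.113.7")
log_event("password_changed", user="7", ip="198.51.100.23")
log_event("login", user="42", ip="203.0.113.7")

print(events_for_user("42"))  # ['password_changed', 'login']
```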

Before you do event logging in the database, you should figure out how many records per second you're going to create.
If the number is large it is going to put serious load on your database and could affect your overall application performance.
Also, once you accumulate a large number of records, querying the data would take forever (and kill DB performance in the process). Aggregating the data is even worse; relational databases aren't very efficient at doing aggregations.
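Estimating the write rate is simple arithmetic. The sketch below uses hypothetical inputs: daily active users, events per user per day, and a peak-to-average ratio. Substitute measurements from your own application.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def events_per_second(daily_active_users: int,
                      events_per_user_per_day: float,
                      peak_to_average: float) -> tuple[float, float]:
    """Return (average, peak) audit events written per second."""
    total_per_day = daily_active_users * events_per_user_per_day
    average = total_per_day / SECONDS_PER_DAY
    return average, average * peak_to_average

# Hypothetical placeholder inputs, not measurements.
avg, peak = events_per_second(daily_active_users=200_000,
                              events_per_user_per_day=50,
                              peak_to_average=5)
rows_per_year = avg * SECONDS_PER_DAY * 365
print(f"average {avg:.0f}/s, peak {peak:.0f}/s, ~{rows_per_year:.2e} rows/year")
# average 116/s, peak 579/s, ~3.65e+09 rows/year
```

The yearly row count matters as much as the per-second rate. It determines how much data every later report has to scan.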

Chris' suggestion above would work well for small databases, but won't scale since your queries will have to use joins. It may be better to de-normalize your data.
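One way to de-normalize assumes that most reports filter or group on a small, common set of fields. You store those fields as ordinary columns on a single event row, so a daily usage report becomes one `GROUP BY` over one table. The table `event_flat` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical de-normalized table: fields most reports need live on the row itself.
conn.execute(
    """
    CREATE TABLE event_flat (
        id          INTEGER PRIMARY KEY,
        event_type  TEXT NOT NULL,
        occurred_at TEXT NOT NULL,   -- ISO-8601 timestamp
        user_id     INTEGER,
        ip_address  TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO event_flat (event_type, occurred_at, user_id, ip_address) "
    "VALUES (?, ?, ?, ?)",
    [
        ("password_changed", "2024-10-25T06:00:00Z", 42, "203.0.113.7"),
        ("login",            "2024-10-25T06:05:00Z", 42, "203.0.113.7"),
        ("login",            "2024-10-26T09:30:00Z", 7,  "198.51.100.23"),
    ],
)

# Daily count per event type: a single-table aggregation, no joins.
report = conn.execute(
    """
    SELECT substr(occurred_at, 1, 10) AS day, event_type, COUNT(*)
    FROM event_flat
    GROUP BY day, event_type
    ORDER BY day, event_type
    """
).fetchall()
print(report)
# [('2024-10-25', 'login', 1), ('2024-10-25', 'password_changed', 1), ('2024-10-26', 'login', 1)]
```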

Even if your application isn't getting enough traffic for you to worry about this right now, keep in mind that event logging to the DB won't scale well for the reasons explained above.

Concrete suggestions:

If you don't have that much traffic and decide to log to the DB, do this to a separate schema, so that it'll be easier for you to move it to a separate db server in order to offload it from your production database.
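The separate-schema idea can be sketched with SQLite, where the closest equivalent of a schema is an attached database (`ATTACH DATABASE ... AS audit`). A server RDBMS would use its own schema mechanism instead. The `users` and `audit.event_log` tables are hypothetical. Because the audit queries always name the `audit` schema explicitly, relocating it later only changes which database file is attached, not the queries themselves.

```python
import sqlite3

# Main application database (in-memory for this sketch).
app = sqlite3.connect(":memory:")
app.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Audit data goes into its own attached database, addressed as "audit".
# Swapping ':memory:' for a file path on another disk moves it without
# touching the queries below.
app.execute("ATTACH DATABASE ':memory:' AS audit")
app.execute(
    "CREATE TABLE audit.event_log ("
    " id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, occurred_at TEXT NOT NULL)"
)
app.execute(
    "INSERT INTO audit.event_log (event_type, occurred_at) VALUES (?, ?)",
    ("password_changed", "2024-10-25T06:22:23Z"),
)

# Application tables and audit tables live in different schemas.
main_tables = [r[0] for r in app.execute(
    "SELECT name FROM main.sqlite_master WHERE type = 'table'")]
audit_tables = [r[0] for r in app.execute(
    "SELECT name FROM audit.sqlite_master WHERE type = 'table'")]
print(main_tables, audit_tables)  # ['users'] ['event_log']
```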

If you decide to log the event as XML, consider whether there is any point in using a relational database for the purpose: if you can't query it efficiently, then simple log files would be much simpler. You'd have to figure out how to process that log data later on, of course, but for infrequent or simple queries, writing some scripts using grep, awk, etc. would take you a surprisingly long way.
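Here is a rough illustration of the log-file route. It writes one tab-separated line per event and counts events by type, the same result the `awk` one-liner in the comment would give against a file. The line format, the `events.log` file name, and the `log_event` helper are hypothetical. `io.StringIO` stands in for a real file so the example is self-contained.

```python
import collections
import io

# Stand-in for an append-only log file such as events.log (hypothetical name).
log = io.StringIO()

def log_event(ts: str, event_type: str, user_id: int, ip: str) -> None:
    """Append one tab-separated line: timestamp, type, user, ip."""
    log.write(f"{ts}\t{event_type}\t{user_id}\t{ip}\n")

log_event("2024-10-25T06:00:00Z", "password_changed", 42, "203.0.113.7")
log_event("2024-10-25T06:05:00Z", "login", 42, "203.0.113.7")
log_event("2024-10-26T09:30:00Z", "login", 7, "198.51.100.23")

# Same idea as:
#   awk -F'\t' '{count[$2]++} END {for (t in count) print t, count[t]}' events.log
counts = collections.Counter(
    line.split("\t")[1] for line in log.getvalue().splitlines()
)
print(dict(counts))  # {'password_changed': 1, 'login': 2}
```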

The method commonly used nowadays by (very) large-scale applications is logging to files, then running your analysis (aggregation) using map-reduce, e.g. on Hadoop.
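The map-reduce model itself fits in a few lines of plain Python. This is an in-process illustration of the map, shuffle, and reduce phases, not Hadoop's API. The tab-separated log format and the daily-count-per-event-type job are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Each inner list stands for one input split handled by one mapper.
splits = [
    ["2024-10-25T06:00:00Z\tpassword_changed\t42",
     "2024-10-25T06:05:00Z\tlogin\t42"],
    ["2024-10-26T09:30:00Z\tlogin\t7"],
]

def mapper(lines: Iterable[str]) -> Iterator[tuple[tuple[str, str], int]]:
    """Map phase: emit ((day, event_type), 1) for every log line."""
    for line in lines:
        ts, event_type, _user = line.split("\t")
        yield (ts[:10], event_type), 1

def shuffle(pairs):
    """Shuffle phase: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values: list[int]) -> tuple[tuple[str, str], int]:
    """Reduce phase: sum the counts for one key."""
    return key, sum(values)

mapped = chain.from_iterable(mapper(split) for split in splits)
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)
# {('2024-10-25', 'password_changed'): 1, ('2024-10-25', 'login'): 1, ('2024-10-26', 'login'): 1}
```

Each mapper only sees its own split and each reducer only sees one key. That independence is what lets a framework like Hadoop spread the work across machines.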

叫思念不要吵 2024-11-01 06:22:23

An intermediate approach between one table per event type and a single table is the following (assuming that what differs between events is the parameters/data they carry):

Event Type
  Event Type Id (PK)
  Name
  Number of parameters (useful - not essential)

Event
  Event Id (PK)
  Event Type Id (FK)
  Timestamp

Event Attribute
  Event Attribute Id (PK)
  Event Id (FK)
  Name 
  Value (as string in all cases)
  Sequence Number (within the event; this may well not be needed, but can be a convenience)

I don't think this is a named pattern, but it is a pattern that comes up repeatedly in database design.

I think this gives you all the information you need, without the need to store XML.
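This layout is widely known as Entity-Attribute-Value (EAV). Below is a minimal SQLite sketch of the schema above. The snake_case table and column names, the `log_event` helper, and the sample attributes are hypothetical, and the `Timestamp` column is named `occurred_at` here. The final query shows the cost mentioned in the other answer: getting one row per event back requires joining and pivoting the attribute rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE event_type (
        event_type_id        INTEGER PRIMARY KEY,
        name                 TEXT NOT NULL UNIQUE,
        number_of_parameters INTEGER          -- optional, informational
    );
    CREATE TABLE event (
        event_id      INTEGER PRIMARY KEY,
        event_type_id INTEGER NOT NULL REFERENCES event_type(event_type_id),
        occurred_at   TEXT NOT NULL           -- the "Timestamp" column
    );
    CREATE TABLE event_attribute (
        event_attribute_id INTEGER PRIMARY KEY,
        event_id           INTEGER NOT NULL REFERENCES event(event_id),
        name               TEXT NOT NULL,
        value              TEXT,             -- always stored as a string
        sequence_number    INTEGER           -- optional order within the event
    );
    """
)

def log_event(type_name: str, occurred_at: str, attributes: dict[str, str]) -> int:
    """Insert one event plus its attribute rows; return the new event_id."""
    conn.execute(
        "INSERT OR IGNORE INTO event_type (name, number_of_parameters) VALUES (?, ?)",
        (type_name, len(attributes)),
    )
    (type_id,) = conn.execute(
        "SELECT event_type_id FROM event_type WHERE name = ?", (type_name,)
    ).fetchone()
    event_id = conn.execute(
        "INSERT INTO event (event_type_id, occurred_at) VALUES (?, ?)",
        (type_id, occurred_at),
    ).lastrowid
    conn.executemany(
        "INSERT INTO event_attribute (event_id, name, value, sequence_number) "
        "VALUES (?, ?, ?, ?)",
        [(event_id, name, value, seq)
         for seq, (name, value) in enumerate(attributes.items(), 1)],
    )
    return event_id

log_event("password_changed", "2024-10-25T06:22:23Z", {"user": "42", "ip": "203.0.113.7"})
log_event("login", "2024-10-25T06:30:00Z", {"user": "7", "ip": "198.51.100.23"})

# Reassembling one row per event: two joins plus a pivot over attribute names.
rows = conn.execute(
    """
    SELECT e.event_id, t.name,
           MAX(CASE WHEN a.name = 'user' THEN a.value END) AS user_id,
           MAX(CASE WHEN a.name = 'ip'   THEN a.value END) AS ip_address
    FROM event e
    JOIN event_type t      ON t.event_type_id = e.event_type_id
    JOIN event_attribute a ON a.event_id = e.event_id
    GROUP BY e.event_id, t.name
    ORDER BY e.event_id
    """
).fetchall()
print(rows)
# [(1, 'password_changed', '42', '203.0.113.7'), (2, 'login', '7', '198.51.100.23')]
```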
