What are the options for mapping and normalizing related data?

Posted 2024-07-26 04:34:34

Let's say you need to funnel random, related data given to you into more succinct categories.

Example: you're given the following data. Note that there could be any number of other related data columns:

 Customer      Product                Category
==========    =========    =================================
Customer A    Product A                 Cat 1
 CustomerA    Product B               Category 1
  Cust-A      Product C    Totally Lame & Unrelated Grouping

Task: consolidate and normalize the above into clean, predefined groupings:

CustomerA
  Category1
    ProductA
    ProductB
    ProductC
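The following is a minimal Python sketch of the target shape. It assumes every raw value already has a known canonical name. The `CANONICAL` table, `group_rows` and `print_tree` are hypothetical. Mapping "Totally Lame & Unrelated Grouping" to `Category1` comes from the desired output above, not from any general rule. Once values are canonical, building the Customer → Category → Product tree is ordinary nested grouping. The hard part, which the rest of the question is about, is maintaining `CANONICAL`.

```python
from collections import defaultdict

# Hypothetical canonical names for the raw values in the example table.
CANONICAL = {
    "Customer A": "CustomerA",
    "CustomerA": "CustomerA",
    "Cust-A": "CustomerA",
    "Product A": "ProductA",
    "Product B": "ProductB",
    "Product C": "ProductC",
    "Cat 1": "Category1",
    "Category 1": "Category1",
    "Totally Lame & Unrelated Grouping": "Category1",
}

RAW_ROWS = [
    ("Customer A", "Product A", "Cat 1"),
    ("CustomerA", "Product B", "Category 1"),
    ("Cust-A", "Product C", "Totally Lame & Unrelated Grouping"),
]


def canonical(value: str) -> str:
    # Trim stray whitespace, then look up the canonical name.
    # Unknown values pass through unchanged so they are easy to spot.
    value = value.strip()
    return CANONICAL.get(value, value)


def group_rows(rows):
    # Builds customer -> category -> list of products.
    tree = defaultdict(lambda: defaultdict(list))
    for customer, product, category in rows:
        tree[canonical(customer)][canonical(category)].append(canonical(product))
    return tree


def print_tree(tree):
    for customer, categories in tree.items():
        print(customer)
        for category, products in categories.items():
            print("  " + category)
            for product in products:
                print("    " + product)


print_tree(group_rows(RAW_ROWS))
```

Running it prints the same CustomerA / Category1 / ProductA to ProductC tree shown above.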

Don't worry about how the finished data will be persisted; focus instead on how you'll persist and manage the grouping rules.

Only one assumption: you can't use a database to persist your grouping rules. So when we say "normalize", we don't mean relational database normalization. Rather, we want to remove inconsistencies from the data inputs (as seen above) and bring the random data into a consistent state.

So what are the available options? Remain technology-agnostic:

XML?

Config files?

Settings file (compiled or not)?

INI file?

Code?

etc.

List the pros and cons of each answer. And though this is indeed an exercise, it's a real-world problem, so assume your client or employer has tasked you with it.
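Here is the INI option as a minimal Python sketch. It uses the standard-library `configparser`, with one section per column and `raw value = canonical value` entries. The section names, the rules and the `load_rules` / `normalize` helpers are hypothetical.

```python
import configparser

# Hypothetical rules file: one section per column, "raw value = canonical value".
RULES_INI = """
[customer]
Customer A = CustomerA
Cust-A = CustomerA

[category]
Cat 1 = Category1
Category 1 = Category1
Totally Lame & Unrelated Grouping = Category1
"""


def load_rules(text: str) -> dict:
    parser = configparser.ConfigParser()
    # Option names are lower-cased by default; keep them exactly as written.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def normalize(rules: dict, column: str, value: str) -> str:
    # Values with no matching rule are returned unchanged (minus whitespace).
    value = value.strip()
    return rules.get(column, {}).get(value, value)


rules = load_rules(RULES_INI)
print(normalize(rules, "customer", "Cust-A"))  # CustomerA
print(normalize(rules, "category", "Cat 1"))   # Category1
```

The pros are that the file is human-editable, diff-friendly and needs no extra dependencies. The cons are that only exact matches are possible, so every variant has to be listed by hand. With the default delimiters, a raw value containing `=` or `:` can't be used as a key. Under the default interpolation, a `%` in a value has to be written as `%%`.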

×纯※雪 2024-08-02 04:34:35

This seems like a data cleansing exercise; perfection is pretty much impossible. Issues:

1) Can you specify the categories up front, or must you deduce them from the data?

2) What rules can we use to accept equivalence?

Is "Cat 1" the same as "Category 1"? And "Category one"? Is "Cat 1." also "Cat 1"? What about "Cat 1?" and "Cat 12"?

Just arriving at a good set of rules is a challenge.

3) How would you capture those rules? Code or config? If config, how would you express it? Do you end up just writing a new specialised programming language?
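The equivalence questions in 2) become concrete if each raw value is reduced to a comparison key before lookup. In this Python sketch, the normalization steps are assumptions chosen for illustration: case-folding, treating punctuation as whitespace, and expanding "cat" to "category" and "one" to "1". The `WORD_ALIASES` table and the `comparison_key` helper are also hypothetical. Whether "Cat 1?" really means "Cat 1" is the business decision the answer is pointing at. Comparing whole keys, rather than prefixes or substrings, is what keeps "Cat 12" distinct from "Cat 1".

```python
import re

# Hypothetical word-level rewrites, applied after punctuation is stripped.
WORD_ALIASES = {"cat": "category", "one": "1"}


def comparison_key(raw: str) -> str:
    # Case-fold, replace punctuation with spaces, then split into words.
    words = re.sub(r"[^\w\s]", " ", raw.casefold()).split()
    # Expand abbreviations and number words, then join without spaces.
    return "".join(WORD_ALIASES.get(word, word) for word in words)


for raw in ["Cat 1", "Category 1", "Category one", "Cat 1.", "Cat 1?", "Cat 12"]:
    print(f"{raw!r:16} -> {comparison_key(raw)}")
```

The first five inputs all reduce to `category1`, while `Cat 12` reduces to `category12`.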

娇俏 2024-08-02 04:34:35
  1. A dictionary mapping for each value. 'Cat1' => 'Category1', 'Category 2' => 'Category2'. This is easy to store, and has no unintended consequences. The disadvantage is that creating all those mappings by hand is actual work.
  2. A series of regular expressions. That way, you can capture nearly all rules with relatively little work. The disadvantage is that regular expressions 'misfire' relatively easily, and the order of evaluation matters (i.e. when a value matches more than one 'rule').
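The two options can also be combined. This minimal Python sketch uses hypothetical entries. It tries exact dictionary lookups first, because they are predictable, then falls back to an ordered list of regular expressions where the first match wins. The patterns are deliberately unanchored to show the misfire described above. `cat(egory)?\s*1` also matches "Cat 12", so the more specific rule has to come first. Even then, "Cat 123" would still be misclassified as `Category12`.

```python
import re

# Option 1: explicit mapping of known raw values (exact and predictable).
EXACT = {
    "Cat1": "Category1",
    "Category 2": "Category2",
}

# Option 2: ordered regex rules; the first match wins.
# The unanchored "cat(egory)?\s*1" would also match "Cat 12",
# so the more specific rule must be listed before it.
PATTERNS = [
    (re.compile(r"cat(egory)?\s*12", re.IGNORECASE), "Category12"),
    (re.compile(r"cat(egory)?\s*1", re.IGNORECASE), "Category1"),
]


def normalize(value: str) -> str:
    value = value.strip()
    if value in EXACT:
        return EXACT[value]
    for pattern, canonical in PATTERNS:
        if pattern.search(value):
            return canonical
    return value  # unmatched: leave it for a human to add a rule
```

If you swap the two entries in `PATTERNS`, "Cat 12" silently becomes `Category1`. This ordering sensitivity is the cost of the regex approach.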

As for how to persist them? I can't think of a more uninteresting question. You just use whatever's easiest in your preferred programming language.
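In Python, for instance, the easiest choice for the dictionary mapping from option 1 is probably a JSON file written with the standard library. This is a minimal round-trip sketch. The mapping contents and the `grouping_rules.json` file name are placeholders, and the temporary directory only keeps the example self-contained.

```python
import json
import os
import tempfile

# Hypothetical mapping, e.g. maintained by hand.
mapping = {"Cat 1": "Category1", "Cust-A": "CustomerA"}

path = os.path.join(tempfile.mkdtemp(), "grouping_rules.json")

# Save: sorted keys and indentation keep the file stable and readable.
with open(path, "w", encoding="utf-8") as f:
    json.dump(mapping, f, indent=2, sort_keys=True, ensure_ascii=False)

# Load it back, e.g. at startup.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == mapping)  # True
```

Because the output is stable, changes to the rules show up as small diffs if the file is kept under version control.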
