Considering a new Rails application that uses existing data (not the database, the actual data): what's the best way to proceed?

Posted 2024-07-12 13:12:09

I have been tasked with developing a new retail e-commerce storefront for my current job, and I am considering tackling it with RoR to A) build a "real" project with my limited Rails knowledge, and B) give management quick turnaround and feedback. They want this done ASAP and their deadlines are rather unrealistic: I'm talking a couple of weeks to go from nothing to a working model so they can start marketing it with SEO/SEM and, I kid you not, "video blogging", because my boss heard that's the future.

We do have a database structure in place, but it's absolutely terrible and was thrown together without rhyme or reason, so I'm going to largely ignore it and create a new database from scratch. However, I have existing data that I need to load into the application (like I said, it's an e-commerce app and we have the product data). I need to massage this data into a usable format because our supplier provides it with cryptic, abbreviated column names and it's highly denormalized, especially in the categories (I've posted a question about this before: basically the categories table has six fields, one for each category/subcategory level, with some of them blank if that level doesn't apply).
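To make the category problem concrete, here is a minimal Ruby sketch of flattening six category columns into a parent/child list. The column names `cat1` to `cat6` and the sample rows are made up for illustration; the supplier's real names are not given here. It also assumes blank levels can simply be skipped.

```ruby
# Hypothetical column names; the supplier's real (cryptic) names are not known.
CATEGORY_COLUMNS = %w[cat1 cat2 cat3 cat4 cat5 cat6].freeze

# Returns the non-blank category levels of one row, top level first.
def category_path(row)
  CATEGORY_COLUMNS
    .map { |col| row[col].to_s.strip }
    .reject(&:empty?)
end

# Builds one record per distinct category, each pointing at its parent path,
# so a category shared by many products is stored only once.
def build_category_tree(rows)
  seen = {}
  rows.each do |row|
    path = category_path(row)
    path.each_index do |depth|
      key = path[0..depth]
      seen[key] ||= {
        name: path[depth],
        parent: depth.zero? ? nil : path[0...depth]
      }
    end
  end
  seen.values
end

# Made-up sample rows, only to show the shape of the input.
rows = [
  { "cat1" => "Tools", "cat2" => "Hand Tools", "cat3" => "Hammers", "cat4" => "", "cat5" => nil },
  { "cat1" => "Tools", "cat2" => "Hand Tools", "cat3" => "Wrenches" }
]
tree = build_category_tree(rows)
```

Keying each category by its full path rather than its name alone keeps two different "Accessories" subcategories under different parents from being merged.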

There are two main issues that are giving me second thoughts:

1. As I said, the data needs to be put into a "proper" database schema; I can't just load it as-is. I have some thoughts on a good data model for it, but my analysis is not complete yet. There would end up being a large number of join tables linking various things together (e.g. products_categories, products_attributes, products_prices), and these tables would link products not via an ID but by their SKU (see below).
2. Everything already has an ID generated for it, but anything new I add needs to have one autogenerated. I doubt this will be a problem with any mature RDBMS, but I know Rails likes to generate IDs itself. Also, almost all of the product-related tables are linked by SKU rather than by ID (in the supplier's data the SKU is actually a composite key made up of a prefix and a stock number, which together form the full SKU), and I'm not sure whether this will be a performance issue (of course, I could always manually create indexes on these columns to speed things up). It does mean that I'll need to break away from the Rails conventions, however.
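One way to sidestep SKU-linked join tables is to translate SKUs into integer IDs at import time. The following is only a sketch in plain Ruby under stated assumptions: the `:prefix` and `:stock_number` field names, the `-` separator, and the helper names are all hypothetical.

```ruby
# Joins the supplier's two-part key into one SKU string.
# The "-" separator is an assumption; use whatever format the data requires.
def full_sku(prefix, stock_number)
  "#{prefix.to_s.strip}-#{stock_number.to_s.strip}"
end

# Assigns sequential integer IDs to products and returns a SKU => id lookup.
def assign_ids(products, start_at: 1)
  products.each_with_index.to_h do |product, offset|
    [full_sku(product[:prefix], product[:stock_number]), start_at + offset]
  end
end

# Rewrites SKU-linked rows (prices, attributes, ...) to reference product_id.
# Raises KeyError for a row whose SKU has no matching product.
def link_by_id(rows, sku_to_id)
  rows.map do |row|
    sku = full_sku(row[:prefix], row[:stock_number])
    id = sku_to_id.fetch(sku) { raise KeyError, "unknown SKU #{sku}" }
    row.reject { |key, _| %i[prefix stock_number].include?(key) }
       .merge(product_id: id)
  end
end
```

If keeping the SKU as the link is preferable, Rails does allow overriding its defaults (`self.primary_key =` on a model, and the `foreign_key:` and `primary_key:` options on associations), at the cost of stepping outside the conventions as noted above.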

In short, I think Rails might be a good choice in terms of time-to-market and ease of development, but having to work with the existing data might turn into a pain, because the application would need to be developed around that data instead of as a "traditional" Rails app, and that is giving me major doubts about using Rails. There are also some other issues: I would have to set up a Linux server, and the area I live in has very few Rails developers, so if I left the company I'd basically be holding them hostage as far as updates and modifications go. I'm really unsure of the best way to proceed.

ぃ双果 2024-07-19 13:12:09

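I would develop the application as if you had no data. Use the ORM and get your database into the best shape you can, but of course keep in mind the data you'll have to populate it with (e.g. don't add crazy new constraints that would force you to go through the old data record by record).

Once it's finished and tested, write an import script that pulls the real data into the new database.

This is no different from the traditional design/development model... except that you get to do the data entry in a semi-automated way.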
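As a rough idea of what such an import script might look like, here is a plain-Ruby sketch that renames cryptic supplier columns and sets aside rows it cannot use. The legacy column names (`PRFX`, `STKNO`, `DSC`, `PRC`) and the target attribute names are invented for illustration; in a real Rails app the converted hashes would be handed to the models rather than collected in an array.

```ruby
# Hypothetical mapping from supplier column names to the new schema's names.
COLUMN_MAP = {
  "PRFX"  => :sku_prefix,
  "STKNO" => :stock_number,
  "DSC"   => :description,
  "PRC"   => :price
}.freeze

# Converts one legacy row to clean attributes.
# Returns nil when either half of the SKU is missing.
def convert_row(legacy_row)
  attrs = COLUMN_MAP.each_with_object({}) do |(old_name, new_name), out|
    out[new_name] = legacy_row[old_name].to_s.strip
  end
  return nil if attrs[:sku_prefix].empty? || attrs[:stock_number].empty?

  # Store money as integer cents; String#to_r avoids float rounding.
  attrs[:price_cents] = (attrs.delete(:price).to_r * 100).round
  attrs
end

# Splits the input into rows that converted cleanly and rows to review by hand.
def import(legacy_rows)
  imported = []
  skipped = []
  legacy_rows.each do |row|
    converted = convert_row(row)
    if converted
      imported << converted
    else
      skipped << row
    end
  end
  { imported: imported, skipped: skipped }
end
```

Returning the skipped rows instead of silently dropping them makes it easier to rerun the import after fixing the source data.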

护你周全 2024-07-19 13:12:09

I was in the same situation not too long ago: a crappy PHP app that held ten years' worth of all company data.

What I did was simply create a Migration model and add methods to import each resource.

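class Migration
  # Runs every import step. A class method, so it can call the
  # class-level import methods below directly.
  def self.migration_all
    jobs
  end

  # Imports one resource (here, jobs); the body is elided in the original.
  def self.jobs
    # ...
  end
end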

The cool thing about this is that you can control the order in which resources are imported, since one will likely reference another. I also added methods that directly modified the db schema. One nice trick if you have to keep an existing primary key is to create a field named 'legacy_id' and copy your existing primary key into it. When you're done, simply remove the 'id' field, rename the 'legacy_id' field to 'id', then add the primary_key constraint on the new 'id' field.
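The ordering point can be sketched in plain Ruby. This is not the code from the answer above, just one possible way to derive an import order from declared dependencies; the resource names and the `import_order` helper are hypothetical.

```ruby
# Each resource lists the resources that must be imported before it.
# These resource names are made up for illustration.
DEPENDENCIES = {
  categories: [],
  products: [:categories],
  prices: [:products],
  product_attributes: [:products]
}.freeze

# Returns the resource names ordered so that every dependency comes first
# (a depth-first topological sort). Raises on a circular dependency.
def import_order(dependencies)
  ordered = []
  state = {} # :visiting while on the call stack, :done once placed

  visit = lambda do |name|
    return if state[name] == :done
    raise ArgumentError, "circular dependency at #{name}" if state[name] == :visiting

    state[name] = :visiting
    dependencies.fetch(name).each { |dep| visit.call(dep) }
    state[name] = :done
    ordered << name
  end

  dependencies.each_key { |name| visit.call(name) }
  ordered
end

order = import_order(DEPENDENCIES)
# Each name could then be dispatched to an import method of the same name,
# e.g. order.each { |name| Migration.public_send(name) }
```

Declaring dependencies per resource means a newly added import only has to state what it needs, rather than being slotted by hand into a fixed sequence.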