Comparing data feeds from different networks (affiliate marketing)
I am working on integrating affiliate sales into a few existing sites. We are using several merchants who work through different networks (cj, shareasale, linkshare, avantlink).
Now, my observation is that all these networks provide data feeds in different formats, but that's not a big problem. My main concern is that merchants use different titles for the same products. I don't want to run into either of these situations:
a) two listings of the SAME product from N merchants (if the titles are just slightly different)
b) one listing of N different products from merchants (if we don't use a strict comparison algorithm)
We want to automate everything as much as possible and avoid having operators constantly review questionable listings.
How is this problem typically handled?
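To make the two failure modes concrete, here is a small sketch that compares titles with Python's standard-library `difflib.SequenceMatcher`. The similarity measure, the thresholds, and the titles are all illustrative assumptions, not anything the networks provide.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_same_product(a: str, b: str, threshold: float) -> bool:
    """Treat two listings as one product if their titles are similar enough."""
    return title_similarity(a, b) >= threshold


# Hypothetical feed titles.
same = ("Acme Trail Runner 2 Men's Shoe", "ACME Trail Runner 2 - Mens Shoe")       # one product
different = ("Acme Trail Runner 2 Men's Shoe", "Acme Trail Runner 3 Men's Shoe")  # two products

for label, (a, b) in (("same", same), ("different", different)):
    print(label, round(title_similarity(a, b), 3))

# Strict matching (threshold 1.0) keeps the "same" pair apart: situation (a).
print(is_same_product(*same, threshold=1.0))
# A threshold loose enough to merge the "same" pair also merges the "different" pair: situation (b).
print(is_same_product(*different, threshold=0.95))
```

With these hypothetical titles, the two different models score higher than the two listings of the same shoe, so no single title-similarity threshold avoids both (a) and (b). That suggests matching on identifiers rather than on titles alone.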
Comments (4)
We have a similar issue with trying to collapse products from multiple merchant feeds. What we do is collapse products based on their brand (or manufacturer) + SKU combination.
Our data is pretty messy, so we have to do some work to normalize both the brand and the SKU so that the products collapse nicely. We keep a list of the brands we care about and map the brands in each merchant feed onto our brands. For example, if we have an "ACME" brand in our system, we might map variants such as "ACME Inc." and "A.C.M.E" to that brand.
For SKUs, we usually just strip any non-alphanumeric characters for matching purposes. For example, "abc-123" and "abc 123" would both map to the same SKU.
So if we see brand "ACME Inc." and SKU "abc-123" in one feed, that listing will collapse with brand "A.C.M.E" and SKU "abc 123" from another feed.
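A minimal Python sketch of the brand + SKU collapse described above. The alias table, the listing dict shape (`brand`, `sku`, `name` keys), squashing brands before the alias lookup, and lower-casing SKUs are all assumptions for illustration; a real system would load its brand list and mappings from wherever it keeps them.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical alias table: squashed merchant brand string -> canonical brand.
BRAND_ALIASES = {
    "acme": "ACME",     # e.g. "ACME", "A.C.M.E", "Acme"
    "acmeinc": "ACME",  # e.g. "ACME Inc.", "Acme, Inc."
}


def squash(text: str) -> str:
    """Lower-case and drop everything except ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def normalize_brand(raw: str) -> Optional[str]:
    """Map a merchant's brand string to one of our brands, or None if unknown."""
    return BRAND_ALIASES.get(squash(raw))


def normalize_sku(raw: str) -> str:
    # Strip non-alphanumerics as described; lower-casing is an extra assumption.
    return squash(raw)


def collapse(listings):
    """Group listings from all feeds by (canonical brand, normalized SKU)."""
    groups = defaultdict(list)
    for item in listings:
        brand = normalize_brand(item["brand"])
        if brand is None:
            continue  # not a brand we care about, or an unmapped variant
        groups[(brand, normalize_sku(item["sku"]))].append(item)
    return dict(groups)


# Two hypothetical listings from different feeds.
feed_a = [{"brand": "ACME Inc.", "sku": "abc-123", "name": "ACME Widget"}]
feed_b = [{"brand": "A.C.M.E", "sku": "abc 123", "name": "Acme widget, blue"}]
groups = collapse(feed_a + feed_b)
print(list(groups))  # [('ACME', 'abc123')]
```

Skipping unknown brands keeps the collapse conservative; those listings could instead be queued for someone to add a new alias.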
As part of the collapsing process, we end up with multiple names/images/descriptions/categories/etc. for each collapsed product and need to choose the "best" one to show on the website.
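The answer doesn't say how the "best" value is chosen, so the following is only one hypothetical set of heuristics: take the name most merchants agree on (ties go to the longer name), the longest description, and the first non-empty image URL. The field names `name`, `description`, and `image_url` are assumptions.

```python
from collections import Counter


def pick_best(group):
    """Choose display fields for one collapsed product (hypothetical heuristics)."""
    names = [item["name"].strip() for item in group if item.get("name")]
    descriptions = [item["description"] for item in group if item.get("description")]
    images = [item["image_url"] for item in group if item.get("image_url")]

    best = {}
    if names:
        counts = Counter(names)
        # Prefer the name most merchants use; break ties with the longer name.
        best["name"] = max(counts, key=lambda n: (counts[n], len(n)))
    if descriptions:
        # Assume a longer description is more informative.
        best["description"] = max(descriptions, key=len)
    if images:
        # First available image; a real system might compare resolutions instead.
        best["image_url"] = images[0]
    return best


group = [
    {"name": "ACME Widget", "description": "Widget.", "image_url": ""},
    {"name": "ACME Widget", "description": "Blue widget with steel frame.",
     "image_url": "http://example.com/w.jpg"},
    {"name": "Acme widget, blue", "description": "", "image_url": None},
]
print(pick_best(group))
```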
That's a very high level overview of how we handle it.
Look for merchants who provide UPC codes in their feeds; UPCs are universal. Also, AvantLink lets you customize your own feed output, which is nice.
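When a feed does carry UPCs, matching on them is straightforward. A sketch, assuming 12-digit UPC-A codes (EAN-13 or GTIN-14 would need their own handling) and a hypothetical `upc` field in each listing. It validates the standard UPC-A check digit so a mistyped code isn't trusted, and sets listings without a usable UPC aside for another rule such as brand + SKU.

```python
from collections import defaultdict
from typing import Optional


def normalize_upc(raw: str) -> Optional[str]:
    """Return the 12-digit UPC-A code if `raw` holds one with a valid check digit."""
    digits = "".join(ch for ch in raw if ch in "0123456789")
    if len(digits) != 12:
        return None
    odd = sum(int(d) for d in digits[0:11:2])   # positions 1, 3, ..., 11
    even = sum(int(d) for d in digits[1:10:2])  # positions 2, 4, ..., 10
    check = (10 - (3 * odd + even) % 10) % 10
    return digits if check == int(digits[11]) else None


def group_by_upc(listings):
    """Group listings by valid UPC; return (groups, listings without a usable UPC)."""
    by_upc, leftovers = defaultdict(list), []
    for item in listings:
        upc = normalize_upc(item.get("upc") or "")
        if upc:
            by_upc[upc].append(item)
        else:
            leftovers.append(item)  # fall back to another matching rule
    return dict(by_upc), leftovers


print(normalize_upc("0 36000 29145 2"))  # 036000291452
print(normalize_upc("036000291453"))     # None (bad check digit)
```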
I was actually looking at 2 sample data feeds from AvantLink a minute ago. Here's the list of fields they provide (not filtered, so I assume it's everything):
I was thinking that, yes, having UPCs would be (almost) ideal, but neither of the stores I was looking at (one of them is REI) provides UPCs.
I also checked a few large merchants on Commission Junction and ShareASale; they don't include UPCs either.
Such scenarios are typically handled by data warehouse systems, such as those provided by Oracle, HP, Microsoft, IBM, Netezza, or Teradata.