Doing something meaningful with the Amazon BrowseNodes API

Posted 2024-10-11 05:47:09

I have a website (www.7bks.com) where people create book lists. It's fairly simple at the moment. I'm already using the Amazon API to pull book information, images, etc. onto the site.

What I'd like to do is somehow use the Amazon API to pull back category and/or tag data to create some way of browsing lists on my site. Unfortunately, the tag API method has been discontinued.

The most likely candidate is the BrowseNodes method of the Amazon API (http://docs.amazonwebservices.com/AWSEcommerceService/2005-10-05/ApiReference/BrowseNodesResponseGroup.html) but the data returned from this call is pretty nonsensical and I was hoping we might be able to put our heads together and figure out how to make sense of it.

Here's a google spreadsheet to show you the kind of data I get. I picked a sample list (http://www.7bks.com/list/549002) and ran the three books through the BrowseNodes API:

https://spreadsheets.google.com/ccc?key=0ApVjkgehRamudHd5SlNhYllPQkZDSDY1cllfQVBQM1E&hl=en&authkey=CN_MxoAO

Looking at the list as a human, you don't need to know what the books are in order to see that the list is likely about Sci-Fi and Fantasy. That's mainly because the eye is good at discarding meaningless categories such as "custom stores" and "fiction complete".

I tried de-duping the list of categories, and also only looking at the categories that appear for all 3 books, but it's still fairly crap data. I would love your thoughts on how I can turn this data into something meaningful for the users.

My best thought so far is just to scan the data and match to a hard-coded list. So something like:

if Count("science fiction & fantasy") > 3 then list is sci fi
if Count("business finance & law") > 3 then list is business

etc.

This is very rigid though and ideally I'd like to build something a little more flexible/powerful.
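
A minimal sketch of the hard-coded matching idea above, assuming the category names for every book on a list have already been collected into one flat list. The `KNOWN_GENRES` mapping, the `classify_list` name and the threshold are illustrative assumptions, not part of the Amazon API.

```python
from collections import Counter

# Hypothetical mapping from Amazon category names (lower-cased) to site genres.
KNOWN_GENRES = {
    "science fiction & fantasy": "sci fi",
    "business finance & law": "business",
}


def classify_list(category_names, threshold=3):
    """Return the genres whose category appears more than `threshold` times."""
    counts = Counter(name.strip().lower() for name in category_names)
    return sorted(
        genre
        for category, genre in KNOWN_GENRES.items()
        if counts[category] > threshold
    )


# Example input: category names gathered from every book on one list.
names = ["Science Fiction & Fantasy"] * 4 + ["Custom Stores", "Fiction Complete"]
print(classify_list(names))
```

Normalising the names before counting keeps the match tolerant of case and stray whitespace, but the approach is only as good as the hand-maintained mapping.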

All suggestions welcome.

I think this is a high-level question, so it shouldn't be affected by how I'm calling the API, but for reference I'm using Python/Appengine/Webapp.

Thanks

Tom

UPDATE: After much banging of head against desk, I've managed to fix this issue to my satisfaction. It's not that complicated, but I've hacked together some Python code that does what I want. I welcome anyone improving on my code or offering suggestions.

Basically, the logic underlying the code is this:
1) In the XML tree, the bottom node of a branch that starts (Books > Subjects) is the best guess at what the book is actually about. E.g. for http://www.amazon.co.uk/Surface-Detail-Iain-M-Banks/dp/1841498939/ it returns "Science Fiction". Bingo.
2) Typically, a lot of good information is thrown away by limiting ourselves to just those results that start (Books > Subjects). Therefore,
3) I try getting a list of similar books and pulling the categories off them; if that fails, I just get the categories assigned to the original book.

Perhaps best explained by giving you the code as follows:

from google.appengine.ext import webapp
#API and minidom_response_parser come from the Amazon API wrapper in use;
#their import was not shown in the original post

#returns the text of the first <tag> element under node
#(assumed helper; its definition was not shown in the original post)
def get_text(node, tag):
    element = node.getElementsByTagName(tag)[0]
    return element.firstChild.data

#takes as input the xml output of the amazon api browsenodes call
def getcategories(xml):
    #fetches the names of all the nodes, in document order, into one flat list
    categories = []
    for node in xml.getElementsByTagName('BrowseNode'):
        categories.append(get_text(node, 'Name'))

    #turn the one list into a series of individual lists
    #each individual list should be a particular tree from browsenode
    #each list will end 'Books'
    #the first item in the list should be the bottom of the tree
    taglists = []
    while 'Books' in categories:
        find = categories.index('Books') + 1
        taglists.append(categories[:find])
        #drop the path just consumed so the next one starts at index 0
        del categories[:find]

    #now, we only return the first item from a list which contains 'Subjects'
    final = []
    for tagset in taglists:
        if 'Subjects' in tagset:
            final.append(tagset[0])
    return final

class Browsenodes(webapp.RequestHandler):
    def get(self):
        #get the asin of the target book
        asin = self.request.get('term')
        if asin:
            #fetch the amazon key
            api = API(AWS_KEY, SECRET_KEY, 'uk', processor=minidom_response_parser)
            try:
                #try getting a list of similar books - note the response group set to browsenodes
                result = api.similarity_lookup(asin, ResponseGroup='BrowseNodes')
            except Exception:
                #there isn't always a list of similar books, so as a failsafe just get the book I wanted.
                result = api.item_lookup(asin, ResponseGroup='BrowseNodes')
            final = getcategories(result)
            #turn it into a set to de-dupe multiple listings of the same category
            self.response.out.write(set(final))

To give you a flavour of the output:

Book:
http://www.amazon.co.uk/Surface-Detail-Iain-M-Banks/dp/1841498939/

Tags:
Contemporary Fiction
Products
Space Opera
Science Fiction

http://www.amazon.co.uk/Godel-Escher-Bach-Eternal-anniversary/dp/0140289208/
Psychology
History of Mathematics
Mathematical Logic
General AAS
Popular Maths
Scientific, Technical & Medical
Arts & Music
Philosophy of Mind
Amazon
Maths
Architecture & Logic
Contemporary Philosophy: 1900-
Logic
Classics
Physics
Metaphysics
Philosophy of Physics
General
Technology
Algebraic Number Theory
Artificial Intelligence
History of Science

http://www.amazon.co.uk/Flatland-Romance-Dimensions-Dover-Thrift/dp/048627263X/
Contemporary Fiction
Philosophy of Mathematics
General AAS
Popular Maths
Philosophy
Scientific, Technical & Medical
Philosophy of Mind
Science Fiction
Maths
Contemporary Philosophy: 1900-
Algebraic Number Theory
Products
Classics
Metaphysical & Visionary
Myths & Fairy Tales
Topology General
Topics
General
Theoretical Methods
Metaphysics
Artificial Intelligence
History of Science

http://www.amazon.co.uk/Victoria-Condor-Books-Knut-Hamsun/dp/0285647598/
Contemporary Fiction
Literary Fiction
Psychological
General AAS
Classics
Short Stories

晚雾 2024-10-18 05:47:09

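My best thought so far is just to scan the data and match to a hard-coded list. So something like:
if Count("science fiction & fantasy") > 3 then list is sci fi
if Count("business finance & law") > 3 then list is business

I think this might not be a bad idea? Get the top-level book categories from Amazon, then match against those. It's not very elegant, but it would work.

Alternatively, maybe you could use the dc:subject data from the Google Book API? (I haven't used it, though, so it might be rubbish too.)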

漆黑的白昼 2024-10-18 05:47:09

Hmm. First of all, the current API version is dated 2011-08-01. Maybe you could do yourself a favor by looking at up-to-date documentation: the Product Advertising API.

To me, the XML makes a lot of sense!

Maybe that's because, when I want to properly understand one of those responses, I copy the XML into the Visual Studio XML editor, where I can expand and collapse nodes.

The structure is something like this:

<BrowseNodes>
  <BrowseNode>...</BrowseNode>
  <BrowseNode>...</BrowseNode>
  <BrowseNode>...</BrowseNode>
  <BrowseNode>...</BrowseNode>
</BrowseNodes>

Then inside of each BrowseNode, it will be something like this:

<BrowseNode>
  <BrowseNodeId>10399</BrowseNodeId>
  <Name>Classics</Name>
  <Ancestors>
    <BrowseNode>
      <BrowseNodeId>17</BrowseNodeId>
      <Name>Literature &amp; Fiction</Name>
      <Ancestors>
        <BrowseNode>
          <BrowseNodeId>1000</BrowseNodeId>
          <Name>Subjects</Name>
          <IsCategoryRoot>1</IsCategoryRoot>

Notice the "IsCategoryRoot"? There is no point going higher than that, as anything above it is so generic that it makes no sense to use it. The root's name is "Subjects" for Books but "Categories" for eBooks, so it seems to make more sense to check for the "IsCategoryRoot" element than for a particular name.
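
A minimal sketch of that check, using only the standard-library `xml.dom.minidom`. The sample XML reuses the fragment above and closes it with a made-up "Books" ancestor (placeholder ID); `child_elements`, `child_text` and `path_below_root` are hypothetical helper names, and real responses may differ in shape.

```python
from xml.dom import minidom

# Sample modelled on the fragment above; the final "Books" ancestor and its
# id are placeholders added so that the document is complete.
SAMPLE = """<BrowseNodes>
  <BrowseNode>
    <BrowseNodeId>10399</BrowseNodeId>
    <Name>Classics</Name>
    <Ancestors>
      <BrowseNode>
        <BrowseNodeId>17</BrowseNodeId>
        <Name>Literature &amp; Fiction</Name>
        <Ancestors>
          <BrowseNode>
            <BrowseNodeId>1000</BrowseNodeId>
            <Name>Subjects</Name>
            <IsCategoryRoot>1</IsCategoryRoot>
            <Ancestors>
              <BrowseNode>
                <BrowseNodeId>0</BrowseNodeId>
                <Name>Books</Name>
              </BrowseNode>
            </Ancestors>
          </BrowseNode>
        </Ancestors>
      </BrowseNode>
    </Ancestors>
  </BrowseNode>
</BrowseNodes>"""


def child_elements(node, tag):
    """Direct child elements of `node` named `tag` (not deeper descendants)."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == tag]


def child_text(node, tag):
    """Text of the first direct child named `tag`, or '' if absent."""
    found = child_elements(node, tag)
    if not found or found[0].firstChild is None:
        return ""
    return found[0].firstChild.data


def path_below_root(browse_node):
    """Names from the leaf upwards, stopping before the IsCategoryRoot node.

    Returns [] when the path never reaches a category root.
    """
    names = []
    node = browse_node
    while node is not None:
        if child_text(node, "IsCategoryRoot") == "1":
            return names
        names.append(child_text(node, "Name"))
        ancestors = child_elements(node, "Ancestors")
        parents = child_elements(ancestors[0], "BrowseNode") if ancestors else []
        node = parents[0] if parents else None
    return []


doc = minidom.parseString(SAMPLE)
for leaf in child_elements(doc.documentElement, "BrowseNode"):
    print(path_below_root(leaf))
```

Walking direct children only, rather than calling `getElementsByTagName`, keeps each path separate, so names from different branches are never mixed together.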

I am not 100% sure what you want to do, and I don't know much Python, but I do know databases. I would take the book's ASIN identifier (which is unique worldwide for Amazon, meaning you can look up the same ASIN on amazon.com, but also on co.uk, fr, de, and so on) and put it in a table, along with whatever other data you find useful. Then create a table for categories holding their names and IDs, and a link table with one entry for each lowest-level BrowseNode, holding the BrowseNodeId and the book's ASIN. For the nested BrowseNodes (which are in fact the parents, or ancestors), store both the child's ID and their own. Obviously, before inserting a category I would check that it does not already exist.

The goal here is to have one record per book, one record per category, and as many links between categories and books, and between the categories themselves, as needed.

That way, it would be extremely easy to search books from categories, and vice versa.
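
A rough sketch of that layout using Python's built-in `sqlite3`, under the assumption that each browse-node path has already been extracted as a leaf-to-root list of (id, name) pairs. All table, column and function names here are illustrative, and pairing the Victoria ASIN with these node IDs is only for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (
    asin  TEXT PRIMARY KEY,
    title TEXT
);
CREATE TABLE categories (
    browse_node_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
-- one row per (book, lowest-level browse node)
CREATE TABLE book_categories (
    asin           TEXT    REFERENCES books(asin),
    browse_node_id INTEGER REFERENCES categories(browse_node_id),
    PRIMARY KEY (asin, browse_node_id)
);
-- one row per (child node, parent node)
CREATE TABLE category_parents (
    child_id  INTEGER REFERENCES categories(browse_node_id),
    parent_id INTEGER REFERENCES categories(browse_node_id),
    PRIMARY KEY (child_id, parent_id)
);
""")


def store_path(conn, asin, title, path):
    """Store one book plus one leaf-to-root path of (node_id, name) pairs."""
    # INSERT OR IGNORE covers the "check it does not already exist" step.
    conn.execute("INSERT OR IGNORE INTO books VALUES (?, ?)", (asin, title))
    for node_id, name in path:
        conn.execute("INSERT OR IGNORE INTO categories VALUES (?, ?)",
                     (node_id, name))
    conn.execute("INSERT OR IGNORE INTO book_categories VALUES (?, ?)",
                 (asin, path[0][0]))
    for (child_id, _), (parent_id, _) in zip(path, path[1:]):
        conn.execute("INSERT OR IGNORE INTO category_parents VALUES (?, ?)",
                     (child_id, parent_id))


def books_under(conn, node_id):
    """Books attached to `node_id` or to any of its descendant categories."""
    return conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION
            SELECT cp.child_id
            FROM category_parents cp JOIN subtree s ON cp.parent_id = s.id
        )
        SELECT DISTINCT b.asin, b.title
        FROM books b
        JOIN book_categories bc ON bc.asin = b.asin
        JOIN subtree s ON s.id = bc.browse_node_id
    """, (node_id,)).fetchall()


store_path(conn, "0285647598", "Victoria",
           [(10399, "Classics"), (17, "Literature & Fiction"), (1000, "Subjects")])
print(books_under(conn, 17))
```

The recursive query is what makes browsing by a broad category work: asking for "Literature & Fiction" also finds books filed only under its children, such as "Classics".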

Sorry if I have been a bit long, but there is no short answer to your question. Hope this helps.

Bernard
