What is the best way to parse RDFa, Microdata, etc., and store and display the information using a uniform schema/vocabulary (for example schema.org)?

Posted on 2024-12-01 15:37:12

I'm mainly using Ruby to do this but my plan of attack thus far is as follows:

Use the gems rdf, rdf-rdfa, and either rdf-microdata or mida to parse data given any URI. I think it would be best to map to a uniform schema like schema.org. For example, take this YAML file, which attempts to describe the conversion from data-vocabulary and opengraph to schema.org:

# Schema X to schema.org conversion
# data-vocabulary
DV:
  name: name
  street-address: streetAddress
  region: addressRegion
  locality: addressLocality
  photo: image
  country-name: addressCountry
  postal-code: postalCode
  tel: telephone
  latitude: latitude
  longitude: longitude
  type: type
# opengraph
OG:
  title: name
  type: type
  image: image
  site_name: site_name
  description: description
  latitude: latitude
  longitude: longitude
  street-address: streetAddress
  locality: addressLocality
  region: addressRegion
  postal-code: postalCode
  country-name: addressCountry
  phone_number: telephone
  email: email
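
As a rough sketch of how such a file could be consumed, the snippet below loads a shortened copy of the mapping with Ruby's standard `yaml` library and looks up the schema.org property for a given vocabulary and tag. It assumes each entry is written with a space after the colon so that YAML parses it as a nested mapping; the constant names and the `schema_org_property` helper are hypothetical and not part of any gem.

```ruby
require 'yaml'

# Hypothetical inline excerpt of the mapping file; in practice this text
# would be read from wherever the YAML file is stored.
MAPPING_SOURCE = <<~MAPPING
  DV:
    name: name
    street-address: streetAddress
    tel: telephone
  OG:
    title: name
    phone_number: telephone
MAPPING

# Parses into { "DV" => { "name" => "name", ... }, "OG" => { ... } }.
MAPPINGS = YAML.safe_load(MAPPING_SOURCE)

# Look up the schema.org property for a source vocabulary and tag.
# Returns nil when the vocabulary or the tag has no mapping.
def schema_org_property(vocabulary, tag)
  MAPPINGS.fetch(vocabulary, {})[tag]
end

puts schema_org_property('DV', 'tel')
puts schema_org_property('OG', 'title')
```

Keeping the lookup keyed by vocabulary first means two vocabularies can reuse the same tag name without overwriting each other.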

I can then store information found in one format and re-display it with schema.org syntax.

The other part is determining type. I'd model my tables after schema.org, and I'd like to know which type of 'Thing' a record would be. So if I parse an opengraph type of 'bar', I'd store it as 'BarOrPub'.

Is there a better way of doing this? Something automated? A solution already out there? Any input appreciated.

EDIT:

So I'm finding that this parses pretty well (where all_tags includes the tags I'm interested in as keys and the schema.org equivalents as values):

require 'rdf/rdfa'

# `url`, `all_tags` and `results` (a Hash) are assumed to be defined by the
# surrounding code.
RDF::RDFa::Reader.open(url) do |reader|
  reader.each_statement do |statement|
    # Use the last path or fragment segment of the predicate URI as the tag.
    tag = statement.predicate.to_s.split('/')[-1].split('#')[-1]
    Rails.logger.debug "rdf tag: #{tag}"
    Rails.logger.debug "rdf predicate: #{statement.predicate}"
    if all_tags.key?(tag)
      Rails.logger.debug "Found mapping for #{statement.predicate} and #{all_tags[tag]}"
      results[all_tags[tag]] = statement.object.to_s.strip
    end
  end
end
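
To make the tag extraction in the snippet above easier to reason about, here is the same expression isolated as a small helper and applied to a few sample predicate URIs. The helper name `local_name` is made up for illustration, and the sample URIs assume the usual `http://ogp.me/ns#` and `http://rdf.data-vocabulary.org/#` namespaces.

```ruby
# Return the last path or fragment segment of a predicate URI, using the
# same split logic as the parsing loop.
def local_name(predicate_uri)
  predicate_uri.to_s.split('/')[-1].split('#')[-1]
end

puts local_name('http://ogp.me/ns#title')
puts local_name('http://rdf.data-vocabulary.org/#street-address')
puts local_name('http://schema.org/name')
```

One thing to watch with this approach: the tag alone does not say which vocabulary it came from, so `latitude` from opengraph and `latitude` from data-vocabulary end up under the same key. That is harmless when both map to the same schema.org property, but it would matter if two vocabularies used one tag name with different meanings.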


过去的过去 2024-12-08 15:37:12

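To the original question: you're on the right track. In fact, we do something similar in the structured-data.org linter, and it may be useful for you to look at its GitHub repository. The basic idea is to do format detection and choose the appropriate reader (RDFa, Microdata, or whatever). Once the document has been read, you have a graph. You then run through each statement in that graph and build a new output graph whose predicates and types are mapped according to your table. So, for instance, if the source graph has dv:name as a predicate, you would emit schema:name in the output graph.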
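
The statement-by-statement mapping can be sketched without any gems by treating a graph as a plain array of `[subject, predicate, object]` triples. This is only a stand-in for illustration: with RDF.rb you would iterate an `RDF::Graph` and build `RDF::Statement` objects instead. The `PREDICATE_MAP` table, the `map_predicates` helper, and the example URIs are assumptions, including the data-vocabulary namespace `http://rdf.data-vocabulary.org/#`.

```ruby
# Hypothetical predicate table: source predicate URI => schema.org URI.
PREDICATE_MAP = {
  'http://rdf.data-vocabulary.org/#name' => 'http://schema.org/name',
  'http://rdf.data-vocabulary.org/#tel'  => 'http://schema.org/telephone',
  'http://ogp.me/ns#title'               => 'http://schema.org/name'
}.freeze

# Build a new list of triples, keeping only statements whose predicate has
# a mapping and rewriting that predicate to its schema.org equivalent.
def map_predicates(triples, predicate_map = PREDICATE_MAP)
  triples.each_with_object([]) do |(subject, predicate, object), output|
    mapped = predicate_map[predicate]
    output << [subject, mapped, object] if mapped
  end
end

source = [
  ['http://example.com/#bar', 'http://rdf.data-vocabulary.org/#name', "Joe's Bar"],
  ['http://example.com/#bar', 'http://rdf.data-vocabulary.org/#tel', '555-0100'],
  ['http://example.com/#bar', 'http://example.com/vocab#unmapped', 'ignored']
]

map_predicates(source).each { |triple| puts triple.inspect }
```

Keying the table on full predicate URIs rather than on the trailing tag keeps terms from different vocabularies apart even when their local names coincide.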

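Determining the type also requires a mapping table to come up with the appropriate output type. Note that OGP doesn't actually use rdf:type, so you'll need to find the statement with ogp:type and emit an rdf:type statement with the mapped class.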
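
A minimal sketch of that type step, again over plain `[subject, predicate, object]` arrays rather than a real RDF graph: it looks for statements whose predicate is the OGP type property and emits an rdf:type triple with the mapped schema.org class. The `CLASS_MAP` entries and the helper name are illustrative assumptions, and the sketch assumes the `http://ogp.me/ns#` namespace (older pages may use a different OGP namespace).

```ruby
OGP_TYPE = 'http://ogp.me/ns#type'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Illustrative class table: OGP type string => schema.org class URI.
CLASS_MAP = {
  'bar'        => 'http://schema.org/BarOrPub',
  'restaurant' => 'http://schema.org/Restaurant'
}.freeze

# For every ogp:type statement with a known type string, produce an
# rdf:type statement pointing at the mapped class. Unknown types are skipped.
def ogp_types_to_rdf_types(triples, class_map = CLASS_MAP)
  triples.each_with_object([]) do |(subject, predicate, object), output|
    next unless predicate == OGP_TYPE

    mapped_class = class_map[object.to_s.strip.downcase]
    output << [subject, RDF_TYPE, mapped_class] if mapped_class
  end
end

page = [
  ['http://example.com/joes', OGP_TYPE, 'bar'],
  ['http://example.com/joes', 'http://ogp.me/ns#title', "Joe's Bar"]
]

ogp_types_to_rdf_types(page).each { |triple| puts triple.inspect }
```

Whether unmapped types should be dropped, as here, or fall back to a generic class such as `http://schema.org/Thing` is a design decision the answer leaves open.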

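Another way to approach the whole problem would be to create a vocabulary with owl:equivalentProperty/equivalentClass assertions and perform OWL entailment to add the appropriate triples to the original graph. Ruby's toolset isn't quite up to this yet.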
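
To give a feel for what the entailment approach would add, here is a deliberately naive sketch of a single rule: for each owl:equivalentProperty assertion, every statement using one property is copied with the other. It works on plain `[subject, predicate, object]` arrays, makes one pass only (no transitive closure, no equivalentClass handling), and is not a substitute for a real OWL reasoner. The helper name and sample data are hypothetical.

```ruby
OWL_EQUIVALENT_PROPERTY = 'http://www.w3.org/2002/07/owl#equivalentProperty'

# Return the data triples plus the triples implied by one application of the
# owl:equivalentProperty assertions found in vocabulary_triples.
def apply_equivalent_properties(data_triples, vocabulary_triples)
  # Record equivalences in both directions, since the relation is symmetric.
  equivalents = Hash.new { |hash, key| hash[key] = [] }
  vocabulary_triples.each do |subject, predicate, object|
    next unless predicate == OWL_EQUIVALENT_PROPERTY

    equivalents[subject] << object
    equivalents[object] << subject
  end

  inferred = data_triples.flat_map do |subject, predicate, object|
    equivalents.fetch(predicate, []).map { |equivalent| [subject, equivalent, object] }
  end
  (data_triples + inferred).uniq
end

vocabulary = [
  ['http://rdf.data-vocabulary.org/#name', OWL_EQUIVALENT_PROPERTY, 'http://schema.org/name']
]
data = [
  ['http://example.com/#bar', 'http://rdf.data-vocabulary.org/#name', "Joe's Bar"]
]

apply_equivalent_properties(data, vocabulary).each { |triple| puts triple.inspect }
```

Unlike the table-driven rewrite, this keeps the original statements and adds the schema.org ones alongside them, which matches the answer's description of adding triples to the original graph.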