What's the best way to parse RDFa, Microdata, etc., and store and display the information using a uniform schema/vocabulary (schema.org, for example)?

Posted on 2024-12-01 15:37:12

I'm mainly using Ruby for this, and my plan of attack so far is as follows:

Use the gems rdf, rdf-rdfa, and either rdf-microdata or mida to parse the data at any given URI. I think it'd be best to map everything to a uniform schema like schema.org. For example, take this YAML file, which attempts to describe the conversion from data-vocabulary and Open Graph to schema.org:

# Schema X to schema.org conversion
# data-vocabulary
DV:
  name: name
  street-address: streetAddress
  region: addressRegion
  locality: addressLocality
  photo: image
  country-name: addressCountry
  postal-code: postalCode
  tel: telephone
  latitude: latitude
  longitude: longitude
  type: type
# opengraph
OG:
  title: name
  type: type
  image: image
  site_name: site_name
  description: description
  latitude: latitude
  longitude: longitude
  street-address: streetAddress
  locality: addressLocality
  region: addressRegion
  postal-code: postalCode
  country-name: addressCountry
  phone_number: telephone
  email: email

I can then store information found in one format and re-display it with schema.org syntax.
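As a rough illustration of how such a mapping file could be applied, here is a minimal sketch using only Ruby's standard library. The inline YAML is a shortened copy of the table above, and `MAPPING_YAML`, `MAPPINGS`, `to_schema_org` and the sample record are names made up for this example, not part of any gem.

```ruby
require 'yaml'

# Shortened, hypothetical copy of the mapping file. Keys are source terms,
# values are schema.org property names (note the space after each colon).
MAPPING_YAML = <<~YAML
  DV:
    name: name
    street-address: streetAddress
    tel: telephone
  OG:
    title: name
    phone_number: telephone
YAML

MAPPINGS = YAML.safe_load(MAPPING_YAML)

# Rename the keys of a parsed record to schema.org property names.
# Keys without an entry in the table are dropped.
def to_schema_org(record, vocabulary)
  table = MAPPINGS.fetch(vocabulary)
  record.each_with_object({}) do |(key, value), out|
    mapped = table[key]
    out[mapped] = value if mapped
  end
end

# Made-up Open Graph record for illustration.
og_record = {
  'title' => 'The Anchor',
  'phone_number' => '555-0100',
  'site_name' => 'Example'
}
converted = to_schema_org(og_record, 'OG')
```

Dropping unmapped keys is just one possible policy; keeping them under their original names would be equally reasonable.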

The other part is determining type. I'd model my tables after schema.org, and I'd like to know which type of 'Thing' a record would be. So if I parse an Open Graph type of 'bar', I'd store it as 'BarOrPub'.

Is there a better way of doing this? Something automated? A solution already out there? Any input appreciated.

EDIT:

So I'm finding that this parses pretty well (where all_tags has the tags I'm interested in as keys and their schema.org equivalents as values):

require 'rdf/rdfa' # provided by the rdf-rdfa gem

# all_tags and results are Hashes defined elsewhere
RDF::RDFa::Reader.open(url) do |reader|
  reader.each_statement do |statement|
    # Use the last path segment or fragment of the predicate IRI as the tag
    tag = statement.predicate.to_s.split('/')[-1].split('#')[-1]
    Rails.logger.debug "rdf tag: #{tag}"
    Rails.logger.debug "rdf predicate: #{statement.predicate}"
    if all_tags.key?(tag)
      Rails.logger.debug "Found mapping for #{statement.predicate} and #{all_tags[tag]}"
      results[all_tags[tag]] = statement.object.to_s.strip
    end
  end
end

Comments (2)

过去的过去 2024-12-08 15:37:12

For the original question, you're on the right track. In fact, we do similar things in the structured-data.org linter. It might be useful for you to check out the GitHub repo. The basic idea is to do format detection and choose the appropriate reader (RDFa, Microdata or whatever). Once read, you'll have a graph. You'll want to run through each statement in the graph and create a new output graph with predicates and types mapped based on your table. So, for instance, if you see dv:name as a predicate in the source graph, you could output schema:name in the output graph.
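The statement-by-statement mapping described above could look roughly like the sketch below. To keep it dependency-free, it uses a plain `Struct` as a stand-in for an RDF statement rather than the RDF.rb classes; `Statement`, `PREDICATE_MAP`, `map_graph` and the sample data are all assumptions made for illustration.

```ruby
# Minimal stand-in for an RDF statement; this is not the RDF.rb API.
Statement = Struct.new(:subject, :predicate, :object)

DV_NS     = 'http://rdf.data-vocabulary.org/#'
SCHEMA_NS = 'http://schema.org/'

# Hypothetical predicate table: source IRI => schema.org IRI.
PREDICATE_MAP = {
  "#{DV_NS}name" => "#{SCHEMA_NS}name",
  "#{DV_NS}tel"  => "#{SCHEMA_NS}telephone"
}.freeze

# Walk the source graph and build a new output graph that contains
# only statements whose predicate has an entry in the table.
def map_graph(source)
  source.each_with_object([]) do |statement, out|
    target = PREDICATE_MAP[statement.predicate]
    next unless target
    out << Statement.new(statement.subject, target, statement.object)
  end
end

# Made-up source graph for illustration.
source = [
  Statement.new('_:b0', "#{DV_NS}name", 'The Anchor'),
  Statement.new('_:b0', "#{DV_NS}tel", '555-0100'),
  Statement.new('_:b0', "#{DV_NS}unmapped", 'ignored')
]
output = map_graph(source)
```

With a real reader, the same loop would presumably iterate over the statements the reader yields and insert the rewritten ones into a fresh graph.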

Determining type will also require a mapping table to come up with the appropriate output type. Note that OGP doesn't actually use rdf:type, so you'll need to find a statement with ogp:type and output an rdf:type along with the mapped class.
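A small sketch of that type step, again with plain `[subject, predicate, object]` arrays instead of real RDF objects. The two namespace IRIs are the standard OGP and RDF ones; `OG_TYPE_MAP`, `type_triples` and the sample page are hypothetical.

```ruby
OGP_TYPE = 'http://ogp.me/ns#type'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Hypothetical table from og:type values to schema.org classes.
OG_TYPE_MAP = {
  'bar'        => 'http://schema.org/BarOrPub',
  'restaurant' => 'http://schema.org/Restaurant'
}.freeze

# For each og:type triple with a known value, emit an rdf:type triple
# that points at the mapped schema.org class.
def type_triples(triples)
  triples.each_with_object([]) do |(subject, predicate, object), out|
    next unless predicate == OGP_TYPE
    mapped = OG_TYPE_MAP[object]
    out << [subject, RDF_TYPE, mapped] if mapped
  end
end

# Made-up page data for illustration.
page = [
  ['http://example.com/anchor', OGP_TYPE, 'bar'],
  ['http://example.com/anchor', 'http://ogp.me/ns#title', 'The Anchor']
]
typed = type_triples(page)
```

Unknown og:type values are silently skipped here; falling back to schema.org's Thing would be another option.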

Another way to approach the whole thing would be to create a vocabulary with owl:equivalentProperty/equivalentClass assertions and perform OWL entailment to add the appropriate triples to the original graph. Ruby's toolset isn't quite up to this yet.
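To make the idea concrete, here is a hand-rolled sketch of just the two equivalence rules, written in plain Ruby over `[subject, predicate, object]` arrays. It is not a general OWL reasoner, and `entail`, the vocabulary triples and the data are all made up for this example.

```ruby
require 'set'

OWL_EQ_PROPERTY = 'http://www.w3.org/2002/07/owl#equivalentProperty'
OWL_EQ_CLASS    = 'http://www.w3.org/2002/07/owl#equivalentClass'
RDF_TYPE_IRI    = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Add the triples implied by the equivalence assertions in `vocabulary`
# to `data`, repeating until nothing new is produced.
def entail(data, vocabulary)
  eq_prop  = Hash.new { |hash, key| hash[key] = [] }
  eq_class = Hash.new { |hash, key| hash[key] = [] }
  vocabulary.each do |s, p, o|
    table = case p
            when OWL_EQ_PROPERTY then eq_prop
            when OWL_EQ_CLASS then eq_class
            end
    next unless table
    table[s] << o # equivalence is symmetric, so index both directions
    table[o] << s
  end

  graph = Set.new(data)
  queue = data.dup
  until queue.empty?
    s, p, o = queue.shift
    inferred = eq_prop.fetch(p, []).map { |q| [s, q, o] }
    inferred += eq_class.fetch(o, []).map { |c| [s, p, c] } if p == RDF_TYPE_IRI
    inferred.each { |triple| queue << triple if graph.add?(triple) }
  end
  graph
end

# Hypothetical vocabulary and data.
vocab = [
  ['http://rdf.data-vocabulary.org/#name', OWL_EQ_PROPERTY, 'http://schema.org/name'],
  ['http://rdf.data-vocabulary.org/#Organization', OWL_EQ_CLASS, 'http://schema.org/Organization']
]
data = [
  ['_:b0', 'http://rdf.data-vocabulary.org/#name', 'The Anchor'],
  ['_:b0', RDF_TYPE_IRI, 'http://rdf.data-vocabulary.org/#Organization']
]
result = entail(data, vocab)
```

The result keeps the original triples alongside the inferred schema.org ones, which matches the "add triples to the original graph" description rather than building a separate output graph.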

她比我温柔 2024-12-08 15:37:12

Regarding Schema.org mappings, we are collecting relevant links at http://www.w3.org/wiki/WebSchemas. If you produce any new ones, please add them.

See also:

At some point you'll doubtless run into mappings that go beyond simple "this is the same as that" or "this implies that" triple patterns. You should be able to go some way further using SPARQL queries, particularly if you have a SPARQL engine supporting v1.1. And eventually, mapping tasks sometimes require custom code.
