@activediscourse/podcast-parser documentation


podcast-parser

Parse XML podcast RSS feeds into standardized objects.

installation

yarn add @activediscourse/podcast-parser

usage

Pass a string containing XML source:

const parsePodcast = require("@activediscourse/podcast-parser")

parsePodcast("<podcast xml>")
  .then(feed => console.log(feed))
  .catch(e => console.error(e))

This library only handles parsing, so you'll need to fetch the feed separately first. For example, using node-fetch (or fetch in the browser):

const fetch = require("node-fetch")
const parsePodcast = require("@activediscourse/podcast-parser")

;(async () => {
  const response = await fetch("https://pinecast.com/feed/activediscourse")
  const xml = await response.text()
  const feed = await parsePodcast(xml)

  return feed
})()
  .then(feed => console.log(feed))
  .catch(e => console.error(e))
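
Because fetching is left to the caller, it can help to check the HTTP status before handing the body to the parser. The following is a minimal sketch of that idea, not something the library provides: `loadFeed` and its `fetchImpl` and `parse` parameters are hypothetical names introduced here. The dependencies are passed in as arguments so the same helper can be used with node-fetch, fetch in the browser, or a stub in tests.

```javascript
// Hypothetical helper, not part of @activediscourse/podcast-parser.
// fetchImpl: any fetch-compatible function (node-fetch, browser fetch)
// parse: the parser function, e.g. the export of this library
async function loadFeed (url, fetchImpl, parse) {
  const response = await fetchImpl(url)

  // fetch resolves even for 4xx/5xx responses, so check the status explicitly
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`)
  }

  const xml = await response.text()
  return parse(xml)
}
```

With the two modules from the example above, a call would look like loadFeed("https://pinecast.com/feed/activediscourse", fetch, parsePodcast), followed by the same .then / .catch handlers.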

output format

The output is opinionated with the goal of normalizing results across feeds:

{
  "title": "<Podcast title>",
  "description": {
    "short": "<Podcast subtitle>",
    "long": "<Podcast description>"
  },
  "link": "<Podcast link (usually website for podcast)>",
  "image": "<Podcast image>",
  "language": "<ISO 639 language>",
  "copyright": "<Podcast copyright>",
  "updated": "<pubDate or latest episode pubDate>",
  "explicit": "<Podcast is explicit, true/false>",
  "categories": [
    "Category>Subcategory"
  ],
  "author": "<Author name>",
  "owner": {
    "name":  "<Owner name>",
    "email": "<Owner email>"
  },
  "episodes": [
    {
      "guid": "<Unique id>",
      "title": "<Episode title>",
      "subtitle": "<Episode subtitle>",
      "description": "<Episode description>",
      "rawDescription": "<Episode description stripped of HTML tags>",
      "explicit": "<Episode is explicit, true/false>",
      "image": "<Episode image>",
      "published": "<date>",
      "duration": 120,
      "categories": [
        "Category"
      ],
      "enclosure": {
        "filesize": 5650889,
        "type": "audio/mpeg",
        "url": "<mp3 file>"
      }
    }
  ]
}
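
As an illustration of consuming this shape, the sketch below builds a one-line label from an episode object. `describeEpisode` is a hypothetical helper, not part of the library, and it assumes that `duration` is given in seconds and `filesize` in bytes; the format above does not state the units, so verify them against a real feed.

```javascript
// Hypothetical helper for display purposes; not part of the library.
// Assumes `duration` is in seconds and `filesize` is in bytes.
function describeEpisode (episode) {
  const parts = [episode.title || "(untitled)"]

  if (typeof episode.duration === "number") {
    const minutes = Math.floor(episode.duration / 60)
    const seconds = String(episode.duration % 60).padStart(2, "0")
    parts.push(`${minutes}:${seconds}`)
  }

  if (episode.enclosure && typeof episode.enclosure.filesize === "number") {
    const megabytes = episode.enclosure.filesize / (1024 * 1024)
    parts.push(`${megabytes.toFixed(1)} MB`)
  }

  return parts.join(" | ")
}
```

Each part is only added when the underlying property is present, so an episode with nothing but a title still produces a usable label.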

notes

language

Many podcasts set the language to a bare code such as en. The parser makes a best-effort attempt to normalize language strings to an IETF language code, so en, for example, is converted to en-us. Non-English languages are presented as, for example, de-DE.
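
Since the normalized value can carry a region and its casing may vary (en-us versus de-DE), code that filters feeds by language may want to compare only the primary subtag. The sketch below shows one way to do that on the consuming side; `primaryLanguage` and `isSameLanguage` are hypothetical helpers and say nothing about how the library normalizes internally.

```javascript
// Hypothetical helpers; not part of the library.
// Returns the lowercase primary subtag of a language tag, e.g. "de" for "de-DE".
function primaryLanguage (tag) {
  if (typeof tag !== "string" || tag.trim() === "") return undefined
  return tag.trim().split(/[-_]/)[0].toLowerCase()
}

// Compares two tags by primary subtag only, ignoring region and casing.
function isSameLanguage (a, b) {
  const left = primaryLanguage(a)
  return left !== undefined && left === primaryLanguage(b)
}
```

A missing `language` property yields undefined rather than a match, so feeds without a language are never treated as equal to each other.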

normalization

Not all feeds are guaranteed to contain every property; properties missing from a feed are simply omitted from the output.

Episode categories are included as an empty array if the podcast isn't assigned any categories.

Episodes are sorted in descending order by publish date.
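
Taken together, these rules mean that consuming code should guard against omitted properties and can treat the first episode as the most recent one. A minimal sketch under those assumptions follows; `summarizeFeed` and the field names of its return value are hypothetical and not part of the library.

```javascript
// Hypothetical helper; not part of the library.
function summarizeFeed (feed) {
  const episodes = Array.isArray(feed.episodes) ? feed.episodes : []
  // Episodes are sorted newest first, so the first entry is the latest one
  const latest = episodes[0]

  return {
    title: feed.title || "(untitled)",
    // `owner` may be omitted entirely, so guard before reading `email`
    ownerEmail: feed.owner && feed.owner.email ? feed.owner.email : null,
    episodeCount: episodes.length,
    latestTitle: latest && latest.title ? latest.title : null
  }
}
```

Using explicit fallbacks such as null keeps the summary shape stable even when a feed leaves out the owner or has no episodes.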

development

  1. Clone the repo: git clone https://github.com/activediscourse/podcast-parser.git
  2. Move into the new directory: cd podcast-parser
  3. Install dependencies: yarn
  4. Build the source: yarn build
  5. Run tests: yarn test

license

MIT © Bo Lingen / citycide

Based on node-podcast-parser, also MIT, © Antti Kupila.

See license
