Truncate Markdown?

Posted 07-11 15:38

I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.

How do I go about this? Simply truncating the raw text will not work, for example:

>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"

Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet". If there is no marker, it would take the first 250 words and append "...". For example:

This article is an example of something or other.

This segment will be used as the snippet on the index page.

^^^^^^^^^^^^^^^

This text will be visible once clicking the "Read more.." link

The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)

I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)

Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.

So, the options I can think of are (in order of preference)..

  • Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
  • Write/find a parser-agnostic Markdown truncate'r
  • Write/find an intelligent HTML truncating function


Comments (7)

断爱 2024-07-18 15:38:11

  • Write/find an intelligent HTML truncating function

The following, adapted from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/ with some modifications, will correctly truncate HTML and makes it easy to append a string before the closing tags.

>> puts "<p><b><a href=\"hi\">Something</a></p>".truncate_html(5, "...")
<p><b><a href="hi">Somet...</a></b></p>

The modified code:

require 'rexml/parsers/pullparser'

class String
  # Truncates HTML to roughly `len` characters of text, closes any tags that
  # are still open, and inserts `at_end` (if given) before the closing tags.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []      # stack of currently open element names
    new_len = len  # text characters still allowed
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # Take at most new_len characters (0..new_len would take one too many).
        results << p_e[0][0, new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller's string rather than a hard-coded "...".
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  # Serializes an attribute hash back into ' name="value"' pairs.
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end

三寸金莲 2024-07-18 15:38:11

Here's a solution that works for me with Textile.

  1. Convert it to HTML
  2. Truncate it.
  3. Remove any HTML tag that got cut in half:

    html_string = html_string.sub(/<[^>]*\z/, "")

  4. Then use Hpricot to clean it up and close unclosed tags:

    html_string = Hpricot( html_string ).to_s 
    

I do this in a helper, and with caching there's no performance issue.
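
Steps 2 to 4 above can be sketched in plain Ruby. This is only a rough illustration, not the poster's helper: the method name truncate_html_naive and its parameters are made up here, and a simple tag stack stands in for Hpricot's clean-up. It counts markup characters toward the limit and may cut an entity such as &amp; in half, so a real HTML parser is the safer choice for untrusted input.

```ruby
# Tags that never take a closing tag.
VOID_TAGS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Hypothetical helper (name and parameters are assumptions for illustration).
def truncate_html_naive(html, max_chars, omission: "...")
  return html if html.length <= max_chars

  # Step 2: cut the string. Step 3: drop a tag that was cut in half.
  cut = html[0, max_chars].sub(/<[^>]*\z/, "")

  # Step 4 stand-in: track which tags are still open after the cut.
  open_tags = []
  cut.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == "/"

    if closing == "/"
      idx = open_tags.rindex(name)
      # Closing a tag also closes anything left open inside it.
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end

  # Put the omission before the closing tags so it stays inside the markup.
  cut + omission + open_tags.reverse.map { |tag| "</#{tag}>" }.join
end
```

Called as truncate_html_naive(html_string, 30), it returns the shortened markup with the omission placed inside the innermost open element.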

朮生 2024-07-18 15:38:11

You could use a regular expression to find a line consisting of nothing but "^" characters:

markdown_string = <<-eos
This article is an example of something or other.

This segment will be used as the snippet on the index page.

^^^^^^^^^^^^^^^

This text will be visible once clicking the "Read more.." link
eos

preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
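
The same idea can be extended to cover the fallback the question asks for (the first 250 words plus "..." when no marker is present). The sketch below is an assumption-laden illustration: the names EXCERPT_MARKER, markdown_excerpt and strip_excerpt_marker are invented here, and the word-based fallback can still cut through inline Markdown that spans several words, such as a link with multi-word text.

```ruby
# A line consisting of nothing but "^" characters (marker format from the question).
EXCERPT_MARKER = /^\^+[ \t\r]*$/

# Hypothetical helper: returns the Markdown source to render as the snippet.
def markdown_excerpt(source, word_limit: 250, omission: "...")
  marker_at = source =~ EXCERPT_MARKER
  # The author supplied a marker: use everything before it.
  return source[0...marker_at].rstrip if marker_at

  # No marker and the text is short enough: use it whole.
  return source.rstrip if source.split.length <= word_limit

  # Keep the first word_limit words with their original whitespace,
  # so paragraph breaks survive the cut.
  head = source[/\A\s*(?:\S+\s+){#{word_limit - 1}}\S+/]
  head + omission
end

# Hypothetical helper: removes the marker line before rendering the full document.
def strip_excerpt_marker(source)
  source.sub(/^\^+[ \t\r]*\n?/, "")
end
```

The excerpt is still Markdown, so it would be passed to the renderer afterwards, and strip_excerpt_marker would be applied before rendering the full page.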

情魔剑神 2024-07-18 15:38:11

Rather than trying to truncate the text, why not have two input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown, without having to rely on some sort of funky EOF marker.

尐籹人 2024-07-18 15:38:11

I have to agree with the "two inputs" approach. The content writer need not worry, since you can change the back-end logic to combine the two inputs when showing the full content.

full_content = input1 + input2 # perhaps with some complementary HTML, for better formatting
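
As a minimal sketch of that two-field idea, assuming a hypothetical Article record with blurb and body fields (neither name comes from the thread), the index page would render only the blurb and the full page would render both parts joined as separate paragraphs:

```ruby
# Hypothetical record: the field names `blurb` and `body` are illustrative.
Article = Struct.new(:blurb, :body) do
  # Markdown for the index page: only the opening blurb.
  def snippet_markdown
    blurb.to_s.strip
  end

  # Markdown for the full page: blurb and body as separate paragraphs.
  def full_markdown
    [blurb, body].map { |part| part.to_s.strip }.reject(&:empty?).join("\n\n")
  end
end
```

Joining with a blank line keeps the blurb's last paragraph from running into the body's first one when the combined text is rendered.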

深空失忆 2024-07-18 15:38:11

Not sure if it applies to this case, but I'm adding the solution below for the sake of completeness. You can use the strip_tags method if you are truncating Markdown-rendered content:

truncate(strip_tags(markdown(article.contents)), length: 50)

Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/

谁对谁错谁最难过 2024-07-18 15:38:11

A simpler option that just works:

truncate(markdown(item.description), length: 100, escape: false)