Quickly convert (.rtf|.doc) files to Markdown syntax with PHP

Posted on 2024-07-25 15:52:07

I've been manually converting articles into Markdown syntax for a few days now, and it's getting rather tedious. Some of these are 3 or 4 pages, italics and other emphasized text throughout. Is there a faster way to convert (.rtf|.doc) files to clean Markdown Syntax that I can take advantage of?

Comments (7)

怪异←思 2024-08-01 15:52:07

If you happen to be on a Mac, textutil does a good job of converting doc, docx, and rtf to HTML, and pandoc does a good job of converting the resulting HTML to Markdown:

$ textutil -convert html file.doc -stdout | pandoc -f html -t markdown -o file.md

I have a script (https://gist.github.com/1181510) that I threw together a while back that tries to use textutil, pdf2html, and pandoc to convert whatever I throw at it to Markdown.
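
For more than a handful of files, the same two-step pipeline can be wrapped in a small helper. The following is a minimal Python sketch, assuming textutil (macOS only) and pandoc are on the PATH. The function names `build_commands` and `convert` are made up for illustration, and only the flags from the one-liner above are used.

```python
import subprocess
from pathlib import Path


def build_commands(src):
    """Return the textutil and pandoc argument lists for one input file."""
    src = Path(src)
    out = src.with_suffix(".md")  # file.doc -> file.md
    textutil_cmd = ["textutil", "-convert", "html", str(src), "-stdout"]
    pandoc_cmd = ["pandoc", "-f", "html", "-t", "markdown", "-o", str(out)]
    return textutil_cmd, pandoc_cmd


def convert(src):
    """Feed textutil's HTML output to pandoc, mirroring the shell pipe."""
    textutil_cmd, pandoc_cmd = build_commands(src)
    # Capture the HTML that textutil writes to stdout.
    html = subprocess.run(textutil_cmd, check=True, capture_output=True).stdout
    # Hand that HTML to pandoc on stdin; pandoc writes the .md file itself.
    subprocess.run(pandoc_cmd, input=html, check=True)
```

Calling `convert("file.doc")` once per file should produce `file.md` next to each input; `check=True` makes a failing tool raise an exception rather than silently leaving an empty output.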

两仪 2024-08-01 15:52:07

ProgTips has a possible solution with a Word macro (source download):

A simple macro (source download) for converting the most trivial things automatically.
This macro does:

  • Replace bold and italics
  • Replace headings (marked heading 1-6)
  • Replace numbered and bulleted lists

It's very buggy, I believe it hangs on larger documents, however I'm
NOT stating it's a stable release anyway! :-) Experimental use only,
recode and reuse it as you like, post a comment if you've found a
better solution.

Source: ProgTips

Macro source

Installation

  • open WinWord,
  • press Alt+F11 to open the VBA editor,
  • right-click the first project in the project browser,
  • choose Insert -> Module,
  • paste the code from the file,
  • close the macro editor,
  • go to Tools > Macro > Macros and run the macro named MarkDown.

Source

Macro source for safe keeping if ProgTips deletes the post or the site gets wiped out:

'*** A simple MsWord->Markdown replacement macro by Kriss Rauhvargers, 2006.02.02.
'*** This tool does NOT implement all the markup specified in MarkDown definition by John Gruber, only
'*** the most simple things. These are:
'*** 1) Replaces all non-list paragraphs to ^p paragraph so MarkDown knows it is a stand-alone paragraph
'*** 2) Converts tables to text. In fact, tables get lost.
'*** 3) Adds a single indent to all indented paragraphs
'*** 4) Replaces all the text in italics to _text_
'*** 5) Replaces all the text in bold to **text**
'*** 6) Replaces Heading1-6 to #..#Heading (Heading numbering gets lost)
'*** 7) Replaces bulleted lists with ^p *  listitem ^p*  listitem2...
'*** 8) Replaces numbered lists with ^p 1. listitem ^p2.  listitem2...
'*** Feel free to use and redistribute this code
Sub MarkDown()
    Dim bReplace As Boolean
    Dim i As Integer
    Dim oPara As Paragraph
    
        
    'remove formatting from the paragraph mark so that we don't get **blablabla^p** but rather **blablabla**^p
    Call RemoveBoldEnters
    
    
    For i = Selection.Document.Tables.Count To 1 Step -1
            Call Selection.Document.Tables(i).ConvertToText
    Next
    
    'simple text indent + extra paragraphs for non-numbered paragraphs
    For i = Selection.Document.Paragraphs.Count To 1 Step -1
        Set oPara = Selection.Document.Paragraphs(i)
        If oPara.Range.ListFormat.ListType = wdListNoNumbering Then
            If oPara.LeftIndent > 0 Then
                oPara.Range.InsertBefore (">")
            End If
            oPara.Range.InsertBefore (vbCrLf)
        End If
        
        
    Next
    
    'italic -> _italic_
    Selection.HomeKey Unit:=wdStory
    bReplace = ReplaceOneItalic  'first replacement
    While bReplace 'other replacements
        bReplace = ReplaceOneItalic
    Wend

    'bold-> **bold**
    Selection.HomeKey Unit:=wdStory
    bReplace = ReplaceOneBold 'first replacement
    While bReplace
        bReplace = ReplaceOneBold 'other replacements
    Wend
    
   
    
    'Heading -> ##heading
    For i = 1 To 6 'heading1 to heading6
        Selection.HomeKey Unit:=wdStory
        bReplace = ReplaceH(i) 'first replacement
        While bReplace
            bReplace = ReplaceH(i) 'other replacements
        Wend
    Next
    
    Call ReplaceLists
    
    
    Selection.HomeKey Unit:=wdStory
End Sub


'***************************************************************
' Function to replace bold with **bold**, only the first occurrence
' Returns true if any occurrence found, false otherwise
' Originally recorded by WinWord macro recorder, probably contains
' quite a lot of useless code
'***************************************************************
Function ReplaceOneBold() As Boolean
    Dim bReturn As Boolean

    Selection.Find.ClearFormatting
    With Selection.Find
        .Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Font.Bold = True
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    
    bReturn = False
    While Selection.Find.Execute = True
        bReturn = True
        Selection.Text = "**" & Selection.Text & "**"
        Selection.Font.Bold = False
        Selection.Find.Execute
    Wend
    
    ReplaceOneBold = bReturn
End Function

'*******************************************************************
' Function to replace italic with _italic_, only the first occurrence
' Returns true if any occurrence found, false otherwise
' Originally recorded by WinWord macro recorder, probably contains
' quite a lot of useless code
'********************************************************************
Function ReplaceOneItalic() As Boolean
    Dim bReturn As Boolean

        Selection.Find.ClearFormatting
    
    With Selection.Find
        .Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Font.Italic = True
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    
    bReturn = False
    While Selection.Find.Execute = True
        bReturn = True
        Selection.Text = "_" & Selection.Text & "_"
        Selection.Font.Italic = False
        Selection.Find.Execute
    Wend
    ReplaceOneItalic = bReturn
End Function

'*********************************************************************
' Function to replace headingX with #heading, only the first occurrence
' Returns true if any occurrence found, false otherwise
' Originally recorded by WinWord macro recorder, probably contains
' quite a lot of useless code
'*********************************************************************
Function ReplaceH(ByVal ipNumber As Integer) As Boolean
    Dim sReplacement As String
    Dim bReturn As Boolean 'was undeclared; fails to compile under Option Explicit
    
    Select Case ipNumber
    Case 1: sReplacement = "#"
    Case 2: sReplacement = "##"
    Case 3: sReplacement = "###"
    Case 4: sReplacement = "####"
    Case 5: sReplacement = "#####"
    Case 6: sReplacement = "######"
    End Select
    
    Selection.Find.ClearFormatting
    Selection.Find.Style = ActiveDocument.Styles("Heading " & ipNumber)
    With Selection.Find
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    
   
    bReturn = False
    While Selection.Find.Execute = True
        bReturn = True
        Selection.Range.InsertBefore (vbCrLf & sReplacement & " ")
        Selection.Style = ActiveDocument.Styles("Normal")
        Selection.Find.Execute
    Wend
    
    ReplaceH = bReturn
End Function



'***************************************************************
' A fix-up for paragraph marks that are bold or italic
'***************************************************************
Sub RemoveBoldEnters()
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Font.Italic = True
    Selection.Find.Replacement.ClearFormatting
    Selection.Find.Replacement.Font.Bold = False
    Selection.Find.Replacement.Font.Italic = False
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Font.Bold = True
    Selection.Find.Replacement.ClearFormatting
    Selection.Find.Replacement.Font.Bold = False
    Selection.Find.Replacement.Font.Italic = False
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

'***************************************************************
' Sub to replace Word lists with Markdown list markers:
' "*" for bulleted items (indented by list level), "N." for numbered items
' Walks lists and list paragraphs backwards, then removes the Word numbering
'***************************************************************
Sub ReplaceLists()
    Dim i As Integer
    Dim j As Integer
    Dim Para As Paragraph
        
    Selection.HomeKey Unit:=wdStory
    
    'iterate through all the lists in the document
    For i = Selection.Document.Lists.Count To 1 Step -1
        'check each paragraph in the list
        For j = Selection.Document.Lists(i).ListParagraphs.Count To 1 Step -1
            Set Para = Selection.Document.Lists(i).ListParagraphs(j)
            'if it's a bulleted list
            If Para.Range.ListFormat.ListType = wdListBullet Then
                        Para.Range.InsertBefore (ListIndent(Para.Range.ListFormat.ListLevelNumber, "*"))
            'if it's a numbered list
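            'compare ListType against each constant; "x = A Or B Or C" ORs the constants bitwise and is always true
            ElseIf Para.Range.ListFormat.ListType = wdListSimpleNumbering Or _
                   Para.Range.ListFormat.ListType = wdListMixedNumbering Or _
                   Para.Range.ListFormat.ListType = wdListListNumOnly Then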
                Para.Range.InsertBefore (Para.Range.ListFormat.ListValue & ".  ")
            End If
        Next j
        'inserts paragraph marks before and after, removes the list itself
        Selection.Document.Lists(i).Range.InsertParagraphBefore
        Selection.Document.Lists(i).Range.InsertParagraphAfter
        Selection.Document.Lists(i).RemoveNumbers
    Next i
End Sub

'***********************************************************
' Returns the MarkDown indent text
'***********************************************************
Function ListIndent(ByVal ipNumber As Integer, ByVal spChar As String) As String
    Dim i  As Integer
    For i = 1 To ipNumber - 1
        ListIndent = ListIndent & "    "
    Next
    ListIndent = ListIndent & spChar & "    "
End Function
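
For readers without Word at hand, the macro's replacement rules can be illustrated on plain data. The following Python sketch is not the macro's code: the `Run` and `Paragraph` classes are a hypothetical in-memory model invented for this example. Only the output conventions are taken from the macro above (`_italic_`, `**bold**`, `#` headings, `*` bullets indented four spaces per level, `N.` for numbered items, `>` for indented paragraphs).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """A stretch of text with uniform character formatting (hypothetical model)."""
    text: str
    bold: bool = False
    italic: bool = False


@dataclass
class Paragraph:
    """One paragraph; at most one of heading/bullet_level/number is expected to be set."""
    runs: List[Run] = field(default_factory=list)
    heading: int = 0       # 1-6 for "Heading 1".."Heading 6", 0 for body text
    bullet_level: int = 0  # 1 = top-level bullet, 0 = not a bullet
    number: int = 0        # list value of a numbered item, 0 = not numbered
    indented: bool = False


def run_to_md(run):
    text = run.text
    # Same order as the macro: the italic pass runs first, then the bold pass.
    if run.italic:
        text = "_" + text + "_"
    if run.bold:
        text = "**" + text + "**"
    return text


def paragraph_to_md(p):
    body = "".join(run_to_md(r) for r in p.runs)
    if p.heading:
        return "#" * p.heading + " " + body
    if p.bullet_level:
        return "    " * (p.bullet_level - 1) + "*    " + body
    if p.number:
        return f"{p.number}.  " + body
    if p.indented:
        return ">" + body
    return body


def to_markdown(paragraphs):
    # A blank line between paragraphs, like the extra paragraph marks the macro inserts.
    return "\n\n".join(paragraph_to_md(p) for p in paragraphs)
```

Working on a data model like this avoids the Selection.Find loops that the macro's author suspects of hanging on large documents, at the cost of first having to get the document into such a model.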

戏蝶舞 2024-08-01 15:52:07

If you're open to using the .docx format, you could use this PHP script that I put together that will extract the XML, run some XSL transformations and output a pretty decent Markdown equivalent:

https://github.com/matb33/docx2md

Note that it is meant to work from the command-line, and is rather basic in its interface. However, it will get the job done!

If the script doesn't work well enough for you, I encourage you to send me your .docx files so I can reproduce your problem and fix it. Log an issue in GitHub or contact me directly if you prefer.
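
To illustrate the first step (extracting the XML), here is a minimal Python sketch. It is not how docx2md works internally: that script applies XSL transformations, whereas this sketch walks the XML tree directly with the standard library. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in word/document.xml. The function name `docx_to_markdown` is made up, and only bold and italic runs are handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}tag" notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(rpr, tag):
    """True if the run properties contain <w:b/> or <w:i/> that is not switched off."""
    el = rpr.find(W + tag) if rpr is not None else None
    return el is not None and el.get(W + "val", "true") not in ("0", "false")


def docx_to_markdown(path):
    """Return the document text with bold/italic runs wrapped in Markdown markers."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        parts = []
        for r in p.iter(W + "r"):
            text = "".join(t.text or "" for t in r.iter(W + "t"))
            if not text:
                continue
            rpr = r.find(W + "rPr")
            if _is_on(rpr, "i"):
                text = "_" + text + "_"
            if _is_on(rpr, "b"):
                text = "**" + text + "**"
            parts.append(text)
        if parts:
            paragraphs.append("".join(parts))
    return "\n\n".join(paragraphs)
```

Headings, lists, tables, and images are deliberately left out; handling them means reading paragraph styles and numbering definitions as well, which is where a dedicated tool earns its keep.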

棒棒糖 2024-08-01 15:52:07

Pandoc is a good command-line conversion tool, but again, you will first need to get the input into a format that Pandoc can read, which is:

  • markdown
  • reStructuredText
  • textile
  • HTML
  • LaTeX

定格我的天空 2024-08-01 15:52:07

We had the same problem of having to convert Word documents to markdown. Some were more complicated and (very) large documents, with math equations and images and such. So I made this script which converts using a number of different tools: https://github.com/Versal/word2markdown

Because it uses a chain of several tools it is a bit more error-prone, but it can be a good starting point if you have more complicated documents. Hope it can be helpful! :)

Update:
It currently only works on Mac OS X, and you need to have some requirements installed (Word, Pandoc, HTML Tidy, git, node/npm). For it to work properly, you also need to open an empty Word document, and do: File->Save As Webpage->Compatibility->Encoding->UTF-8. Then this encoding is saved as default. See the README for more details on how to set up.

Then run this in the console:

$ git clone https://github.com/Versal/word2markdown.git
$ cd word2markdown
$ npm install
(copy over the Word files, for example, "document.docx")
$ ./doc-to-md.sh document.docx document_files > document.md

Then you can find the Markdown in document.md and images in the directory document_files.

It's perhaps a bit complicated now, so I would welcome any contributions that make this easier or make this work on other operating systems! :)

允世 2024-08-01 15:52:07

Have you tried this one? Not sure about feature richness, but it works for simple texts.
http://markitdown.medusis.com/

筱果果 2024-08-01 15:52:07

As part of a university Ruby course I developed a tool that converts OpenOffice Writer files (.odt) to Markdown.
A lot of assumptions have to be made in order to produce correct formatting. For example, it is hard to determine what text size should be treated as a heading.
However, the only thing you can lose with this conversion is formatting: any text that is encountered is always appended to the Markdown document.
The tool I've developed supports lists, bold and italic text, and it has syntax for tables.
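
To make the heading problem concrete, here is one possible heuristic as a Python sketch. This is an assumption made for illustration and not necessarily what doc2text does: treat the most common font size as body text and rank every larger size as a heading level, largest first. The function name `heading_levels` is hypothetical.

```python
from collections import Counter


def heading_levels(font_sizes):
    """Map each font size larger than the body size to a heading level (1 = largest).

    font_sizes holds one size per paragraph, in points, and must not be empty.
    """
    # The most frequent size is assumed to be body text.
    body = Counter(font_sizes).most_common(1)[0][0]
    # Distinct larger sizes, biggest first, become levels 1, 2, 3, ...
    larger = sorted({s for s in font_sizes if s > body}, reverse=True)
    # Markdown has only six heading levels, so cap the level at 6.
    return {size: min(level, 6) for level, size in enumerate(larger, start=1)}
```

A document that uses bold body-size text for headings defeats this heuristic entirely, which is the kind of assumption the answer is referring to.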

http://github.com/bostko/doc2text
Give it a try and please give me your feedback.
