Macro to generate bigrams in Writer

Posted 2025-01-29 07:30:16

How do I generate bigrams using the Basic language?

I can do that in Python like this...

import sys
import nltk
from nltk.tokenize import word_tokenize

# Send everything printed below to the output file.
sys.stdout = open("mygram1.txt", "w")
with open("mytext.txt") as f:
    for text in f:
        # Tokenize one line at a time, then print one bigram per line.
        tokens = word_tokenize(text)
        bigrm = nltk.bigrams(tokens)
        print(*map(' '.join, bigrm), sep='\n')

But I need a macro that I can run in LibreOffice Writer. I do not want to use Python.

Update:

Just like bigrams, NLTK has a trigrams method that I call using nltk.trigrams. And if I need four-grams or five-grams, there is everygrams!

import sys
from nltk import everygrams
from nltk.tokenize import word_tokenize

sys.stdout = open("myfourgram1.txt", "w")
with open("/home/ubuntu/mytext.txt") as f:
    for text in f:
        tokens = word_tokenize(text)
        # min_len = max_len = 4 yields four-grams only.
        for i in everygrams(tokens, 4, 4):
            print(" ".join(i))

Is this possible in LibreOffice Basic?


一场信仰旅途 2025-02-05 07:30:16

You could replicate the behaviour of your Python code by recycling the code in my answer to your previous question (Can you Print the wavy lines generated by Spell check in writer?). First strip out everything relating to spell checking, generating alternatives and sorting, which makes it considerably shorter, then change the line that inserts the results into the new document so that it inserts only pairs of words. Rather than keeping your input text in a .txt file, you would have to put it into a Writer document, and the results would appear in a new Writer document.

It should look something like the listing below, which also includes the subsidiary function IsWordSeparator().

Option Explicit

Sub ListBigrams

    Dim oSource As Object
    oSource = ThisComponent

    Dim oSourceCursor As Object
    oSourceCursor = oSource.getText.createTextCursor()
    oSourceCursor.gotoStart(False)
    oSourceCursor.collapseToStart()

    ' The results go into a new, empty Writer document.
    Dim oDestination As Object
    oDestination = StarDesktop.loadComponentFromURL("private:factory/swriter", "_blank", 0, Array())

    Dim oDestinationText As Object
    oDestinationText = oDestination.getText()

    Dim oDestinationCursor As Object
    oDestinationCursor = oDestinationText.createTextCursor()

    Dim sParagraph As String, sPreviousWord As String, sThisWord As String
    Dim i As Long, nWordStart As Long, nWordEnd As Long, nChar As Long
    Dim bFirst As Boolean

    sPreviousWord = ""
    bFirst = True   ' True until the first word of the document has been read

    ' Note: sPreviousWord is not reset between paragraphs, so a word pair
    ' can span a paragraph boundary.
    Do
        oSourceCursor.gotoEndOfParagraph(True)
        ' It is necessary to add a space to the end of the string,
        ' otherwise the last word of the paragraph is not recognised.
        sParagraph = oSourceCursor.getString() & " "

        nWordStart = 1
        nWordEnd = 1

        For i = 1 To Len(sParagraph)

            nChar = Asc(Mid(sParagraph, i, 1))

            If IsWordSeparator(nChar) Then   '1

                If nWordEnd > nWordStart Then   '2

                    sThisWord = Mid(sParagraph, nWordStart, nWordEnd - nWordStart)

                    If bFirst Then
                        bFirst = False
                    Else
                        ' Write the word pair, followed by a paragraph break.
                        oDestinationText.insertString(oDestinationCursor, sPreviousWord & " " & sThisWord & Chr(13), False)
                    End If

                    sPreviousWord = sThisWord

                End If   '2
                nWordEnd = nWordEnd + 1
                nWordStart = nWordEnd
            Else
                nWordEnd = nWordEnd + 1
            End If    '1

        Next i

    Loop While oSourceCursor.gotoNextParagraph(False)

End Sub

'----------------------------------------------------------------------------

' OOME Listing 360. 
Function IsWordSeparator(iChar As Long) As Boolean

    ' Horizontal tab \t 9
    ' New line \n 10
    ' Carriage return \r 13
    ' Space   32
    ' Non-breaking space   160     

    Select Case iChar
    Case 9, 10, 13, 32, 160
        IsWordSeparator = True
    Case Else
        IsWordSeparator = False
    End Select    
End Function

Even if it would be easier to do it in Python, as Jim K suggested, the BASIC approach would make it easier to distribute the functionality to users, since they would not have to install Python and the NLTK library (which is not straightforward).
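
For the trigrams and four-grams mentioned in the update, the pair logic generalises to a sliding window over the word list. The following is only a rough sketch of that idea in plain Python (no NLTK), which could serve as a reference when porting it to Basic or for cross-checking the macro's output. The names split_words and ngrams are hypothetical, the separator set is copied from IsWordSeparator(), and, like the macro but unlike nltk.word_tokenize, it leaves punctuation attached to the words.

```python
# Hypothetical helper names; this mirrors the macro's logic, it is not NLTK.
SEPARATORS = {9, 10, 13, 32, 160}  # same code points as IsWordSeparator()


def split_words(paragraph):
    """Split on the macro's separator characters, dropping empty runs."""
    words, current = [], []
    for ch in paragraph + " ":  # the trailing space flushes the last word
        if ord(ch) in SEPARATORS:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    return words


def ngrams(words, n):
    """Return every run of n consecutive words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


words = split_words("the quick\tbrown fox\u00a0jumps")
print(ngrams(words, 2))  # word pairs, as produced by the macro
print(ngrams(words, 4))  # four-grams, as in the everygrams(tokens, 4, 4) call
```

In a Basic port, the single sPreviousWord variable would presumably become a small array holding the last n - 1 words; that part is an assumption and is not shown here.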
