Replacing "smart quotes" in PowerShell

Posted on 2024-11-28 09:26:56

I'm finding myself somewhat stumped on a simple problem. I'm trying to remove fancy quotes from a bunch of text files. I have the following script, where I've tried a number of different replacement methods, but without results.

Here's an example that downloads the data from GitHub and attempts to convert.

$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"foo.txt")
$fancySingleQuotes = "[" + [string]::Join("",[char[]](0x2019, 0x2018)) + "]"

$c = Get-Content "foo.txt"
$c | % { `
        $_ = $_.Replace("’","'")
        $_ = $_.Replace("`“","`"")
        $_.Replace("`”","`"")
    } `
    |  Set-Content "foo2.txt"

What's the trick for this to work?

Comments (4)

国际总奸 2024-12-05 09:26:56

Here's a version that works:

    $srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
    $wc = New-Object net.WebClient
    $wc.DownloadFile($srcUrl,"C:\Users\hartez\SO6968270\foo.txt")

    $fancySingleQuotes = "[\u2019\u2018]"
    $fancyDoubleQuotes = "[\u201C\u201D]"

    $c = Get-Content "foo.txt" -Encoding UTF8

    $c | % { `
        $_ = [regex]::Replace($_, $fancySingleQuotes, "'")
        [regex]::Replace($_, $fancyDoubleQuotes, '"')
    } `
    |  Set-Content "foo2.txt"

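The reason manojlds' version wasn't working for you is an encoding mismatch: the file you're getting from GitHub is evidently UTF-8, but Get-Content wasn't decoding it as UTF-8 (in Windows PowerShell, a file without a byte-order mark is read using the system's ANSI code page by default). The curly quotes were therefore garbled before the regex ever saw them, so the Unicode characters in the pattern never matched. Reading the file with -Encoding UTF8 fixes the problem.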
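
To make the encoding point concrete, here is a minimal sketch, assuming Windows PowerShell 5.1 or later. The temp-file names are hypothetical, and the sample text is built from code points so that the encoding of the script file itself cannot interfere. It writes a small BOM-less UTF-8 file, reads it back with an explicit -Encoding UTF8, and applies the same two character classes used above.

```powershell
# Hypothetical temp files for the demonstration; adjust to your own paths.
$inPath  = Join-Path ([IO.Path]::GetTempPath()) 'quotes-in.txt'
$outPath = Join-Path ([IO.Path]::GetTempPath()) 'quotes-out.txt'

# Sample line with curly double quotes around it and a curly apostrophe inside.
$sample = [string][char]0x201C + 'It' + [char]0x2019 + 's fine' + [char]0x201D

# Write the sample as UTF-8 without a byte-order mark, like a typical web download.
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[IO.File]::WriteAllText($inPath, $sample, $utf8NoBom)

# Read with an explicit encoding so the curly quotes are decoded correctly.
$lines = Get-Content -LiteralPath $inPath -Encoding UTF8

# Replace curly single quotes first, then curly double quotes.
$fixed = $lines | ForEach-Object {
    $_ -replace '[\u2018\u2019]', "'" -replace '[\u201C\u201D]', '"'
}

# Write the result with an explicit encoding as well.
$fixed | Set-Content -LiteralPath $outPath -Encoding UTF8
```

Dropping -Encoding UTF8 from the Get-Content call is a quick way to check whether your own PowerShell version needs it: if the replacements stop matching, the default encoding is not UTF-8.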

你与昨日 2024-12-05 09:26:56

The following works on the sample input and output you gave:

$c = Get-Content $file
$c | % {
    $_ = $_.Replace("’","'")
    $_ = $_.Replace("`“","`"")
    $_.Replace("`”","`"")
} | Set-Content $file

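
One caveat with literal curly quotes in a script: if the .ps1 file is saved in an encoding that PowerShell reads differently, the literals themselves can be garbled. A minimal sketch that avoids this by spelling the quotes as code points is shown below. The function name ConvertTo-StraightQuote is a hypothetical helper, not something from the answers here.

```powershell
# Hypothetical helper: replaces curly quotes with straight ones using
# code points, so no literal curly quote appears in the script file.
function ConvertTo-StraightQuote {
    param([string]$Text)

    $Text = $Text.Replace([string][char]0x2018, "'")   # left single quote
    $Text = $Text.Replace([string][char]0x2019, "'")   # right single quote / apostrophe
    $Text = $Text.Replace([string][char]0x201C, '"')   # left double quote
    $Text.Replace([string][char]0x201D, '"')           # right double quote; result is the output
}
```

It could be used in place as `(Get-Content $file -Encoding UTF8) | ForEach-Object { ConvertTo-StraightQuote $_ } | Set-Content $file`, where the parentheses make sure the file is fully read before it is overwritten.
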
强辩 2024-12-05 09:26:56

Your last replace swaps a left fancy quote for a single quote. Is that what you want? It doesn't match your sample output. Try this:

$_ = $_.Replace("`“","`"")
$_.Replace("`”","`"")

琉璃梦幻 2024-12-05 09:26:56

This Stack Overflow question is very close to what I needed. I was looking for something that would catch any non-ASCII character and found this question:

How do I remove all non-ASCII characters with regex and Notepad++?

The regex from that question works fine in PowerShell as well:

[^\x00-\x7F]+

It matches any run of characters outside the ASCII range (U+0000 to U+007F), not just quotes. You can narrow the regex if you need to be more specific.

My input only had curly quotes as non-ASCII characters, so this simple substitution worked:

# Replace each run of non-ASCII characters (here, curly quotes) with a standard single quote
$cq = $cq -replace "[^\x00-\x7F]+", "'"
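
As a hedged illustration of why narrowing the regex can matter, the sketch below compares the broad pattern with a class limited to curly single quotes. The sample text and variable names are assumptions made up for the example.

```powershell
# Sample text with an accented letter (U+00E9) and a curly apostrophe (U+2019),
# built from code points so the script file's encoding does not matter.
$text = 'caf' + [char]0x00E9 + ' don' + [char]0x2019 + 't'

# Broad pattern: every run of non-ASCII characters becomes a single quote,
# so the accented letter is replaced as well.
$broad = $text -replace '[^\x00-\x7F]+', "'"

# Narrow pattern: only the curly single quotes are replaced,
# and the accented letter is left alone.
$narrow = $text -replace '[\u2018\u2019]', "'"
```

With this input, $broad turns the accented letter into a quote while $narrow preserves it, so the broad pattern is only safe when curly quotes are the sole non-ASCII characters in the data.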