Script to save files as Unicode

Posted 2024-07-12 13:42:04 · 108 words · 5 views · 0 comments

Do you know of any way that I could, programmatically or via a script, convert a set of text files saved in ANSI character encoding to Unicode encoding?

I would like to do the same thing I do when I open a file in Notepad and choose to save it as a Unicode file.

Comments (6)

无风消散 2024-07-19 13:42:04

This could work for you, but note that it grabs every file in the current folder and writes a UTF-8 copy of each one with "u" appended to its name:


Get-ChildItem | Foreach-Object { $c = (Get-Content $_); `
Set-Content -Encoding UTF8 $c -Path ($_.name + "u") }

The same thing using aliases for brevity (the sc alias exists only in Windows PowerShell; it was removed in PowerShell 6):


gci | %{ $c = (gc $_); sc -Encoding UTF8 $c -Path ($_.name + "u") }

Steven Murawski suggests using Out-File instead. The differences between both cmdlets are the following:

  • Out-File will attempt to format the input it receives.
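  • Out-File's default encoding is Unicode-based (UTF-16 LE in Windows PowerShell), whereas Set-Content uses the system's default "ANSI" code page. In PowerShell 6 and later both default to UTF-8 without a BOM, so pass -Encoding explicitly if the result matters.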

Here's an example assuming the file test.txt doesn't exist in either case:


PS> [system.string] | Out-File test.txt
PS> Get-Content test.txt

IsPublic IsSerial Name                                     BaseType          
-------- -------- ----                                     --------          
True     True     String                                   System.Object     

# test.txt encoding is Unicode-based with BOM


PS> [system.string] | Set-Content test.txt
PS> Get-Content test.txt

System.String

# test.txt encoding is "ANSI" (Windows character set)

In fact, if you don't need a specific Unicode encoding, you could just as well do the following to convert a text file to Unicode (in Windows PowerShell, where > writes UTF-16 LE):


PS> Get-Content sourceASCII.txt > targetUnicode.txt

Out-File is a "redirection operator with optional parameters" of sorts.
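
For readers working outside PowerShell, the batch conversion above can be sketched in Python. This is only an illustration under stated assumptions: the source files are assumed to be in Windows-1252 (one common "ANSI" code page), the target is assumed to be UTF-16 LE with a byte order mark (what Notepad labels "Unicode"), and convert_folder is a hypothetical helper name, not part of any library.

```python
import codecs
from pathlib import Path


def convert_folder(folder, source_encoding="cp1252"):
    """Write a UTF-16 LE copy (with BOM) of every regular file in `folder`.

    Each copy gets "u" appended to its name, mirroring the snippet above.
    `source_encoding` is an assumption; adjust it to the real code page.
    Returns the list of paths that were written.
    """
    written = []
    # sorted() materializes the listing first, so the copies written
    # below are not picked up again by the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Work on raw bytes so line endings are preserved exactly.
        text = path.read_bytes().decode(source_encoding)
        target = path.with_name(path.name + "u")
        target.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
        written.append(target)
    return written
```

Calling convert_folder(".") would process the current directory. Like the PowerShell version, it converts every file it finds, so it is best pointed at a folder that contains only text files.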

花开柳相依 2024-07-19 13:42:04

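The easiest way would be Get-Content 'path/to/text/file' | Out-File 'name/of/file'.

Out-File has an -Encoding parameter, the default of which is Unicode (UTF-16 LE) in Windows PowerShell. PowerShell 6 and later default to UTF-8 without a BOM, so specify -Encoding Unicode there.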

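If you wanted to script a batch of them, you could do something like this:

# Assumes every item in the directory is a text file.
$files = Get-ChildItem 'directory/of/text/files'
foreach ($file in $files)
{
  # The parentheses read the whole file before Out-File reopens it for
  # writing; without them the file is being read while it is overwritten.
  (Get-Content $file.FullName) | Out-File $file.FullName -Encoding Unicode
}

左岸枫 2024-07-19 13:42:04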

Use the System.IO.StreamReader class (to read the file contents) together with the System.Text.Encoding base class (to obtain the encoding object that does the conversion).
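
The reader-plus-encoding pattern is not specific to .NET. As a rough Python analogue (not the .NET API itself), the sketch below opens the source with one encoding and the destination with another, then copies the text in chunks. The function name transcode_file and the default encodings are assumptions made for illustration.

```python
def transcode_file(src, dst, source_encoding="cp1252",
                   target_encoding="utf-16", chunk_chars=64 * 1024):
    """Copy text from `src` to `dst`, decoding and re-encoding on the way.

    newline="" turns off newline translation on both sides, so the
    original line endings pass through unchanged.
    """
    with open(src, "r", encoding=source_encoding, newline="") as reader, \
         open(dst, "w", encoding=target_encoding, newline="") as writer:
        while True:
            chunk = reader.read(chunk_chars)  # decoded characters, not bytes
            if not chunk:
                break
            writer.write(chunk)  # encoded by the writer as it is written
```

Reading in chunks keeps memory use flat for large files. Python's "utf-16" codec writes a byte order mark once at the start of the output and uses the machine's native byte order; pass a different target_encoding if a specific byte order is required.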

旧城烟雨 2024-07-19 13:42:04

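You could create a new text file and write the bytes from the original file into the new one, adding a '\0' byte to each original byte: after it for little-endian UTF-16, which is what Notepad calls Unicode, or before it for big-endian. The file should also start with the matching byte order mark. This only works if the original text is plain ASCII (for example, unaccented English).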
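
To make the byte layout concrete, here is a small Python sketch of the manual approach. It assumes little-endian UTF-16 with a byte order mark as the target, in which the zero byte follows each ASCII byte, and it rejects anything that is not pure ASCII. The function name ascii_to_utf16le_by_hand is hypothetical.

```python
import codecs


def ascii_to_utf16le_by_hand(data: bytes) -> bytes:
    """Interleave zero bytes into ASCII data to form UTF-16 LE with a BOM."""
    if any(b > 0x7F for b in data):
        raise ValueError("input is not pure ASCII; use a real decoder instead")
    out = bytearray(codecs.BOM_UTF16_LE)  # the two bytes FF FE
    for b in data:
        out.append(b)  # low byte: the original ASCII value
        out.append(0)  # high byte: always zero for ASCII
    return bytes(out)
```

For any byte above 0x7F the mapping depends on the code page, which is why the function refuses such input rather than guessing.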

山人契 2024-07-19 13:42:04

You can use iconv. On Windows you can use it under Cygwin. It writes the converted text to standard output, so redirect it to a new file:

iconv -f from_encoding -t to_encoding file > converted_file

妄司 2024-07-19 13:42:04

pseudo code...

Dim system, file, contents, newFile, oldFile

Const ForReading = 1, ForWriting = 2, ForAppending = 3
Const AnsiFile = -2, UnicodeFile = -1

Set system = CreateObject("Scripting.FileSystemObject...

Set file = system.GetFile("text1.txt")

Set oldFile = file.OpenAsTextStream(ForReading, AnsiFile)

contents = oldFile.ReadAll()

oldFile.Close

system.CreateTextFile "text1.txt"

Set file = system.GetFile("text1.txt")

Set newFile = file.OpenAsTextStream(ForWriting, UnicodeFile)

newFile.Write contents

newFile.Close

Hope this approach will work..
