Accessing S3 files with multibyte Unicode file names from the browser using TypeScript (JavaScript)

Posted on 2025-02-11 21:35:20

I am updating an Angular web application to play spoken-language audio files retrieved from an AWS S3 bucket. Many of the files in the S3 bucket have (or will have) multibyte Unicode file names, because the application will be supporting global users. AWS (S3) is encoding the filename in a manner that I cannot easily replicate in the browser. An application backend Lambda function sends the filename of the audio file to retrieve, and the Angular application then instantiates an HTMLAudioElement, which sends an HTTP GET request to S3.

A. Filename as on Windows (before upload to S3):
Godzilla [Blue Öyster Cult] instrumental #12.wav

B. Filename as ingested with the S3 console:
Godzilla %5BBlue %C3%96yster Cult%5D instrumental %2312.wav

C. Filename as shown in the S3 console (download link):
Godzilla+%5BBlue+%C3%96yster+Cult%5D+instrumental+%2312.wav

D. Filename as returned from the application backend Lambda (same as A):
Godzilla [Blue Öyster Cult] instrumental #12.wav

E. The HTTP GET browser audio.load() filename:
Godzilla+[Blue+O%CC%88yster+Cult]+instrumental+%2312.wav

Note: D and E (above) were determined using the browser's network developer tools.

The file was uploaded to S3 via the S3 console from Windows. The filename that's saved in the backend RDS database matches the Windows filename. Because the Lambda retrieves the filename from the RDS database, it returns the Windows filename to the Angular UI in the browser. The browser's audio.load() is converting the multibyte Ö "incorrectly" for accessing the file on S3 (browser: O%CC%88yster, i.e. O followed by a UTF-8 encoded COMBINING DIAERESIS, vs. S3: %C3%96yster). It looks like the browser is encoding the accent as a separate combining character rather than encoding a single multibyte character, as S3 seems to have done.
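The difference between the two encodings described above can be reproduced in a few lines. This is only a minimal sketch of the Unicode behaviour, not a statement about what S3 does internally; the variable names are made up for illustration.

```typescript
// Two spellings of the same visible text:
// U+00D6 is the precomposed "Ö"; "O" + U+0308 is "O" followed by COMBINING DIAERESIS.
const composed = '\u00D6yster';
const decomposed = 'O\u0308yster';

// They render identically but percent-encode to different UTF-8 byte sequences.
const encodedComposed = encodeURIComponent(composed);     // "%C3%96yster"
const encodedDecomposed = encodeURIComponent(decomposed); // "O%CC%88yster"

// Normalizing to NFC (canonical composition) before encoding
// makes the decomposed input produce the same bytes as the composed one.
const encodedAfterNfc = encodeURIComponent(decomposed.normalize('NFC'));

console.log(encodedComposed, encodedDecomposed, encodedAfterNfc);
```

If the string reaching the browser is in the decomposed form, this would explain the O%CC%88 seen in request E.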

I am not allowed to strip the multibyte characters in favor of an ASCII character set. I'm looking for a way (without hand-coding mappings for every possible multibyte-character conversion, and ideally without a new dependency) to "convince" the browser to behave in the same way that S3 does. I guess the question boils down to, "What is S3's logic for converting multibyte characters in file names?" Can anyone offer an approach to achieve this?

Note: The Angular application already has logic to handle the typical S3 special character cases correctly. This question is just focused upon international character sets.

你如我软肋 2025-02-18 21:35:20

The approach below uses normalize('NFC') to convert the string to its canonical composed form (so that O followed by a combining diaeresis becomes the single precomposed Ö) and encodeURIComponent() to apply UTF-8 percent-encoding across the entire URL, leaving the / and : characters unaltered and changing space characters to + characters. The source URL is split() using /, : and space as delimiters. The lookaround regular expression keeps those delimiters in the output from split() as segments of their own, and every segment that is not a delimiter has normalize('NFC') and encodeURIComponent() applied.

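// Delimiters that must pass through unencoded: '/', ':' and the space character.
const splits = '[/: ]';
// Zero-width split points before and after each delimiter, so delimiters become their own segments.
const splitter = new RegExp(`(?=${splits})|(?<=${splits})`, 'g');

export function s3Url(url: string): string {
  return url.split(splitter).reduce(
    // Spaces become '+', '/' and ':' are kept, everything else is NFC-normalized and percent-encoded.
    (encoded: string, segment: string) => encoded + (' ' === segment ? '+' : '/' === segment || ':' === segment ? segment : encodeURIComponent(segment.normalize('NFC'))),
    '');
}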
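As a check against the question's example, here is a self-contained sketch of the same idea applied to the object key only, rather than to a whole URL. The helper name encodeS3Key and the bucket endpoint are hypothetical, chosen for illustration; sending spaces as + follows the console download link shown in the question.

```typescript
// Hypothetical helper: percent-encode an S3 object key for use in a URL path.
function encodeS3Key(key: string): string {
  return key
    .split('/') // keep '/' as the separator between key segments
    .map((segment) =>
      // NFC gives the precomposed form; encodeURIComponent applies UTF-8 percent-encoding;
      // the replace turns encoded spaces into '+', as in the console link from the question.
      encodeURIComponent(segment.normalize('NFC')).replace(/%20/g, '+')
    )
    .join('/');
}

// Hypothetical bucket endpoint, used only for illustration.
const baseUrl = 'https://example-bucket.s3.amazonaws.com';

// The decomposed spelling (O + U+0308), i.e. the form that produced O%CC%88 in request E.
const fileName = 'Godzilla [Blue O\u0308yster Cult] instrumental #12.wav';
const src = `${baseUrl}/${encodeS3Key(fileName)}`;

console.log(src);
// https://example-bucket.s3.amazonaws.com/Godzilla+%5BBlue+%C3%96yster+Cult%5D+instrumental+%2312.wav
```

In the Angular application the resulting string would be what gets assigned to the HTMLAudioElement's src before calling load(). Encoding only the key avoids having to treat : as a delimiter, at the cost of needing the base URL and key as separate values.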
