I am updating an Angular web application to play spoken-language audio files retrieved from an AWS S3 bucket. Many of the files in the S3 bucket have (or will have) multibyte Unicode file names, because the application will support global users. AWS S3 is encoding the filename in a manner that I cannot easily replicate in the browser. An application backend Lambda function sends the filename of the audio file to retrieve, and the Angular application then instantiates an HTMLAudioElement, which sends an HTTP GET request to S3.
A. Filename as on Windows (before upload to S3):
Godzilla [Blue Öyster Cult] instrumental #12.wav
B. Filename as ingested with the S3 console:
Godzilla %5BBlue %C3%96yster Cult%5D instrumental %2312.wav
C. Filename as shown in the S3 console (download link):
Godzilla+%5BBlue+%C3%96yster+Cult%5D+instrumental+%2312.wav
D. Filename as returned from the application backend Lambda (same as A):
Godzilla [Blue Öyster Cult] instrumental #12.wav
E. Filename in the HTTP GET request issued by the browser's audio.load():
Godzilla+[Blue+O%CC%88yster+Cult]+instrumental+%2312.wav
Note: D & E (above) were determined using the browser's network developer tools.
The file was uploaded to S3 via the S3 console from Windows. The filename saved in the backend RDS database matches the Windows filename. Because the Lambda retrieves the filename from the RDS database, it returns the Windows filename to the Angular UI in the browser. The browser's audio.load() is converting the multibyte Ö "incorrectly" when accessing the file on S3 (browser: O%CC%88yster, i.e. O followed by a UTF-8 encoded COMBINING DIAERESIS, vs. S3: %C3%96yster). It looks like the browser is encoding the accent as a separate combining character instead of as a single multibyte character, as S3 seems to have done.
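To make the mismatch concrete, here is a minimal sketch that reproduces the two encodings, assuming the difference is one of Unicode normalization form (decomposed vs. precomposed). The variable names are purely illustrative and not part of the application.

```typescript
// Precomposed form: a single code point U+00D6 (LATIN CAPITAL LETTER O WITH DIAERESIS).
const precomposed: string = "\u00D6yster";

// Decomposed form: "O" followed by U+0308 (COMBINING DIAERESIS).
const decomposed: string = "O\u0308yster";

// Both render as "Öyster", but they percent-encode differently.
const s3Style: string = encodeURIComponent(precomposed);
const browserStyle: string = encodeURIComponent(decomposed);

console.log(s3Style);      // %C3%96yster
console.log(browserStyle); // O%CC%88yster

// Normalizing the decomposed string to NFC yields the precomposed one.
const normalized: string = decomposed.normalize("NFC");
console.log(normalized === precomposed); // true
```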
I am not allowed to strip the multibyte characters in favor of an ASCII character set. I'm looking for a way to "convince" the browser to behave the same way S3 does, without hand-coding mappings for every possible multibyte character conversion and ideally without a new dependency. I guess the question boils down to: "What is S3's logic for converting multibyte characters in file names?" Can anyone offer an approach to achieve this?
Note: The Angular application already has logic to handle the typical S3 special character cases correctly. This question is just focused upon international character sets.
Comments (1)
This approach uses normalize('NFC') to convert each Unicode character to its canonical composed (precomposed) form and encodeURIComponent() to apply UTF-8 percent-encoding to the entire URL without altering the / and : characters, while changing space characters to + characters. The source URL is split() using /, : and space as delimiters. The delimiters are included in the output from split(), and the segments of the URL that are not a /, : or space delimiter have normalize('NFC') and encodeURIComponent() applied.
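A minimal sketch of the steps described above might look like the following. The function name encodeS3Url is hypothetical, and the sketch assumes that the only delimiters to preserve are /, : and space; it is an illustration of the technique, not a statement of S3's own encoding rules.

```typescript
// Hypothetical helper: normalizes and percent-encodes a URL or file name
// so that accented characters are sent in precomposed (NFC) form.
function encodeS3Url(url: string): string {
  return url
    // The capture group keeps the delimiters in the resulting array.
    .split(/([/: ])/)
    .map((part: string): string => {
      if (part === "/" || part === ":") {
        return part; // leave URL structure untouched
      }
      if (part === " ") {
        return "+"; // spaces become plus signs
      }
      // Compose combining sequences first, then apply UTF-8 percent-encoding.
      return encodeURIComponent(part.normalize("NFC"));
    })
    .join("");
}

// Usage with a decomposed "Ö" (O + U+0308), as in the question:
const encodedName: string = encodeS3Url(
  "Godzilla [Blue O\u0308yster Cult] instrumental #12.wav"
);
console.log(encodedName);
// Godzilla+%5BBlue+%C3%96yster+Cult%5D+instrumental+%2312.wav
```

The result can then be assigned to the src of the HTMLAudioElement before calling audio.load(). Note that a literal : or / inside a file name would be left unencoded by this sketch.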