Strings in Erlang - what libraries and techniques should I check out?

Posted on 2024-08-03 15:02:10

I am working on a project that will require internationalisation support down the track. I want to get started on the right foot with UTF support, and I was wondering what the best practice for handling UTF in Erlang is?

From my research so far, it seems there are a couple of issues with Erlang's built-in string handling for some use cases (JSON parsing being a good example).

I have been looking at Starling and read (somewhere) recently that it is possibly going to be rolled into the standard Erlang release as the UTF 'standard'. Is this true? Are there other libraries or approaches I should be looking at?

From the comments:

EEP 10 (Erlang Enhancement Proposal 10), "Representing Unicode characters in Erlang", has the details.

Comments (2)

冷默言语 2024-08-10 15:02:10

This page:

http://erlang.org/doc/highlights.html

...lists highlights of release 5.7/OTP R13A. Note this passage:

1.2 Unicode support

Support for Unicode is implemented as
described in EEP10. Formatting and
reading of unicode data both from
terminals and files is supported by
the io and io_lib modules. Files can
be opened in modes with automatic
translation to and from different
unicode formats. The module 'unicode'
contains functions for conversion
between external and internal unicode
formats and the re module has support
for unicode data. There is also
language syntax for specifying string
and character data beyond the
ISO-latin-1 range.
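
To make the passage a little more concrete, here is a minimal sketch of the conversion functions in the unicode module. The module name unicode_demo and the function roundtrip/0 are made up for illustration; only unicode:characters_to_binary and unicode:characters_to_list come from OTP. It assumes R13 or later.

```erlang
-module(unicode_demo).
-export([roundtrip/0]).

%% Hypothetical demo module: converts between the internal form
%% (a list of code points) and encoded binaries.
roundtrip() ->
    %% "åäö" written as explicit code points, so the encoding of this
    %% source file does not matter.
    Chars = [229, 228, 246],
    %% Internal form -> UTF-8 binary (two bytes per character here).
    Utf8 = unicode:characters_to_binary(Chars),
    %% Internal form -> ISO-latin-1 binary (one byte per character).
    Latin1 = unicode:characters_to_binary(Chars, unicode, latin1),
    %% UTF-8 binary -> internal form; the match asserts a clean round trip.
    Chars = unicode:characters_to_list(Utf8),
    {Utf8, Latin1, byte_size(Utf8), length(Chars)}.
```

In the shell, io:format("~ts~n", [Utf8]) should then print the binary as Unicode text; the t modifier is what tells io to treat the data as Unicode rather than as raw bytes.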

I don't like to make pronouncements on what best practices would be, but I often find it helpful to have a minimal, complete example to start generalizing from. Here's one that gets UTF-8 data into an Erlang application and sends it out again to a different context. Assuming you have a MySQL database with a table whose rows have a field containing UTF-8 characters, here's one way to get it out and pipe it to a web browser as JSON:

hg clone http://bitbucket.org/justin/webmachine/ webmachine-read-only
cd webmachine-read-only
make
./scripts/new_webmachine.erl mywebdemo /tmp
svn checkout http://erlang-mysql-driver.googlecode.com/svn/trunk/ erlang-mysql-driver-read-only
cd erlang-mysql-driver-read-only/src
cp * /tmp/mywebdemo/src
svn checkout http://mochiweb.googlecode.com/svn/trunk/ mochiweb-read-only
cp mochiweb-read-only/src/mochijson2.erl /tmp/mywebdemo/src
cd /tmp/mywebdemo

Edit src/mywebdemo_resource.erl so it looks like this:

-module(mywebdemo_resource).
-export([init/1, to_html/2]).

-include_lib("webmachine/include/webmachine.hrl").

init([]) -> {ok, undefined}.

to_html(ReqData, State) ->
    %% Replace the host, credentials and database name with your own.
    %% The fun is a no-op logger; utf8 is the requested connection encoding.
    mysql:start_link(pool_id, "database.host.com", 3306, "db_user", "db_password", "db_name", fun(_, _, _, _) -> ok end, utf8),
    {data, Res} = mysql:fetch(pool_id, "select * from table where IdWhatever = 13"),
    %% Adjust this pattern to match your table's columns.
    [[_, Utf8Str, _]] = mysql:get_result_rows(Res),
    %% Use the UTF-8 value as a JSON object key and return the encoded JSON as the body.
    {mochijson2:encode({struct, [{Utf8Str, 100}]}), ReqData, State}.

Build everything and start the URL dispatcher:

make
./start.sh

Then execute the following in a web page (or something more convenient, like MozRepl):

var req = new XMLHttpRequest();
req.open('GET', "http://localhost:8000", false); // false = synchronous request
req.send(null);
JSON.parse(req.responseText); // parse the JSON body instead of eval-ing it

汐鸠 2024-08-10 15:02:10

As the previous poster mentioned, the latest release of Erlang supports UTF natively. If you can't use the latest release, one thing I usually do is store string data in binaries. That keeps Erlang from mangling the bytes, as can happen when they are held in a list. As a side effect, it also makes lists of strings easier to handle.
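
A rough sketch of the binary approach, for illustration only: the module binary_strings and its functions are hypothetical names, and codepoints/1 assumes a release whose bit syntax supports the utf8 type and input that is valid UTF-8 (anything else fails with function_clause).

```erlang
-module(binary_strings).
-export([greeting/1, codepoints/1]).

%% Build a reply from UTF-8 binaries without decoding them: the bytes
%% pass through untouched and iolist_to_binary/1 flattens the pieces.
greeting(Name) when is_binary(Name) ->
    iolist_to_binary([<<"Hello, ">>, Name, <<"!">>]).

%% Count code points (not bytes) by matching one UTF-8 encoded
%% character at a time off the front of the binary.
codepoints(Bin) when is_binary(Bin) ->
    codepoints(Bin, 0).

codepoints(<<_/utf8, Rest/binary>>, N) ->
    codepoints(Rest, N + 1);
codepoints(<<>>, N) ->
    N.
```

On the point about lists of strings: [<<"ab">>, <<"cd">>] is unambiguously a list of two strings, whereas with list-based strings a list of strings is just a nested list of integers, which is easy to flatten by accident.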
