Is SQLite on Android built with the ICU tokenizer enabled for FTS?
Like the title says: can we use ...USING fts3(tokenize=icu th_TH, ...)? If we can, does anyone know which locales are supported, and whether that varies by platform version?
Comments (3)
No, only tokenize=porter is available.
When I specify tokenize=icu, I get "android.database.sqlite.SQLiteException: unknown tokenizer: icu".
Also, this link hints that if Android didn't compile it in by default, it will not be available:
http://sqlite.phxsoftware.com/forums/t/2349.aspx
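Since availability seems to depend on how the platform's SQLite was built, one option is to probe for the tokenizer at runtime instead of assuming. The following is only a minimal sketch: `TokenizerProbe`, `SqlRunner`, and the table name `fts_probe` are hypothetical names made up for illustration, and `SqlRunner` is meant to stand in for something like `SQLiteDatabase.execSQL(String)` on a scratch or in-memory database.

```java
// Sketch: check whether an FTS tokenizer is available by trying to create
// a throwaway virtual table and treating any failure as "not available".
final class TokenizerProbe {

    /** Hypothetical stand-in for SQLiteDatabase.execSQL(String). */
    interface SqlRunner {
        void exec(String sql) throws Exception;
    }

    /** Builds the probe statement, e.g. for "porter" or "icu th_TH". */
    static String probeSql(String tokenizer) {
        return "CREATE VIRTUAL TABLE fts_probe USING fts3(body, tokenize=" + tokenizer + ")";
    }

    /** Returns true if the table could be created and dropped without an error. */
    static boolean isAvailable(SqlRunner db, String tokenizer) {
        try {
            db.exec(probeSql(tokenizer));
            db.exec("DROP TABLE fts_probe");
            return true;
        } catch (Exception e) {
            // On Android this would typically be an SQLiteException such as
            // "unknown tokenizer: icu".
            return false;
        }
    }
}
```

On a device, this could be called as `TokenizerProbe.isAvailable(db::execSQL, "icu th_TH")`, assuming `db` is an open `SQLiteDatabase` that is safe to create a temporary table in.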
For API level 21 and up, I tested and found that the ICU tokenizer is already available.
However, to support 90%+ of devices, a workaround is possible. I have an idea for one, which I also mentioned in another question of mine: Work around of Android SQLite full-text search for Asian text
You can port the ICU tokenizer function to Java, or to a native Android module, as a separate module that is not directly involved in SQLite. Then use an "external content table" linked to the virtual table (supported since FTS4).
When adding a tuple, add the normal content to the external content table, but invoke the standalone tokenizer to add artificial spaces at word boundaries before adding the text to the virtual index table.
When deleting a tuple, invoke the tokenizer again to update the content table with the artificially spaced text, then delete the virtual table tuple, then delete the content table tuple.
This is a little tricky, but compared with the other option of recompiling a full SQLite, it is already much less effort.
For external content tables and how they work, please refer to https://www.sqlite.org/fts3.html#section_6_2_2
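The add and delete steps above could be sketched roughly as follows. This is an illustration under assumptions, not tested on a device: the table names `docs` and `docs_fts`, the class `ExternalContentIndex`, and the `Db` interface are all hypothetical, with `Db` standing in for something like `SQLiteDatabase.execSQL(String, Object[])`. The segmenter is whatever standalone tokenizer inserts the artificial spaces.

```java
import java.util.function.UnaryOperator;

// Sketch of the external content table workflow: the content table keeps the
// original text, the FTS4 index is fed text with spaces at word boundaries.
final class ExternalContentIndex {

    /** Hypothetical stand-in for SQLiteDatabase.execSQL(String, Object[]). */
    interface Db {
        void exec(String sql, Object... args);
    }

    private final Db db;
    private final UnaryOperator<String> segmenter; // inserts the artificial spaces

    ExternalContentIndex(Db db, UnaryOperator<String> segmenter) {
        this.db = db;
        this.segmenter = segmenter;
    }

    void createTables() {
        db.exec("CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT)");
        // content="docs" makes docs_fts an external content FTS4 table.
        db.exec("CREATE VIRTUAL TABLE docs_fts USING fts4(content=\"docs\", body)");
    }

    void add(long id, String body) {
        // Original text goes to the content table, segmented text to the index.
        db.exec("INSERT INTO docs(id, body) VALUES(?, ?)", id, body);
        db.exec("INSERT INTO docs_fts(docid, body) VALUES(?, ?)", id, segmenter.apply(body));
    }

    void delete(long id, String body) {
        // FTS reads the content table to find the terms to remove, so the row
        // must hold the same segmented text that was indexed before deleting.
        db.exec("UPDATE docs SET body = ? WHERE id = ?", segmenter.apply(body), id);
        db.exec("DELETE FROM docs_fts WHERE docid = ?", id);
        db.exec("DELETE FROM docs WHERE id = ?", id);
    }
}
```

In real use the three statements in `delete` (and the two in `add`) would probably belong in a single transaction so the index and the content table cannot drift apart. Passing the original text into `delete` is a simplification; it could also be read back from `docs` first.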
A usable ICU tokenizer is actually already there in the Android SDK: use BreakIterator.getWordInstance. It looks like it even supports dictionary-based tokenization for languages such as Chinese.
http://developer.android.com/reference/java/text/BreakIterator.html
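As a rough idea of what the standalone tokenizer step might look like, here is a minimal sketch built on `java.text.BreakIterator`. The class name `WordSpacer` and the method `insertSpaces` are hypothetical. How well the boundaries match real words for Thai or Chinese depends on the break rules and dictionaries shipped with the runtime, so results on a device may differ from a desktop JVM.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch: rewrite text so that every segment found by the word BreakIterator
// is separated by a single space, ready for a whitespace-based FTS tokenizer.
final class WordSpacer {

    static String insertSpaces(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        StringBuilder out = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (piece.isEmpty()) {
                continue; // skip segments that are only whitespace
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(piece);
        }
        return out.toString();
    }
}
```

The same function would need to be applied to the search query as well, so that query terms are split the same way as the indexed text.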
I have some Android code that uses tokenization at the link below; maybe it will be of some help:
https://github.com/gast-lib/gast-lib/blob/master/app/src/root/gast/playground/speech/food/db/FtsIndexedFoodDatabase.java