How to convert ANSI to UTF-8 in Hive

Posted on 2024-12-25 05:37:42

I want to use a custom InputFormat in Hive. I found the code here:
https://github.com/msukmanowsky/OmnitureDataFileInputFormat
but when I finished the test code, I found that the FTP log files I want to parse in Hive are encoded in "ANSI" (actually GBK), so the results are not displayed correctly in the Java console.
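The symptom can be reproduced with the JDK alone. This is a minimal sketch, not part of the original question: it assumes the log bytes are GBK, as stated above, and that the JDK ships the GBK charset (standard OpenJDK builds do). The class name, method names and sample line are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkSymptomDemo {

    // Assumes the JDK ships the GBK charset (standard OpenJDK builds do).
    static final Charset GBK = Charset.forName("GBK");

    // What a reader that assumes UTF-8 sees.
    static String readAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // What the file actually says, decoded with its real charset.
    static String readAsGbk(byte[] raw) {
        return new String(raw, GBK);
    }

    public static void main(String[] args) {
        // Hypothetical log line; the first two characters spell "log" in Chinese.
        String line = "\u65E5\u5FD7 ftp log";
        byte[] raw = line.getBytes(GBK); // bytes as they sit in the "ANSI" file

        // The Chinese bytes are not valid UTF-8, so they turn into U+FFFD.
        System.out.println(readAsUtf8(raw).contains("\uFFFD")); // true
        System.out.println(readAsGbk(raw).equals(line));        // true
    }
}
```

The ASCII part of the line survives either way; only the GBK-encoded characters are garbled, which matches what the console shows.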

Could you help me with how to convert the data so that it displays correctly? Thanks.
You can use OmnitureDataFileInputFormat for the example. The code is at:
https://github.com/msukmanowsky/OmnitureDataFileInputFormat

Thanks a lot!

小草泠泠 2025-01-01 05:37:42

The following generic UDF converts a string field whose bytes are GBK-encoded into UTF-8.
Apply it before any other operation on that field.
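Why the conversion has to come first can be checked with the JDK alone. In this minimal sketch, utf8RoundTrip is a hypothetical stand-in for any step that decodes the bytes as UTF-8 and writes them back (which Hive operations do this is not stated in the answer); once that happens, invalid sequences have been replaced with U+FFFD and the original GBK bytes cannot be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConvertFirstDemo {

    static final Charset GBK = Charset.forName("GBK");

    // Correct order: read the raw bytes as GBK, then write UTF-8.
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        return new String(gbkBytes, GBK).getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for any step that treats the bytes as UTF-8 and
    // writes them back; invalid sequences are replaced with U+FFFD on the way.
    static byte[] utf8RoundTrip(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u5FD7"; // "log" in Chinese
        byte[] gbk = original.getBytes(GBK);

        // Converting first recovers the text.
        byte[] utf8 = gbkToUtf8(gbk);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(original)); // true

        // After a UTF-8 round trip, the GBK bytes are gone and cannot be decoded back.
        byte[] damaged = utf8RoundTrip(gbk);
        System.out.println(Arrays.equals(damaged, gbk));               // false
        System.out.println(new String(damaged, GBK).equals(original)); // false
    }
}
```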

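import java.nio.charset.Charset;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

public class GUDFTestGBK extends GenericUDF {

    // Charset of the raw bytes in the column ("ANSI" on a Simplified Chinese Windows system).
    private static final Charset GBK = Charset.forName("GBK");

    private StringObjectInspector oi;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException(
                "The function GUDFTestGBK(s) takes exactly 1 argument.");
        }
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("GUDFTestGBK(s) expects a STRING argument.");
        }
        oi = (StringObjectInspector) arguments[0];
        // The result is a Hadoop Text, i.e. a UTF-8 encoded Hive STRING.
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        if (value == null) {
            return null; // keep SQL NULL as NULL
        }
        // Assumes the SerDe passed the file's bytes through unchanged,
        // so this Text still holds the GBK bytes.
        Text str = oi.getPrimitiveWritableObject(value);
        if (str == null) {
            return null;
        }
        // getBytes() returns the backing array, which can be longer than the value;
        // only the first getLength() bytes are valid. Malformed GBK bytes decode
        // to U+FFFD instead of throwing.
        String s = new String(str.getBytes(), 0, str.getLength(), GBK);
        // new Text(String) stores the string as UTF-8.
        return new Text(s);
    }

    @Override
    public String getDisplayString(String[] children) {
        return "GBKToUTF8( " + children[0] + " )";
    }
}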
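The core of evaluate can be checked outside Hive. This sketch replaces Hadoop's Text with a plain byte array plus a valid length, the pair that Text.getBytes() and Text.getLength() return; the class name and the reused-buffer scenario are assumptions for illustration. It shows why the length has to be passed when decoding: if a Text's backing array is larger than its current value, decoding the whole array picks up stale bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextLengthDemo {

    static final Charset GBK = Charset.forName("GBK");

    // Same conversion as the UDF's evaluate(), on a backing array plus a valid
    // length, the pair that Text.getBytes() and Text.getLength() return.
    static byte[] gbkToUtf8(byte[] buffer, int validLength) {
        return new String(buffer, 0, validLength, GBK).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical reused buffer: it held a longer value before ...
        byte[] buffer = "previous, longer value".getBytes(StandardCharsets.US_ASCII);
        // ... and now holds a 4-byte GBK value at the front.
        byte[] current = "\u65E5\u5FD7".getBytes(GBK);
        System.arraycopy(current, 0, buffer, 0, current.length);

        String ok = new String(gbkToUtf8(buffer, current.length), StandardCharsets.UTF_8);
        // Decoding the whole array ignores the length and drags in stale bytes.
        String bad = new String(gbkToUtf8(buffer, buffer.length), StandardCharsets.UTF_8);

        System.out.println(ok.equals("\u65E5\u5FD7"));          // true
        System.out.println(bad.endsWith("ious, longer value")); // true
    }
}
```

To call the UDF itself from Hive, the compiled class would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION, for example ADD JAR /path/to/gbk-udf.jar; CREATE TEMPORARY FUNCTION gbk_to_utf8 AS 'GUDFTestGBK'; where the jar path and function name are placeholders.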