Parsing an image data URI with a regular expression

Posted 2024-11-02 08:57:14

If I have:

<img src="data:image/gif;base64,R0lGODlhtwBEANUAAMbIypOVmO7v76yusOHi49AsSDY1N2NkZvvs6VVWWPDAutZOWJ+hpPPPyeqmoNlcYXBxdNTV1nx+gN51c4iJjEdHSfbc19M+UOeZk7m7veSMiNtpauGBfu2zrc4RQSMfIP///wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAC3AEQAAAb/QJBwSCwaj8ikcslsOp/QqHRKrVqv2Kx2y+16v+CweEwum8/otHrNbrvf8Lh8Tq/b7/i8fs" />

How can I parse the data part into:

  • Mime type (image/gif)
  • Encoding (base64)
  • Image data (the binary data)

Comments (5)

逆光下的微笑 2024-11-09 08:57:14

EDIT: expanded to show usage

var regex = new Regex(@"data:(?<mime>[\w/\-\.]+);(?<encoding>\w+),(?<data>.*)", RegexOptions.Compiled);

var match = regex.Match(input);

var mime = match.Groups["mime"].Value;
var encoding = match.Groups["encoding"].Value;
var data = match.Groups["data"].Value;

NOTE: The regex works for the input shown in the question. If a charset were also specified, it would not match and would have to be rewritten.
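To illustrate the charset caveat, here is a rough sketch (in JavaScript rather than C#) of a pattern that tolerates optional `;name=value` parameters before the `;base64` marker. The pattern and the `splitDataUri` name are my own assumptions for illustration, not a complete implementation of the data URI grammar.

```javascript
// Sketch only: MIME type, optional ";name=value" parameters,
// optional ";base64" marker, then the payload after the comma.
const DATA_URI = /^data:([\w\/.+-]+)?((?:;[\w-]+=[\w.+-]+)*)(;base64)?,(.*)$/s;

// Hypothetical helper name, not part of any library.
function splitDataUri(input) {
  const match = DATA_URI.exec(input);
  if (!match) return null; // input is not in the expected shape
  const [, mime, params, base64, data] = match;
  return {
    mime: mime || 'text/plain', // RFC 2397 default when the type is omitted
    params: params ? params.slice(1).split(';') : [],
    encoding: base64 ? 'base64' : null,
    data,
  };
}

console.log(splitDataUri('data:image/gif;charset=utf-8;base64,R0lGODdh'));
```

Because a parameter must contain `=`, the trailing `;base64` is never swallowed by the parameter group.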

南烟 2024-11-09 08:57:14

Actually, you don't need a regex for that. According to Wikipedia, the data URI format is

data:[<MIME-type>][;charset=<encoding>][;base64],<data>

so just do the following:

byte[] imagedata = Convert.FromBase64String(imageSrc.Substring(imageSrc.IndexOf(",") + 1));
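The same no-regex idea can be sketched in JavaScript with a couple of sanity checks added. This assumes Node.js (for `Buffer`), and `decodeBase64DataUri` is a made-up name for illustration.

```javascript
// Sketch: split at the first comma and decode the base64 payload.
// Assumes Node.js, where Buffer is a global.
function decodeBase64DataUri(src) {
  const comma = src.indexOf(',');
  if (!src.startsWith('data:') || comma === -1) {
    throw new Error('not a data URI');
  }
  const header = src.slice(5, comma);   // text between "data:" and ","
  const payload = src.slice(comma + 1); // everything after the first comma
  if (!header.endsWith(';base64')) {
    throw new Error('payload is not base64-encoded');
  }
  return Buffer.from(payload, 'base64'); // raw bytes
}

const bytes = decodeBase64DataUri('data:image/gif;base64,R0lGODdh');
console.log(bytes.length); // 6 bytes: the GIF signature
```

Checking the header before decoding avoids silently base64-decoding a payload that was actually percent-encoded text.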
荒人说梦 2024-11-09 08:57:14

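I also needed to parse the data URI scheme. As a result, I adapted the regular expression given on this page specifically for C#, so that it fits any data URI scheme (to check the scheme, you can take it from here or here).

Here is my solution for C#: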

private class DataUriModel {
  public string MediaType { get; set; }
  public string Type { get; set; }
  public string[] Tree { get; set; }
  public string Subtype { get; set; }
  public string Suffix { get; set; }
  public string[] Params { get; set; }
  public string Encoding { get; set; }
  public string Data { get; set; }
}

static void Main(string[] args) {
  string s = "data:image/prs.jpeg+gzip;charset=UTF-8;page=21;page=22;base64,/9j/4AAQSkZJRgABAQAAAQABAAD";
  var parsedUri = GetDataURI(s);
  Console.WriteLine(parsedUri.Type);
  Console.WriteLine(parsedUri.Subtype);
  Console.WriteLine(parsedUri.Encoding);
}

private static DataUriModel GetDataURI(string data) {
  var result = new DataUriModel();
  Regex regex = new Regex(@"^\s*data:(?<media_type>(?<type>[a-z\-]+){1}\/(?<tree>([a-z\-]+\.)+)?(?<subtype>[a-z\-]+){1}(?<suffix>\+[a-z]+)?(?<params>(;[a-z\-]+\=[a-z0-9\-\+]+)*)?)?(?<encoding>;base64)?(?<data>,+[a-z0-9\\\!\$\&\'\,\(\)\*\+\,\;\=\-\.\~\:\@\/\?\%\s]*\s*)?$", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
  var match = regex.Match(data);

  if (!match.Success)
    return result;

  var names = regex.GetGroupNames();
  foreach (var name in names) {
    var group = match.Groups[name];
    switch (name) {
      case "media_type": result.MediaType = group.Value; break;
      case "type": result.Type = group.Value; break;
      case "tree": result.Tree = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[0..^1].Split(".") : null; break;
      case "subtype": result.Subtype = group.Value; break;
      case "suffix": result.Suffix = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
      case "params": result.Params = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..].Split(";") : null; break;
      case "encoding": result.Encoding = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
      case "data": result.Data = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
    }
  }

  return result;
}

生生漫 2024-11-09 08:57:14

Data URIs have some complexity to them: they can contain parameters, a media type, and so on, and sometimes you need that information, not just the data.

To parse a data URI and extract all of the relevant parts, try this:

/**
 * Parse a data uri and return an object with information about the different parts
 * @param {*} data_uri 
 */
function parseDataURI(data_uri) {
    // Parameter values may contain digits (e.g. charset=UTF-8), so 0-9 is allowed there.
    let regex = /^\s*data:(?<media_type>(?<mime_type>[a-z\-]+\/[a-z\-\+]+)(?<params>(;[a-z\-]+\=[a-z0-9\-]+)*))?(?<encoding>;base64)?,(?<data>[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*)$/i;
    let result = regex.exec(data_uri);
    // exec() returns null when the input does not match; bail out instead of throwing.
    if(!result)
        return null;
    let info = {
        media_type: result.groups.media_type,
        mime_type: result.groups.mime_type,
        params: result.groups.params,
        encoding: result.groups.encoding,
        data: result.groups.data
    }
    // Turn ";foo=bar;bar=baz" into {foo: 'bar', bar: 'baz'}.
    if(info.params)
        info.params = Object.fromEntries(info.params.split(';').slice(1).map(param => param.split('=')));
    // Strip the leading ";" from ";base64".
    if(info.encoding)
        info.encoding = info.encoding.replace(';','');
    return info;
}

This gives you an object with all the relevant parts parsed out, and the params as a dictionary such as {foo: 'bar', bar: 'baz'}.

Example (mocha test with assert):

describe("Parse data URI", () => {
    it("Should extract data URI parts correctly",
        async ()=> {
            let uri = 'data:text/vnd-example+xyz;foo=bar;bar=baz;base64,R0lGODdh';
            let info = parseDataURI(uri);
            assert.equal(info.media_type,'text/vnd-example+xyz;foo=bar;bar=baz');
            assert.equal(info.mime_type,'text/vnd-example+xyz');
            assert.equal(info.encoding, 'base64');
            assert.equal(info.data, 'R0lGODdh');
            assert.equal(info.params.foo, 'bar');
            assert.equal(info.params.bar, 'baz');
        }
    );
});
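Since the question also asks for the binary data, here is a hedged sketch of turning the `encoding` and `data` values returned by `parseDataURI` into bytes. It assumes Node.js (`Buffer`) and that any non-base64 payload is percent-encoded UTF-8 text. `dataToBytes` is a hypothetical helper, not part of the answer above.

```javascript
// Sketch: convert the parsed payload string into raw bytes.
// Assumes Node.js; non-base64 payloads are treated as percent-encoded UTF-8.
function dataToBytes(encoding, data) {
  if (encoding === 'base64') {
    // Drop embedded whitespace before decoding.
    return Buffer.from(data.replace(/\s+/g, ''), 'base64');
  }
  // Without ";base64" the payload is percent-encoded text.
  return Buffer.from(decodeURIComponent(data), 'utf8');
}

console.log(dataToBytes('base64', 'R0lGODdh'));
console.log(dataToBytes(undefined, 'Hello%2C%20World').toString('utf8'));
```

Note that `decodeURIComponent` throws on byte sequences that are not valid UTF-8, so arbitrary percent-encoded binary would need a byte-by-byte decoder instead.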

拥抱没勇气 2024-11-09 08:57:14

Here is my regular expression, where I also had to split the MIME type (image/jpg) into its two parts.

^data:(?<mimeType>(?<mime>\w+)\/(?<extension>\w+));(?<encoding>\w+),(?<data>.*)
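The named-group syntax used here is also valid in JavaScript, so the pattern can be tried as in the sketch below. The sample inputs are my own. One limitation worth noting is that `\w+` does not match `+` or `.`, so subtypes such as `svg+xml` are rejected.

```javascript
// Sketch: the pattern above, used with JavaScript named capture groups.
const pattern = /^data:(?<mimeType>(?<mime>\w+)\/(?<extension>\w+));(?<encoding>\w+),(?<data>.*)/;

const m = pattern.exec('data:image/jpg;base64,/9j/4AAQ');
console.log(m.groups.mimeType);  // full type, "image/jpg"
console.log(m.groups.mime);      // top-level type
console.log(m.groups.extension); // subtype
console.log(m.groups.encoding);

// "svg+xml" contains "+", which \w does not match, so this is false.
const matchesSvg = pattern.test('data:image/svg+xml;base64,PHN2Zz4=');
console.log(matchesSvg);
```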