Parsing an image data URI with a regular expression

Posted on 2024-11-02 08:57:14

If I have:

<img src="data:image/gif;base64,R0lGODlhtwBEANUAAMbIypOVmO7v76yusOHi49AsSDY1N2NkZvvs6VVWWPDAutZOWJ+hpPPPyeqmoNlcYXBxdNTV1nx+gN51c4iJjEdHSfbc19M+UOeZk7m7veSMiNtpauGBfu2zrc4RQSMfIP///wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAC3AEQAAAb/QJBwSCwaj8ikcslsOp/QqHRKrVqv2Kx2y+16v+CweEwum8/otHrNbrvf8Lh8Tq/b7/i8fs" />

How can I parse the data part into:

Mime type (image/gif)
Encoding (base64)
Image data (the binary data)

逆光下的微笑 2024-11-09 08:57:14

EDIT: expanded to show usage

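// Requires: using System.Text.RegularExpressions;
var regex = new Regex(@"data:(?<mime>[\w/\-\.]+);(?<encoding>\w+),(?<data>.*)", RegexOptions.Compiled);

// Check match.Success before reading the groups; they are empty when the input does not match.
var match = regex.Match(input);

var mime = match.Groups["mime"].Value;         // e.g. image/gif
var encoding = match.Groups["encoding"].Value; // e.g. base64
var data = match.Groups["data"].Value;         // the still-encoded payload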

NOTE: The regex works for the input shown in the question. If a charset were specified as well, it would not match and would have to be rewritten.
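The charset limitation in the note can be worked around by making the parameter list optional. The following is only a sketch in JavaScript (the language another answer on this page uses), not the answerer's code: the name `splitDataUri` and the exact pattern are assumptions, and parameter values containing quoted strings are not handled.

```javascript
// Hypothetical helper (not from the answer above): splits a data URI into
// MIME type, optional ";name=value" parameters, an optional ";base64"
// marker, and the payload.
function splitDataUri(input) {
  const re = /^data:(?<mime>[\w.+-]+\/[\w.+-]+)?(?<params>(?:;[\w-]+=[^;,]+)*)(?<base64>;base64)?,(?<data>.*)$/is;
  const m = re.exec(input);
  if (m === null) return null; // not a data URI this pattern understands

  const { mime, params, base64, data } = m.groups;
  return {
    // RFC 2397 defaults the media type to text/plain when it is omitted.
    mime: mime ?? "text/plain",
    // ";charset=utf-8;page=2" -> ["charset=utf-8", "page=2"]
    params: params ? params.slice(1).split(";") : [],
    encoding: base64 ? "base64" : null,
    data, // still encoded; decoding is left to the caller
  };
}

console.log(splitDataUri("data:image/gif;charset=utf-8;base64,R0lG"));
```

Because `;base64` has no `=`, it cannot be consumed as a parameter, so the encoding marker is still captured separately.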

南烟 2024-11-09 08:57:14

Actually, you don't need a regex for that. According to Wikipedia, the data URI format is

data:[<MIME-type>][;charset=<encoding>][;base64],<data>

so just do the following:

byte[] imagedata = Convert.FromBase64String(imageSrc.Substring(imageSrc.IndexOf(",") + 1));
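The same no-regex idea can be sketched in JavaScript. This is an illustration under assumptions, not part of the answer: `dataUriToBytes` is a made-up name, it relies on the Node.js `Buffer` global, and for payloads that are not base64 it assumes percent-encoded UTF-8 text.

```javascript
// Hypothetical helper: returns the payload of a data URI as raw bytes.
// Assumes Node.js, where Buffer is available as a global.
function dataUriToBytes(uri) {
  const comma = uri.indexOf(",");
  if (comma === -1) throw new Error("not a data URI: no comma found");

  const header = uri.slice(0, comma);   // e.g. "data:image/gif;base64"
  const payload = uri.slice(comma + 1); // everything after the first comma

  // The base64 marker, when present, is always the last part of the header.
  return /;base64$/i.test(header)
    ? Buffer.from(payload, "base64")
    : Buffer.from(decodeURIComponent(payload), "utf8"); // assumption: text payload
}

const bytes = dataUriToBytes("data:image/gif;base64,R0lGODdh");
console.log(bytes.toString("latin1")); // GIF87a
```

Checking the header for `;base64` avoids feeding a percent-encoded payload to the base64 decoder, which the one-liner above would do.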

荒人说梦 2024-11-09 08:57:14

I also needed to parse the data URI scheme. As a result, I improved the regular expression given on this page specifically for C#, so that it fits any data URI (to check the scheme, you can take it from here or here).

Here is my solution for C#:

private class DataUriModel {
  public string MediaType { get; set; }
  public string Type { get; set; }
  public string[] Tree { get; set; }
  public string Subtype { get; set; }
  public string Suffix { get; set; }
  public string[] Params { get; set; }
  public string Encoding { get; set; }
  public string Data { get; set; }
}

static void Main(string[] args) {
  string s = "data:image/prs.jpeg+gzip;charset=UTF-8;page=21;page=22;base64,/9j/4AAQSkZJRgABAQAAAQABAAD";
  var parsedUri = GetDataURI(s);
  Console.WriteLine(parsedUri.Type);
  Console.WriteLine(parsedUri.Subtype);
  Console.WriteLine(parsedUri.Encoding);
}

private static DataUriModel GetDataURI(string data) {
  var result = new DataUriModel();
  Regex regex = new Regex(@"^\s*data:(?<media_type>(?<type>[a-z\-]+){1}\/(?<tree>([a-z\-]+\.)+)?(?<subtype>[a-z\-]+){1}(?<suffix>\+[a-z]+)?(?<params>(;[a-z\-]+\=[a-z0-9\-\+]+)*)?)?(?<encoding>;base64)?(?<data>,+[a-z0-9\\\!\$\&\'\,\(\)\*\+\,\;\=\-\.\~\:\@\/\?\%\s]*\s*)?$", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
  var match = regex.Match(data);

  if (!match.Success)
    return result;

  var names = regex.GetGroupNames();
  foreach (var name in names) {
    var group = match.Groups[name];
    switch (name) {
      case "media_type": result.MediaType = group.Value; break;
      case "type": result.Type = group.Value; break;
      case "tree": result.Tree = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[0..^1].Split(".") : null; break;
      case "subtype": result.Subtype = group.Value; break;
      case "suffix": result.Suffix = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
      case "params": result.Params = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..].Split(";") : null; break;
      case "encoding": result.Encoding = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
      case "data": result.Data = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
    }
  }

  return result;
}

生生漫 2024-11-09 08:57:14

Data URIs have some complexity to them: they can contain params, a media type, and so on, and sometimes you need that information, not just the data.

To parse a data URI and extract all of the relevant parts, try this:

/**
 * Parse a data uri and return an object with information about the different parts
 * @param {*} data_uri 
 */
function parseDataURI(data_uri) {
    // Param values also accept digits so that e.g. ";charset=utf-8" matches
    let regex = /^\s*data:(?<media_type>(?<mime_type>[a-z\-]+\/[a-z\-\+]+)(?<params>(;[a-z\-]+\=[a-z0-9\-]+)*))?(?<encoding>;base64)?,(?<data>[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*)$/i;
    let result = regex.exec(data_uri);
    // exec() returns null when the input does not match; return null instead of throwing
    if(result === null)
        return null;
    let info = {
        media_type: result.groups.media_type,
        mime_type: result.groups.mime_type,
        params: result.groups.params,
        encoding: result.groups.encoding,
        data: result.groups.data
    }
    // ";foo=bar;bar=baz" -> {foo: 'bar', bar: 'baz'}
    if(info.params)
        info.params = Object.fromEntries(info.params.split(';').slice(1).map(param => param.split('=')));
    // ";base64" -> "base64"
    if(info.encoding)
        info.encoding = info.encoding.replace(';','');
    return info;
}

This gives you an object with all the relevant parts parsed out, and the params as a dictionary, e.g. {foo: 'bar', bar: 'baz'}.

Example (mocha test with assert):

const assert = require("assert");

describe("Parse data URI", () => {
    it("Should extract data URI parts correctly",
        async ()=> {
            let uri = 'data:text/vnd-example+xyz;foo=bar;bar=baz;base64,R0lGODdh';
            let info = parseDataURI(uri);
            assert.equal(info.media_type,'text/vnd-example+xyz;foo=bar;bar=baz');
            assert.equal(info.mime_type,'text/vnd-example+xyz');
            assert.equal(info.encoding, 'base64');
            assert.equal(info.data, 'R0lGODdh');
            assert.equal(info.params.foo, 'bar');
            assert.equal(info.params.bar, 'baz');
        }
    );
});

拥抱没勇气 2024-11-09 08:57:14

Here is my regular expression where I had to separate the mime-type (image/jpg) as well.

^data:(?<mimeType>(?<mime>\w+)\/(?<extension>\w+));(?<encoding>\w+),(?<data>.*)
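To show what each named group captures, here is a small JavaScript sketch using the same pattern. The wrapper `matchSimpleDataUri` and the sample inputs are made up for illustration. It also shows one limit of `\w+` for the subtype: a `+` is not a word character, so a type such as `image/svg+xml` is rejected.

```javascript
// The pattern from the answer above, written as a JavaScript regex literal.
const SIMPLE_DATA_URI = /^data:(?<mimeType>(?<mime>\w+)\/(?<extension>\w+));(?<encoding>\w+),(?<data>.*)/;

// Hypothetical wrapper: returns the named groups, or null when nothing matches.
function matchSimpleDataUri(input) {
  const m = SIMPLE_DATA_URI.exec(input);
  return m === null ? null : { ...m.groups };
}

const jpg = matchSimpleDataUri("data:image/jpg;base64,/9j/4AAQ");
console.log(jpg.mimeType);  // image/jpg
console.log(jpg.mime);      // image
console.log(jpg.extension); // jpg
console.log(jpg.encoding);  // base64

// "svg+xml" contains "+", which \w does not match, so the result is null.
console.log(matchSimpleDataUri("data:image/svg+xml;base64,AAAA"));
```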
