The most efficient way to read CGI POST data

Posted on 2024-09-09 07:47:50

I'm in great need of a way to dig through potentially huge amounts of CGI supplied POST data.

Reading the GET data is no big deal, as I can just re-read the QUERY_STRING environment variable as often as I want, but POST data is supplied via stdin, so I can only read it once and have to store it somewhere.

My current method consists of reading all of the POST data into a temporary file, which is removed when the program exits, and scanning through it to find the keys I want.
In the GET parsing approach I could just strtok() over the QUERY_STRING, because GET data has pretty low limits and is safe to hold in RAM, but the POST data can be anything from empty to "name=Bob" to a 4 gigabyte movie file.

So, here's my current approach:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* BUFFER_SIZE and postdata_tempfile are defined elsewhere in the program */

int get_post_data(const char *s_key, char *target, size_t target_size)
{
   FILE *tmp;
   char *buffer;
   size_t buffer_size = BUFFER_SIZE;
   size_t i = 0;
   int ret_val = -1;

   if (target == NULL || target_size == 0)
      return -1;

   /* postdata_tempfile = global variable containing the temporary file name */
   if ((tmp = fopen(postdata_tempfile, "r")) == NULL)
      return -1;

   if ((buffer = malloc(buffer_size)) == NULL)
   {
      fclose(tmp);   /* do not leak the stream on failure */
      return -1;
   }

   for (;;)
   {
      int c = fgetc(tmp);

      if (c == '&' || c == EOF)
      {
         char *key, *val;

         buffer[i] = '\0';   /* terminate the current "key=value" field */
         key = strtok(buffer, "=");
         val = strtok(NULL, "");

         if (key && strcmp(s_key, key) == 0)
         {
            if (val)
            {
               /* strncpy() does not terminate on truncation, so do it here */
               strncpy(target, val, target_size - 1);
               target[target_size - 1] = '\0';
               ret_val = (int)strlen(val);
            }
            else
            {
               target[0] = '\0';   /* key exists but has no value */
               ret_val = 0;
            }

            break;
         }

         if (c == EOF)
            break;

         i = 0;   /* start collecting the next field */
      }
      else
      {
         /* keep one byte free for the terminating NUL */
         if (i + 1 >= buffer_size)
         {
            char *temp_buffer = realloc(buffer, buffer_size + BUFFER_SIZE);

            if (temp_buffer == NULL)
               break;   /* buffer is still valid and is freed below */

            buffer = temp_buffer;
            buffer_size += BUFFER_SIZE;
         }

         buffer[i++] = (char)c;
      }
   }

   free(buffer);
   fclose(tmp);

   return ret_val;
}

This does work: I can call get_post_data("user_password", pass, sizeof(pass)); and check the return value (<0 = error, 0 = key exists but has no value, >0 = data length), but it seems far too heavy. I mean, a huge I/O overhead for every single POST parameter I look up, just to avoid holding the whole string in RAM in case a large file is being uploaded?

What does Stack Overflow think?

Comments (2)

对风讲故事 2024-09-16 07:47:50

If you want to avoid loading a big file into RAM, you could use a memory mapped file - not portable, but it's the right way to do it. If your platform is POSIX you could use mmap() for this.

By the way, I didn't fully read or test your code, but I wonder whether strtok() is the right thing to use, because it destroys the data as it goes. I'd also be wary of the str...() functions if your data might be a binary file, but I don't know how the CGI part works, so you may be fine there.
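
To make the mmap() suggestion concrete, here is a minimal sketch, assuming a POSIX platform and a body encoded as application/x-www-form-urlencoded (file uploads normally arrive as multipart/form-data, which this does not parse). The names map_file and find_field are made up for illustration and are not part of any library. find_field scans with memchr()/memcmp(), so it neither modifies the data nor relies on NUL termination, which sidesteps the strtok() concern.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps a file read-only and stores its size in *len.
 * Returns NULL on error or for an empty file. Release with munmap(). */
const char *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *addr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (addr == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return addr;
}

/* Hypothetical helper: looks for "key=value" in len bytes of data.
 * Returns a pointer to the (still encoded) value and stores its length
 * in *val_len, or returns NULL if the key is not present. */
const char *find_field(const char *data, size_t len,
                       const char *key, size_t *val_len)
{
    const char *p = data;
    const char *end = data + len;
    size_t key_len;

    assert(data != NULL && key != NULL && val_len != NULL);
    key_len = strlen(key);

    while (p < end) {
        const char *amp = memchr(p, '&', (size_t)(end - p));
        const char *field_end = amp ? amp : end;
        size_t field_len = (size_t)(field_end - p);

        /* the key must match the whole name, not just a prefix of it */
        if (field_len >= key_len && memcmp(p, key, key_len) == 0 &&
            (field_len == key_len || p[key_len] == '=')) {
            if (field_len == key_len) {      /* bare key, no '=' */
                *val_len = 0;
                return field_end;
            }
            *val_len = field_len - key_len - 1;
            return p + key_len + 1;
        }
        if (amp == NULL)
            break;
        p = amp + 1;
    }
    return NULL;
}
```

The value comes back as a pointer plus a length inside the mapping, so the caller copies out only what it needs and still has to percent-decode it. On a 32-bit system a 4 GB file will not fit in the address space, so the mapping can fail there.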

瞳孔里扚悲伤 2024-09-16 07:47:50

I think it would be easier to just reject POST requests larger than a set limit, say 2MB.

That way:

  • You have a manageable-sized block of data to work with.
  • You prevent malicious 4GB POST requests.
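
A rough sketch of that check, assuming the web server passes the body size in the CONTENT_LENGTH environment variable as CGI servers normally do. The names MAX_POST_BYTES, checked_post_length and read_post_body are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_POST_BYTES (2L * 1024L * 1024L)   /* the 2MB limit suggested above */

/* Hypothetical helper: parses a CONTENT_LENGTH string.
 * Returns the length if it is a plain number in [0, max_bytes],
 * or -1 if it is missing, malformed, or over the limit. */
long checked_post_length(const char *content_length, long max_bytes)
{
    char *end = NULL;
    long n;

    assert(max_bytes >= 0);
    if (content_length == NULL || *content_length == '\0')
        return -1;

    errno = 0;
    n = strtol(content_length, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || n > max_bytes)
        return -1;
    return n;
}

/* Hypothetical helper: reads exactly len bytes from in into a freshly
 * allocated, NUL-terminated buffer. Returns NULL on a short read or if
 * allocation fails. The caller frees the result. */
char *read_post_body(FILE *in, long len)
{
    char *body;

    assert(in != NULL && len >= 0);
    body = malloc((size_t)len + 1);
    if (body == NULL)
        return NULL;
    if (fread(body, 1, (size_t)len, in) != (size_t)len) {
        free(body);
        return NULL;
    }
    body[len] = '\0';
    return body;
}
```

In the CGI program you would call checked_post_length(getenv("CONTENT_LENGTH"), MAX_POST_BYTES), answer with a 413 status if it returns -1, and otherwise pass the result to read_post_body(stdin, len). With the whole body in one bounded buffer, the temporary file and the repeated scans are no longer needed.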