How do I handle 404s with regex-based routing?

Posted on 2024-09-09 11:22:51

Please consider the following very rudimentary "controllers" (functions in this case, for simplicity):

function Index() {
    var_dump(__FUNCTION__); // show the "Index" page
}

function Send($n) {
    var_dump(__FUNCTION__, func_get_args()); // placeholder controller
}

function Receive($n) {
    var_dump(__FUNCTION__, func_get_args()); // placeholder controller
}

function Not_Found() {
    var_dump(__FUNCTION__); // show a "404 - Not Found" page
}

And the following regex-based Route() function:

function Route($route, $function = null)
{
    $result = rtrim(preg_replace('~/+~', '/', substr($_SERVER['PHP_SELF'], strlen($_SERVER['SCRIPT_NAME']))), '/');

    if (preg_match('~' . rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route), '/') . '$~i', $result, $matches) > 0)
    {
        exit(call_user_func_array($function, array_slice($matches, 1)));
    }

    return false;
}

Now I want to map the following URLs (trailing slashes are ignored) to the corresponding "controllers":

/index.php -> Index()
/index.php/send/:NUM -> Send()
/index.php/receive/:NUM -> Receive()
/index.php/NON_EXISTENT -> Not_Found()

This is where things start to get tricky: I have two problems that I'm not able to solve. I figure I'm not the first person to run into this, so someone out there should have a solution.


Catching 404s (Solved!)

I can't find a way to distinguish between requests to the root (index.php) and requests for paths that shouldn't exist, such as index.php/notHere. I end up serving the default index.php route for URLs that should get a 404 - Not Found error page instead. How can I solve this?

EDIT - The solution just flashed into my mind:

Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route('/:any', 'Not_Found'); // use :any here, see the problem below
Route('/', 'Index');

Ordering of the Routes

If I set up the routes in a "logical" order, like this:

Route('/', 'Index');
Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route(':any', 'Not_Found');

All URL requests are caught by the Index() controller, since the route '/' becomes an empty regex (remember: trailing slashes are stripped) and an empty regex matches everything. However, if I define the routes in a "hacky" order, like this:

Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route('/:any', 'Not_Found');
Route('/', 'Index');

Everything seems to work like it should. Is there an elegant way of solving this problem?
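
To make the ordering effect concrete, here is a minimal sketch that isolates the pattern-building step of Route() and reports which route wins for a given path. The helpers compileRoute() and firstMatch() are hypothetical names introduced only for this illustration, and the path passed in is assumed to be already normalized the way $result is (no trailing slash).

```php
<?php
// Hypothetical helper: the same pattern-building steps as Route(), isolated.
function compileRoute(string $route): string
{
    $pattern = str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route);
    return '~' . rtrim($pattern, '/') . '$~i';
}

// Hypothetical helper: name of the first route, in definition order, that matches $path.
function firstMatch(array $routes, string $path): ?string
{
    foreach ($routes as $route => $function) {
        if (preg_match(compileRoute($route), $path) > 0) {
            return $function;
        }
    }
    return null;
}

$logical = array(
    '/'               => 'Index',
    '/send/(:num)'    => 'Send',
    '/receive/(:num)' => 'Receive',
    ':any'            => 'Not_Found',
);
$hacky = array(
    '/send/(:num)'    => 'Send',
    '/receive/(:num)' => 'Receive',
    '/:any'           => 'Not_Found',
    '/'               => 'Index',
);

echo compileRoute('/'), "\n";                // ~$~i  (nothing left before the end anchor)
echo firstMatch($logical, '/send/42'), "\n"; // Index
echo firstMatch($hacky, '/send/42'), "\n";   // Send
echo firstMatch($hacky, ''), "\n";           // Index
echo firstMatch($hacky, '/nope'), "\n";      // Not_Found
```

The pattern is only anchored at the end with $, so once '/' has been trimmed to an empty string there is nothing left to constrain the match, and whichever position that route occupies decides the outcome.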

The routes may not always be hard-coded (they might be pulled from a DB or something similar), so I need to make sure that no route is ignored because of the order in which the routes were defined. Any help is appreciated!

Comments (3)

娜些时光,永不杰束 2024-09-16 11:22:51

Okay, I know there's more than one way to skin a cat, but why in the world would you do it this way? It seems like an RoR-style approach to something that could easily be handled with mod_rewrite.

That being said, I rewrote your Route function and was able to accomplish your goal. Keep in mind that I added another condition to catch the Index route directly: you were stripping the trailing /'s from the route, which is why it matched the Index when you wanted it to match the 404. I also consolidated the 4 Route() calls into a single foreach().

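function Route()
{
        // Path after the script name, with repeated slashes collapsed and the trailing slash removed
        $result = rtrim(preg_replace('~/+~', '/', substr($_SERVER['PHP_SELF'], strlen($_SERVER['SCRIPT_NAME']))), '/');
        $matches = array();

        // Checked in this order; the empty pattern matches anything, so Not_Found must stay last
        $routes = array(
                'Send'      => '/send/(:num)',
                'Receive'   => '/receive/(:num)',
                'Index'     => '/',
                'Not_Found' => '' // empty pattern: catch-all fallback
        );

        foreach ($routes as $function => $route)
        {
                // The index route only matches an empty path; every other route is matched as a regex
                if (($route == '/' && $result == '')
                        || (preg_match('~' . rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route)) . '$~i', $result, $matches) > 0))
                {
                        exit(call_user_func_array($function, array_slice($matches, 1)));
                }
        }

        return false;
}

Route();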

Cheers!

看春风乍起 2024-09-16 11:22:51

This is a common problem in MVC web apps, and it is often solved before it becomes a problem at all.

The easiest and most general way is to use exceptions. Throw a PageNotFound exception if you don't have any content for the given parameters. At the top level of your application, catch all exceptions, as in this simplified example:

index.php:

try {
    $controller->method($arg);
} catch (PageNotFound $e) {
    show404Page($e->getMessage());
} catch (Exception $e) {
    logFatalError($e->getMessage());
    show500Page();
}

controller.php:

function method($arg) {
    $obj = findByID($arg);
    if (false === $obj) {
         throw new PageNotFound($arg);
    } else {
         ...
    }
}
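
The two fragments above leave PageNotFound, findByID() and the page helpers undefined. A self-contained sketch of the same idea might look like the following; the exception class body, the in-memory lookup and the show() and dispatch() functions are all assumptions made for illustration, not part of the original answer.

```php
<?php
// Assumed exception type; the answer uses the name but does not define it.
class PageNotFound extends Exception {}

// Hypothetical lookup standing in for findByID(): returns false when nothing matches.
function findByID($id)
{
    $records = array(1 => 'first record', 2 => 'second record');
    return isset($records[$id]) ? $records[$id] : false;
}

// Controller: signals a missing resource by throwing instead of rendering a page itself.
function show($id)
{
    $obj = findByID($id);
    if (false === $obj) {
        throw new PageNotFound("No record with id $id");
    }
    return "200: $obj";
}

// Top level: the more specific catch has to come before the generic one.
function dispatch($id)
{
    try {
        return show($id);
    } catch (PageNotFound $e) {
        return '404: ' . $e->getMessage();
    } catch (Exception $e) {
        return '500: internal error';
    }
}

echo dispatch(1), "\n";  // 200: first record
echo dispatch(99), "\n"; // 404: No record with id 99
```

With this arrangement the router no longer needs a catch-all route for 404s: any controller, or the router itself when no route matches, can throw PageNotFound.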

The ordering problem can be solved by sorting the regexes so that the most specific regex is tried first and the least specific is tried last. To do this, count the path separators (i.e. slashes) in each regex, excluding the one at the beginning. You'll get this:

 Regex           Separators
 --------------------------
 /send/(:num)    1
 /send/8/(:num)  2
 /               0

Sort them in descending order of separator count and process them in that order. The processing order is:

  1. /send/8/(:num)
  2. /send/(:num)
  3. /
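
The counting and sorting steps can be sketched as below. The function names separatorCount() and sortBySpecificity() are hypothetical, and the sketch assumes routes are plain strings like the ones in the table.

```php
<?php
// Count path separators, ignoring the leading one, as described above.
function separatorCount(string $route): int
{
    return substr_count(ltrim($route, '/'), '/');
}

// Return the routes ordered from most to fewest separators.
function sortBySpecificity(array $routes): array
{
    usort($routes, function ($a, $b) {
        return separatorCount($b) - separatorCount($a);
    });
    return $routes;
}

print_r(sortBySpecificity(array('/send/(:num)', '/send/8/(:num)', '/')));
// Order: /send/8/(:num), /send/(:num), /
```

Note that routes with the same count tie under this measure: '/' and '/:any' both count zero separators, so the separator count alone does not decide which of those two is tried first.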

沫离伤花 2024-09-16 11:22:51

OK, first of all, something like:

foo.com/index.php/more/info/to/follow 

is perfectly valid and, as per the standard, should load index.php with $_SERVER['PATH_INFO'] set to /more/info/to/follow. This is part of the CGI/1.1 standard. If you want the server NOT to perform PATH_INFO expansion, turn it off in your server settings. Under Apache this is done with:

AcceptPathInfo Off

If you set it to Off under Apache 2, the server will respond with a 404 for such URLs.

I am not sure what the equivalent IIS setting is, but I think you can find it.
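
If PATH_INFO handling is left on, the extra path can be read from PHP directly. The following is a small sketch, assuming a hypothetical helper requestPath() that takes a $_SERVER-like array so it can be tried without a web server; it falls back to the PHP_SELF and SCRIPT_NAME approach used in the question when PATH_INFO is not set.

```php
<?php
// Hypothetical helper: returns the normalized path that follows the script name.
function requestPath(array $server): string
{
    if (isset($server['PATH_INFO'])) {
        $path = $server['PATH_INFO'];
    } else {
        // Fallback used by the question's Route(): PHP_SELF minus SCRIPT_NAME.
        $path = (string) substr($server['PHP_SELF'], strlen($server['SCRIPT_NAME']));
    }
    // Collapse repeated slashes and drop the trailing one, as Route() does.
    return rtrim(preg_replace('~/+~', '/', $path), '/');
}

echo requestPath(array('PATH_INFO' => '/more/info/to/follow')), "\n"; // /more/info/to/follow
echo requestPath(array('PHP_SELF' => '/index.php/send//42/', 'SCRIPT_NAME' => '/index.php')), "\n"; // /send/42
```

In a real request the call would be requestPath($_SERVER).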
