Dijkstra算法优化/缓存
我有以下 Dijkstra 算法,有 3 个输入变量(开始、停止和时间)。大约需要0.5-1秒才能完成。我的托管提供商说它使用了太多资源,我应该实施一些缓存机制。我的问题是,如何?
因为我有 3 个变量,如果只有其中一个发生变化 - 整个结果就会不同(因为我有一些随时间变化的附加语句,没关系)。那么如何实现一些缓存机制或者做一些优化呢?
我有 1700 个节点。
<?php require_once("../includes/db_connection.php"); ?>
<?php require("../includes/functions.php"); ?>
<?php require("../includes/global_variables.php"); ?>
<?php
// Function to put "maxValues" into time (in my case 10 000 because I know it can't take longer than that from source to end node)
function array_push_key(&$array, $key, $value) {
$array[$key] = $value;
}
// Start the counter
$timeM = microtime(); $timeM = explode(' ', $timeM); $timeM = $timeM[1] + $timeM[0]; $start = $timeM;
// Get provided values
$startStop = $_GET["start"];
$endStop = $_GET["end"];
$startTime = $_GET["time"];
// Initialize arrays
$time = array();
$previousNode = array();
$allStops = array();
// [5] = 119 --> You can get to stop no. 5 by line no. 119
// line to source node is 0
$lineToThisStop = array();
$lineToThisStop[$startStop] = 0;
// Populate arrays
$result=mysql_query("SELECT stop_id FROM db_stops", $connection);
potvrdi_unos($result);
$counter = 0;
while($rows = mysql_fetch_array($result)){
array_push_key($time, $rows["stop_id"], 10000);
array_push($allStops, $rows["stop_id"]);
// Locate startStop in the allStops array to unset it few lines later
if ($rows["id"] == $startStop) {
$poz = $brojac;
}
$counter++;
}
// Set starting time to starting stop
$time[$startStop] = $startTime;
// Set it activeNode
$activeNode = $startStop;
// Unset it in allStops array (so it doens't have to be checked later)
unset($allStops[$poz]);
$allStops = array_values($allStops);
// I can put "while (true)" because all nodes are connected in "one piece", there is NO UNCONNECTED nodes
while (true) {
$result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);
potvrdi_unos($result);
while($rows = mysql_fetch_array($result)) {
// Draw paths from active node to all other (connected) stops
$nextStopArray = $rows["next_stop"];
// nextStopArray is something like "0,34,123,3123,213" - those are all stops from current, active node/stop
$nextStopArray = explode(",", $nextStopArray);
// sometimes it's just "4" to convert it into array
if (!is_array($nextStopArray)) {
$nextStopArray[0] = $nextStopArray;
}
for ($p = 0; $p < sizeof($nextStopArray); $p++) {
$nextStop = $nextStopArray[$p];
$walkToTheStop = false;
// Few checks
if ($p == 0) {
if ($nextStop != 0) {
$pathDuration = 2;
if ($lineToThisStop[$activeNode] != $rows["route_id"]) {
$pathDuration = $pathDuration * 2;
}
}
} else {
$walkToTheStop = true;
$pathDuration = 1;
}
// If that's shortest path from ActiveNode to nextStop insert it into it's time array (time to get to that stop)
if (($pathDuration + $time[$activeNode]) < $time[$nextStop]) {
$time[$nextStop] = $pathDuration + $time[$activeNode];
array_push_key($previousNode, $nextStop, $activeNode);
// Some aditional records (5000 means "you must walk to that stop")
if ($walkToTheStop) {
$lineToThisStop[$nextStop] = 5000;
} else {
$lineToThisStop[$nextStop] = $rows["route_id"];
}
}
}
}
// Traži slijedeću stanicu (vrh) s najmanjom vrijednosti
$lowestValue = 10000 + 1;
for ($j = 0; $j < sizeof($allStops); $j++) {
if ($time[$allStops[$j]] < $lowestValue) {
$lowestValue = $time[$allStops[$j]];
$activeNode = $allStops[$j];
// Record it's position so I can unset it later
$activeNodePosition = $j;
}
}
// Unset the active node from the array - so loop before will be shorter every time for one node/stop
unset($allStops[$activeNodePosition]);
$allStops = array_values($allStops);
// If you get to the end stop, feel free to break out of the loop
if ($activeNode == $endStop) {
break;
}
}
// How long did it take?
$timeM = microtime(); $timeM = explode(' ', $timeM); $timeM = $timeM[1] + $timeM[0]; $finish = $timeM;
$total_time = round(($finish - $start), 4);
echo 'Total time '.$total_time.' seconds.'."<br />";
?>
<?php require_once("../includes/close_connection.php"); ?>
I have the following Dijkstra algorithm with 3 input variables (start, stop and time). It takes about 0.5-1s to complete. My hosting provider says it's using too much resources and I should implement some caching mechanism. My question is, how?
Because I have 3 variables, if only one of them changes - the whole result is different (because I have some additional statements with time, nevermind). So how to implement some caching mechanism or do some optimisation?
I have 1700 nodes.
<?php require_once("../includes/db_connection.php"); ?>
<?php require("../includes/functions.php"); ?>
<?php require("../includes/global_variables.php"); ?>
<?php
// Function to put "maxValues" into time (in my case 10 000 because I know it can't take longer than that from source to end node)
function array_push_key(&$array, $key, $value) {
$array[$key] = $value;
}
// Start the counter
$timeM = microtime(); $timeM = explode(' ', $timeM); $timeM = $timeM[1] + $timeM[0]; $start = $timeM;
// Get provided values
$startStop = $_GET["start"];
$endStop = $_GET["end"];
$startTime = $_GET["time"];
// Initialize arrays
$time = array();
$previousNode = array();
$allStops = array();
// [5] = 119 --> You can get to stop no. 5 by line no. 119
// line to source node is 0
$lineToThisStop = array();
$lineToThisStop[$startStop] = 0;
// Populate arrays
$result=mysql_query("SELECT stop_id FROM db_stops", $connection);
potvrdi_unos($result);
$counter = 0;
while($rows = mysql_fetch_array($result)){
array_push_key($time, $rows["stop_id"], 10000);
array_push($allStops, $rows["stop_id"]);
// Locate startStop in the allStops array to unset it few lines later
if ($rows["id"] == $startStop) {
$poz = $brojac;
}
$counter++;
}
// Set starting time to starting stop
$time[$startStop] = $startTime;
// Set it activeNode
$activeNode = $startStop;
// Unset it in allStops array (so it doens't have to be checked later)
unset($allStops[$poz]);
$allStops = array_values($allStops);
// I can put "while (true)" because all nodes are connected in "one piece", there is NO UNCONNECTED nodes
while (true) {
$result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);
potvrdi_unos($result);
while($rows = mysql_fetch_array($result)) {
// Draw paths from active node to all other (connected) stops
$nextStopArray = $rows["next_stop"];
// nextStopArray is something like "0,34,123,3123,213" - those are all stops from current, active node/stop
$nextStopArray = explode(",", $nextStopArray);
// sometimes it's just "4" to convert it into array
if (!is_array($nextStopArray)) {
$nextStopArray[0] = $nextStopArray;
}
for ($p = 0; $p < sizeof($nextStopArray); $p++) {
$nextStop = $nextStopArray[$p];
$walkToTheStop = false;
// Few checks
if ($p == 0) {
if ($nextStop != 0) {
$pathDuration = 2;
if ($lineToThisStop[$activeNode] != $rows["route_id"]) {
$pathDuration = $pathDuration * 2;
}
}
} else {
$walkToTheStop = true;
$pathDuration = 1;
}
// If that's shortest path from ActiveNode to nextStop insert it into it's time array (time to get to that stop)
if (($pathDuration + $time[$activeNode]) < $time[$nextStop]) {
$time[$nextStop] = $pathDuration + $time[$activeNode];
array_push_key($previousNode, $nextStop, $activeNode);
// Some aditional records (5000 means "you must walk to that stop")
if ($walkToTheStop) {
$lineToThisStop[$nextStop] = 5000;
} else {
$lineToThisStop[$nextStop] = $rows["route_id"];
}
}
}
}
// Traži slijedeću stanicu (vrh) s najmanjom vrijednosti
$lowestValue = 10000 + 1;
for ($j = 0; $j < sizeof($allStops); $j++) {
if ($time[$allStops[$j]] < $lowestValue) {
$lowestValue = $time[$allStops[$j]];
$activeNode = $allStops[$j];
// Record it's position so I can unset it later
$activeNodePosition = $j;
}
}
// Unset the active node from the array - so loop before will be shorter every time for one node/stop
unset($allStops[$activeNodePosition]);
$allStops = array_values($allStops);
// If you get to the end stop, feel free to break out of the loop
if ($activeNode == $endStop) {
break;
}
}
// How long did it take?
$timeM = microtime(); $timeM = explode(' ', $timeM); $timeM = $timeM[1] + $timeM[0]; $finish = $timeM;
$total_time = round(($finish - $start), 4);
echo 'Total time '.$total_time.' seconds.'."<br />";
?>
<?php require_once("../includes/close_connection.php"); ?>
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
微优化,但不要这样做:
在循环之前计算 sizeof($nextStopArray),否则您将在每次迭代时进行计数(并且该值不是'没有被改变)
有几个地方应该改变。
如果您迭代数千次,++$p 比 $p++ 更快,
但分析该函数...找出哪些部分执行时间最长,并寻求优化它们。
编辑
摆脱 array_push_key 作为函数,只需内联执行它...这会花费您不必要的函数调用,否则
在 while(true) 循环之外构建数据库中所有节点的数组。 . 在单个 SQL 查询中检索所有数据并构建查找数组。
替换
为
也可能会更快(只需检查逻辑是否循环了正确的次数)。
Micro-optimisations, but don't do:
calculate the sizeof($nextStopArray) before the loop, otherwise you're doing the count every iteration (and this value isn't being changed)
There's a couple of places where this should be changed.
And if you're iterating several thousand times, ++$p is faster than $p++
But profile the function... find out which parts are taking the longest to execute, and look to optimise those.
EDIT
Get rid of array_push_key as a function, simply execute it inline... it's costing you an unnecessary function call otherwise
Build an array of all nodes from your database outside of the while(true) loop... retrieve all the data in a single SQL query and build a lookup array.
Replacing
with
may also prove faster still (just check that the logic does loop through the correct number of times).
乍一看(顺便说一句,您确实应该进行一些分析),罪魁祸首是您正在对每个图节点执行查询以查找其邻居:
如果您有 1,700 个节点,则问题将按以下顺序进行:一千个查询。不要频繁访问数据库,而是将这些数据库结果缓存在 memcached 之类的内容中,并且仅回退到缓存上的数据库错过了。
At a glance (you should really do some profiling, by the way), the culprit is the fact that you are executing a query for each graph node to find its neighbors:
If you have 1,700 nodes this is going to issue on the order of a thousand queries. Rather than hitting the database so often, cache these database results in something like memcached, and only fall back to the database on cache misses.
哪个资源? (CPU?内存?网络带宽?数据库服务器上的I/O 负载?)
如果我没看错的话,您将在每次寻路尝试中对每个节点进行数据库调用。每个调用都会阻塞一段时间,等待数据库的响应。即使您有一个快速的数据库,这也必然需要几毫秒(除非数据库与您的代码在同一服务器上运行)。因此我大胆猜测您的大部分执行时间都花在等待数据库的回复上。
此外,如果您的数据库缺乏适当的索引,则每个查询都可以执行全表扫描...
解决方案很简单:在应用程序启动时将 db_stop_times 加载到内存中,并在解析邻居节点时使用该内存中表示。
编辑:是的,stop_id 上的索引将是此查询的正确索引。至于实际的缓存,我不知道 PHP,但是对于 Java(或 C#、C++,甚至 C)之类的东西,我会使用
比 memcached 快一点的形式表示,但假设你可以将整个表格轻松地放入主内存中。如果你做不到这一点,我会使用像 memcached 这样的缓存系统。
Which resource? (CPU? Memory? Network bandwidth? I/O load on the database server?)
If I am reading this right you are doing a database call for every node in every pathfinding attempt. Each of these calls will block for a little while waiting for the response from the database. Even if you have a fast database, that's bound to take a couple milliseconds (unless the database is running on the same server as your code). So I'd venture the guess that most of your execution time is spent waiting for replies from the database.
Moreover, should your database lack proper indexes, each query could do a full table scan ...
The solution is straightforward: Load db_stop_times into memory at application startup, and use that in-memory representation when resolving neighbour nodes.
Edit: Yes, an index on stop_id would be a proper index for this query. As for practical caching, I don't know PHP, but with something like Java (or C#, or C++, or even C) I'd use a representation of the form
that would be a bit faster than memcached, but assumes you can comfortably fit the entire table in main memory. If you can't do that, I'd use a caching system like memcached.