How do I calculate PageRank for a small network?

Posted on 2024-08-05 20:38:00

I have two tables in my MySQL database.

table1 contains all the web pages in my network:

         table1 (pages)
         +----+-----+
         | id | url |
         +----+-----+

table2 has two fields: the source page of a link and the destination page of that link:

         table2 (links)
         +--------------+------------+
         | from_page_id | to_page_id |
         +--------------+------------+

How do I calculate the PageRank of the pages in my network?

I found this article here, which explains the PageRank algorithm, but I find it very difficult to turn their formula into PHP, and I am not good at math.
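For reference, the formula as it is usually written (for example on Wikipedia), with damping factor d (commonly 0.85) and N pages in total, is:

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
```

Here B(p) is the set of pages that link to p and L(q) is the number of outgoing links of page q. The formula is applied to every page repeatedly, each pass using the values from the previous pass, until the values stop changing.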

Thanks

Update:

I have almost 5,000 pages in my network.
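With about 5,000 pages, one option is to load both tables into PHP arrays once and iterate in memory instead of querying per page. The sketch below assumes `$pageIds` holds every `id` from `pages` and `$links` holds `[from_page_id, to_page_id]` pairs from `links`; the function name `pagerank` and the defaults (d = 0.85, 50 passes) are illustrative choices, not a definitive implementation.

```php
<?php
// Sketch: PageRank by power iteration, entirely in memory.
// Assumes $pageIds lists every page id and $links is an array of
// [from_page_id, to_page_id] pairs whose ids all appear in $pageIds.
function pagerank(array $pageIds, array $links, float $d = 0.85, int $iterations = 50): array
{
    $n = count($pageIds);
    $outgoing = [];  // page id => number of outgoing links
    $incoming = [];  // page id => list of page ids linking to it
    foreach ($pageIds as $id) {
        $outgoing[$id] = 0;
        $incoming[$id] = [];
    }
    foreach ($links as [$from, $to]) {
        $outgoing[$from]++;
        $incoming[$to][] = $from;
    }

    // Start from a uniform distribution that sums to 1.
    $rank = array_fill_keys($pageIds, 1.0 / $n);

    for ($i = 0; $i < $iterations; $i++) {
        $new = [];
        foreach ($pageIds as $id) {
            $sum = 0.0;
            foreach ($incoming[$id] as $from) {
                $sum += $rank[$from] / $outgoing[$from];
            }
            $new[$id] = (1 - $d) / $n + $d * $sum;
        }
        $rank = $new;  // old values are replaced only after the whole pass
    }
    return $rank;
}

// Usage: a tiny 3-page cycle 1 -> 2 -> 3 -> 1 (every page ends up with 1/3).
$ranks = pagerank([1, 2, 3], [[1, 2], [2, 3], [3, 1]]);
print_r($ranks);
```

Keeping the previous pass in `$rank` and building the next one in `$new` plays the same role as alternating between two database columns.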


回眸一笑 2024-08-12 20:38:00

Hi again,

I think I have figured out how to do it, but I am not sure.

I will tell you what I did, and you can judge whether my way of calculating PageRank is correct.

First, I added a new column to the "pages" table and called it "outgoinglinks"; it holds the number of outgoing links from that page.
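The "outgoinglinks" value can be derived from the links table. A minimal sketch, assuming the link rows have already been fetched into an array of `[from_page_id, to_page_id]` pairs; the function name `count_outgoing_links` is hypothetical. In MySQL the same column could also be filled in one statement, e.g. `UPDATE pages SET outgoinglinks = (SELECT COUNT(*) FROM links WHERE links.from_page_id = pages.id)`.

```php
<?php
// Hypothetical helper: count outgoing links per page from [from_page_id, to_page_id] rows.
// Pages with no outgoing links get 0, so every page id has an entry.
function count_outgoing_links(array $pageIds, array $links): array
{
    $outgoing = array_fill_keys($pageIds, 0);
    foreach ($links as [$from, $to]) {
        $outgoing[$from]++;
    }
    return $outgoing;
}

$outgoing = count_outgoing_links([1, 2, 3], [[1, 2], [1, 3], [2, 3]]);
print_r($outgoing);  // page 1 => 2, page 2 => 1, page 3 => 0
```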

I also added two more columns, "pagerank" and "pagerank2",

and another column called "i", which counts the number of iterations.

Now let's move on to the programming:

     // $DB, get_record_select() and update_record() are my project's database helpers (not shown)
     $damping    = 0.85;   // d in the PageRank formula
     $iterations = 50;     // an even number, so the final values end up in "pagerank"
     $totalpages = 5000;   // N: total number of pages (could also come from select count(*) from pages)

     for ($i = 0; $i < $iterations; $i++) {
         // Alternate between the two columns: read the previous iteration's values
         // from one column and write the new values to the other.
         // Iteration 0 reads "pagerank", where the starting values are stored.
         if ($i % 2 == 0) {
             $read_col  = "pagerank";
             $write_col = "pagerank2";
         } else {
             $read_col  = "pagerank2";
             $write_col = "pagerank";
         }

         $result1 = $DB->query("select id from pages");
         while ($row1 = $DB->fetch_array($result1)) {
             $page_id = $row1["id"];
             $result = $DB->query("select * from links where to_page_id = '$page_id'");
             $weights_of_links = 0; // sum of PR(q) / L(q) over every page q linking to this page
             while ($row = $DB->fetch_array($result)) {
                 $from_page_id = $row["from_page_id"];
                 $row2 = get_record_select("pages", "id = '$from_page_id'");
                 $weights_of_links += $row2[$read_col] / $row2["outgoinglinks"];
             }

             // PR(p) = (1 - d) / N + d * sum, from Wikipedia and the paper I referred to
             $pagerank = (1 - $damping) / $totalpages + $damping * $weights_of_links;

             // save the new value of this page into the other column
             $ii = $i + 1;
             update_record("pages", "id='$page_id'", "$write_col='$pagerank',i='$ii'");
         }
     }

Note:

Before you start, make sure to set the pagerank of one page (any page) to 1 and leave all the other pages at 0.

Why two pagerank columns?

I did that because I think each iteration should be kept separate to get an accurate calculation. The script therefore alternates between the two columns: each iteration reads the values from one pagerank column and saves the new results to the other.

The code above loops many times (50 here) to get accurate results; each pass brings us closer to the real PageRank of our pages.
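Instead of a fixed 50 passes, the loop can stop once the values barely change between passes. A sketch under stated assumptions: `$incoming` maps each page id to the ids linking to it, `$outgoing` maps each page id to its number of outgoing links (both precomputed), and the function name `pagerank_until_converged` and the tolerance 1e-8 are illustrative choices.

```php
<?php
// Sketch: repeat the PageRank update until the total change in one pass
// (sum of absolute differences) drops below $tolerance.
function pagerank_until_converged(array $incoming, array $outgoing, float $d = 0.85,
                                  float $tolerance = 1e-8, int $maxIterations = 1000): array
{
    $n = count($incoming);
    $rank = array_fill_keys(array_keys($incoming), 1.0 / $n);
    for ($i = 1; $i <= $maxIterations; $i++) {
        $new = [];
        $change = 0.0;  // total absolute change in this pass
        foreach ($incoming as $id => $sources) {
            $sum = 0.0;
            foreach ($sources as $from) {
                $sum += $rank[$from] / $outgoing[$from];
            }
            $new[$id] = (1 - $d) / $n + $d * $sum;
            $change += abs($new[$id] - $rank[$id]);
        }
        $rank = $new;
        if ($change < $tolerance) {
            return ['ranks' => $rank, 'iterations' => $i];
        }
    }
    return ['ranks' => $rank, 'iterations' => $maxIterations];
}

// Example graph: 2 -> 1, 3 -> 1, 1 -> 2
$result = pagerank_until_converged(
    [1 => [2, 3], 2 => [1], 3 => []],
    [1 => 1, 2 => 1, 3 => 1]
);
echo $result['iterations'], "\n";
print_r($result['ranks']);
```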

My question is: should the sum of all the pageranks in my network equal 1?
If so, how does Google give every page a rank out of 10?
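On the sum question: with the (1 - d)/N form of the formula, the ranks add up to 1 only when every page has at least one outgoing link. Rank that reaches a page with no outgoing links (a "dangling" page) is lost on each pass, so the total drops below 1. A common remedy, sketched here as one possible approach, spreads the dangling pages' rank evenly over all pages. The 0-10 value Google once showed in its toolbar was not this raw number; it is widely reported to have been a logarithmic rescaling, but the exact mapping was never published, so no conversion is attempted. The function name `pagerank_with_dangling` and its inputs (`$incoming`, `$outgoing` as in the previous sketches) are assumptions.

```php
<?php
// Sketch: PageRank in which the rank held by pages with no outgoing links
// is redistributed evenly, so the ranks keep summing to 1.
// $incoming: page id => ids linking to it; $outgoing: page id => number of outgoing links.
function pagerank_with_dangling(array $incoming, array $outgoing, float $d = 0.85, int $iterations = 100): array
{
    $n = count($incoming);
    $rank = array_fill_keys(array_keys($incoming), 1.0 / $n);
    for ($i = 0; $i < $iterations; $i++) {
        // total rank currently sitting on pages that link nowhere
        $dangling = 0.0;
        foreach ($outgoing as $id => $count) {
            if ($count == 0) {
                $dangling += $rank[$id];
            }
        }
        $new = [];
        foreach ($incoming as $id => $sources) {
            $sum = $dangling / $n;  // every page receives an equal share of the dangling rank
            foreach ($sources as $from) {
                $sum += $rank[$from] / $outgoing[$from];
            }
            $new[$id] = (1 - $d) / $n + $d * $sum;
        }
        $rank = $new;
    }
    return $rank;
}

// Page 3 has no outgoing links: 1 -> 2, 1 -> 3, 2 -> 3
$ranks = pagerank_with_dangling([1 => [], 2 => [1], 3 => [1, 2]], [1 => 2, 2 => 1, 3 => 0]);
printf("sum = %.6f\n", array_sum($ranks));  // prints "sum = 1.000000"
```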

Any ideas?

Thanks

极致的悲 2024-08-12 20:38:00

Why do you need PageRank specifically if this is your own network? Why not just count the number of unique pages that link to a particular page and use that number as the page's rating?
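That simpler rating is a single query in MySQL (`SELECT to_page_id, COUNT(DISTINCT from_page_id) FROM links GROUP BY to_page_id`) or a few lines of PHP. A sketch of the PHP version, assuming the link rows are already in an array of `[from_page_id, to_page_id]` pairs; `count_inbound_pages` is a hypothetical name.

```php
<?php
// Sketch: for each page, count how many distinct pages link to it.
// $links holds [from_page_id, to_page_id] pairs.
function count_inbound_pages(array $links): array
{
    $sources = [];  // to_page_id => set of from_page_ids (stored as array keys)
    foreach ($links as [$from, $to]) {
        $sources[$to][$from] = true;  // repeated links from the same page count once
    }
    return array_map('count', $sources);
}

$rating = count_inbound_pages([[1, 3], [2, 3], [2, 3], [3, 1]]);
print_r($rating);  // page 3 => 2, page 1 => 1
```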
