Best way to find illegal characters in a bunch of ISO-8859-1 web pages?

Posted on 2024-08-09 18:48:32

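My website has a bunch of HTML files that were created in 2000 and have been maintained to this day. We recently began an effort to replace illegal characters with HTML entities. Going page by page looking for copyright symbols and trademark signs seems like quite a chore. Does anyone know of an application that can take a bunch of HTML files and tell me where illegal characters need to be replaced with HTML entities?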


时光无声 2024-08-16 18:48:32

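You could write a PHP script (if you are able to; if not, I'd be happy to help), but I assume you have already converted some of the "special characters", which does make the task a bit harder (although I still think it's possible)...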
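As a rough illustration of what such a script might look like, here is a minimal sketch. It assumes the files really are single-byte ISO-8859-1 and treats any byte outside 7-bit ASCII as a candidate for replacement. The function names `find_non_ascii_lines` and `scan_directory` are made up for this example, and the driver only looks at `.html` files directly inside one folder.

```php
<?php
// Hypothetical helper: report every line of $html that contains a byte
// outside 7-bit ASCII. Each hit holds the 1-based line number and the
// offending byte values in hex.
function find_non_ascii_lines(string $html): array {
    $hits = [];
    foreach (explode("\n", $html) as $i => $line) {
        // No /u modifier, so the pattern matches raw bytes 0x80-0xFF.
        if (preg_match_all('/[\x80-\xFF]/', $line, $m)) {
            $hex = array_map(function ($b) {
                return sprintf('0x%02X', ord($b));
            }, $m[0]);
            $hits[] = ['line' => $i + 1, 'bytes' => $hex];
        }
    }
    return $hits;
}

// Hypothetical driver: scan each .html file directly inside $dir and
// return only the files that contain at least one hit.
function scan_directory(string $dir): array {
    $report = [];
    foreach (glob($dir . '/*.html') ?: [] as $file) {
        $hits = find_non_ascii_lines(file_get_contents($file));
        if ($hits) {
            $report[$file] = $hits;
        }
    }
    return $report;
}
```

Characters that were already converted, such as `&copy;`, consist only of ASCII bytes, so a scan like this skips them and reports only what is still unconverted.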


手心的海 2024-08-16 18:48:32

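Any good text editor will do a find-in-files search for you and return a list of matches.

I use EditPlus for this. Several other editors, such as Notepad++ and TextPad, will do it just as easily.

You don't have to open the files. Just specify the path where the files are stored, a file mask (*.html), and the text to search for ("©"). The editor returns a list of matches, and double-clicking one opens the file and jumps to the matching line.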
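If the matches also need to be replaced rather than just located, the same idea can be scripted. The sketch below is one possible approach, not a tested tool: it assumes the input is ISO-8859-1, where each byte value in the range 0xA0-0xFF equals the character's Unicode code point, and rewrites those bytes as numeric character references. The function name `latin1_to_numeric_entities` is invented for this example.

```php
<?php
// Hypothetical helper: replace each ISO-8859-1 byte in 0xA0-0xFF with a
// numeric character reference, e.g. byte 0xA9 (copyright sign) -> &#169;.
// Tags, ASCII text and existing entities are left untouched.
function latin1_to_numeric_entities(string $html): string {
    return preg_replace_callback(
        '/[\xA0-\xFF]/', // raw bytes, no /u modifier
        function (array $m): string {
            return '&#' . ord($m[0]) . ';';
        },
        $html
    );
}

// Usage: the copyright sign and the accented letter are converted,
// the markup and the existing &amp; are not.
echo latin1_to_numeric_entities("\xA9 2000 <b>caf\xE9</b> &amp;"), "\n";
```

Bytes in the range 0x80-0x9F are deliberately left alone. In ISO-8859-1 they are control characters, but files saved as Windows-1252 use that range for printable characters such as the trademark sign (0x99), so they would need a separate mapping.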


近箐 2024-08-16 18:48:32

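I also have a website that regularly needs to convert large numbers of file names back and forth between character sets. While a text editor can do this, a portable two-step solution in PHP was preferable. First, add the file names to an array, then do the search and replace. An extra piece of code in the function excludes certain files and file types from the array.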

function listdir($start_dir = '.') {
  // File names that must never be listed, wherever they appear in the tree
  $nonFilesArray = array('index.php', 'index.html', 'help.html');
  $filesArray = array(); // collects the relative paths of matching .html files
  if (!is_dir($start_dir)) return false; // no such folder
  $fh = opendir($start_dir);
  while (($tmpFile = readdir($fh)) !== false) { // each entry name, without its path
    if ($tmpFile === '.' || $tmpFile === '..') continue; // skip . and ..
    $filepath = $start_dir . '/' . $tmpFile; // relative path/to/file
    if (is_dir($filepath)) { // folder: recurse into it and merge its results
      $filesArray = array_merge($filesArray, listdir($filepath));
    } elseif (!in_array($tmpFile, $nonFilesArray)
              && pathinfo($tmpFile, PATHINFO_EXTENSION) == 'html') {
      // strip the first 17 characters of the path (a site-specific prefix length)
      $filesArray[] = substr_replace($filepath, '', 0, 17);
    }
  }
  closedir($fh);
  return $filesArray;
}

// $targetdir (folder to scan) and $linkpath (link prefix) must be set beforehand
$filesArray = listdir($targetdir); // call the function for this directory
$numNewFiles = ($filesArray === false) ? 0 : count($filesArray); // number of records

for ($i = 0; $i < $numNewFiles; $i++) { // read the filenames and replace unwanted characters
  $tmplnk = $linkpath . $filesArray[$i];
  $outname = basename($filesArray[$i], ".html"); // file name without its extension
  $outname = str_replace('-', ' ', $outname); // hyphens become spaces
}
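
For comparison, the file-collection step can also be sketched with PHP's built-in SPL iterators instead of a hand-written recursive function. This is an alternative, not the method used above: the function name `list_html_files` and its `$exclude` parameter are invented for illustration, and unlike the code above it returns full path names without stripping any prefix.

```php
<?php
// Hypothetical alternative: collect .html files under $dir recursively,
// skipping any file whose name appears in $exclude.
function list_html_files(string $dir, array $exclude = ['index.html', 'help.html']): array {
    $files = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $info) { // $info is an SplFileInfo object
        if ($info->isFile()
            && strtolower($info->getExtension()) === 'html'
            && !in_array($info->getFilename(), $exclude, true)) {
            $files[] = $info->getPathname();
        }
    }
    sort($files); // directory order is not guaranteed, so sort for stable output
    return $files;
}
```

`RecursiveDirectoryIterator` throws an `UnexpectedValueException` if `$dir` cannot be opened, so a caller that wants the `false` return value used above would need to catch it.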

