Best way to find illegal characters in a bunch of ISO-8859-1 web pages?

Posted on 2024-08-09 18:48:32

I have a bunch of HTML files on a site that were created in 2000 and have been maintained to this day. We've recently begun an effort to replace illegal characters with their HTML entities. Going page by page looking for copyright and trademark symbols seems like quite a chore. Does anyone know of an app that will take a bunch of HTML files and tell me where I need to replace illegal characters with HTML entities?

Comments (3)

时光无声 2024-08-16 18:48:32

You could write a PHP script (if you can; if not, I'd be happy to help), but I assume you already converted some of the "special characters", so that does make the task a little harder (although I still think it's possible)...
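One way such a script could sidestep the "already converted" problem is to encode only the raw non-ASCII characters as numeric entities. Entities that are already in the pages (&copy;, &#169;) are plain ASCII, so they pass through unchanged and nothing gets encoded twice. A minimal sketch, assuming the mbstring extension is available and the files really are ISO-8859-1 as the title says; encode_non_ascii and encode_file are hypothetical names, not something from the answer:

```php
<?php
// Replace every non-ASCII character with a decimal numeric entity (e.g. © -> &#169;).
// Existing entities such as &copy; are pure ASCII, so they are left as they are.
// Assumption: $html is in $encoding (ISO-8859-1 by default, per the question's title).
function encode_non_ascii(string $html, string $encoding = 'ISO-8859-1'): string
{
    // convmap: [first code point, last code point, offset, mask]
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $convmap, $encoding);
}

// Hypothetical helper: rewrite one file in place with its non-ASCII characters encoded.
function encode_file(string $path, string $encoding = 'ISO-8859-1'): void
{
    file_put_contents($path, encode_non_ascii(file_get_contents($path), $encoding));
}
```

Note that ™ has no ISO-8859-1 code point. If pages labelled ISO-8859-1 contain it as byte 0x99, they are really Windows-1252, and passing 'Windows-1252' as $encoding should turn it into &#8482; instead. Running this on a copy of the site and diffing the result is the safer way to try it.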

手心的海 2024-08-16 18:48:32

Any good text editor will do a file contents search for you and return a list of matches.

I do this with EditPlus. Several other editors, such as Notepad++ and TextPad, can easily do this too.

You do not have to open the files. You just specify the folder where the files are stored, a file mask (*.html), and the text to search for ("©"). The editor returns a list of matches, and double-clicking a match opens the file at the matching line.
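The same "find in files" search can be scripted if you would rather not depend on a particular editor, and it can look for every non-ASCII byte at once instead of one symbol at a time. A minimal sketch, assuming single-byte ISO-8859-1 files (as the title says), so that each byte from 0x80 to 0xFF is one candidate character; find_non_ascii is a hypothetical name:

```php
<?php
// List every byte in the range 0x80-0xFF found in the .html files under $dir,
// as "path:line:column: 0xNN", similar to an editor's find-in-files result list.
// Assumption: files are single-byte ISO-8859-1; a UTF-8 "©" would show up as two bytes.
function find_non_ascii(string $dir): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (strtolower($file->getExtension()) !== 'html') {
            continue; // plays the role of the *.html mask
        }
        foreach (file($file->getPathname()) as $i => $line) {
            // No /u modifier, so the pattern matches single bytes;
            // PREG_OFFSET_CAPTURE returns each match with its 0-based byte offset.
            if (preg_match_all('/[\x80-\xFF]/', $line, $m, PREG_OFFSET_CAPTURE)) {
                foreach ($m[0] as [$byte, $offset]) {
                    $hits[] = sprintf('%s:%d:%d: 0x%02X',
                        $file->getPathname(), $i + 1, $offset + 1, ord($byte));
                }
            }
        }
    }
    return $hits;
}
```

For example, print_r(find_non_ascii('/path/to/site')) lists each file, line and column that still needs an entity; the path here is a placeholder.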

近箐 2024-08-16 18:48:32

I also have a website that regularly needs to convert large numbers of file names back and forth between character sets. A text editor can do this, but a portable two-step solution in PHP was preferable: first collect the file names in an array, then do the search and replace. An extra piece of code in the function leaves certain files (a few named files, plus anything that isn't .html) out of the array.

// Recursively collect the .html files under $start_dir, skipping a few named files.
// $prefixLen is the number of leading characters stripped from each path; the
// original hard-coded 17, presumably the length of the author's own base path.
function listdir($start_dir = '.', $prefixLen = 17) {
  $nonFilesArray = array('index.php', 'index.html', 'help.html'); // files to leave out
  $filesArray = array(); // collected paths
  if (!is_dir($start_dir)) {
    return $filesArray; // no such folder: return an empty array so count() still works
  }
  $fh = opendir($start_dir);
  while (($tmpFile = readdir($fh)) !== false) { // each entry name, without its path
    if ($tmpFile === '.' || $tmpFile === '..') continue; // skip . and ..
    $filepath = $start_dir . '/' . $tmpFile; // relative path/to/file
    if (is_dir($filepath)) { // a folder: recurse into it
      $filesArray = array_merge($filesArray, listdir($filepath, $prefixLen));
    } elseif (!in_array($tmpFile, $nonFilesArray, true)
              && pathinfo($tmpFile, PATHINFO_EXTENSION) == 'html') {
      $filesArray[] = substr($filepath, $prefixLen); // strip the base-path prefix
    }
  }
  closedir($fh);
  return $filesArray;
}

// $targetdir and $linkpath are not defined in this answer; set them before this point.
$filesArray = listdir($targetdir); // collect the .html files under this directory

foreach ($filesArray as $file) { // read the filenames and replace unwanted characters
  $tmplnk = $linkpath . $file; // link to the file
  $outname = str_replace('-', ' ', basename($file, '.html')); // display name: hyphens -> spaces
}
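The loop above only swaps hyphens for spaces; the answer does not show the character-set step itself. A minimal sketch of converting a name between ISO-8859-1 and UTF-8 in both directions, assuming the mbstring extension is available; latin1_to_utf8 and utf8_to_latin1 are hypothetical helper names:

```php
<?php
// Re-encode a file name from ISO-8859-1 to UTF-8.
function latin1_to_utf8(string $name): string
{
    return mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');
}

// Re-encode a file name from UTF-8 back to ISO-8859-1.
// Characters with no ISO-8859-1 equivalent are replaced by mbstring's
// substitute character ('?' by default).
function utf8_to_latin1(string $name): string
{
    return mb_convert_encoding($name, 'ISO-8859-1', 'UTF-8');
}
```

Either helper could be applied to each entry of $filesArray inside the loop, for example before building $outname.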