PHP: sentence-case a string and capitalize proper nouns using a dictionary of known words?

Posted on 2024-12-12 06:45:12

I need to check each word in a string against a dictionary of known words (a txt file) and capitalize any word that is not found.

I'm trying to split the string into an array of words and check them against the Unix /usr/dict/words dictionary. If a word is found, it gets lcfirst( $word ); if it is not found, it gets ucfirst( $word ).

The dictionary is opened and read into an array using fgetcsv (I also tried using fgets and exploding on the end-of-line character).

function wnd_title_case( $string ) {
    $file = fopen( "/users/chris/sites/wp-dev/trunk/core/words.txt", "rb" );
    while ( !feof( $file ) ) {
        $line_of_text = fgetcsv( $file );
        $exceptions = array( $line_of_text );
    }
    fclose( $file );

    $delimiters = array(" ", "-", "O'");
    foreach ( $delimiters as $delimiter ) {
        $words = explode( $delimiter, $string );
        $newwords = array();
        foreach ( $words as $word ) {
            if ( in_array( strtoupper( $word ), $exceptions ) ) {
                // check exceptions list for any words that should be lower case
                $word = lcfirst( $word );
            } elseif ( !in_array( $word, $exceptions ) ) {
                // everything else capitalized
                $word = ucfirst( $word );
            }
            array_push( $newwords, $word );
        }
        $string = join( $delimiter, $newwords );
    }
    $string = ucfirst( $string );
    return $string;
}

I have verified that the file gets opened.

The desired output: Sentence case title string with proper nouns capitalized.
The current output: Title string with every word capitalized
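
A possible explanation for the current output, offered as a reading of the code rather than a confirmed diagnosis: `$exceptions = array( $line_of_text )` is reassigned on every pass of the loop, and `fgetcsv` returns an array (or `false` at end of file), so after the loop `$exceptions` holds a single nested value instead of a flat word list. Neither `in_array` call can then match a plain string, and every word falls through to `ucfirst`. The sketch below shows one way to do the lookup with a flat, lowercase word set. The helper names `load_word_set` and `sentence_case_unknown` and the sample word list are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: read one word per line into a lowercase lookup set.
function load_word_set(string $path): array {
    if (!is_readable($path)) {
        return array();
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }
    $words = array_map('strtolower', array_map('trim', $lines));
    // Array keys allow isset() lookups instead of a linear in_array() scan.
    return array_fill_keys($words, true);
}

// Hypothetical helper: lowercase dictionary words, capitalize everything else.
function sentence_case_unknown(string $text, array $wordSet): string {
    $result = preg_replace_callback(
        "~[A-Za-z']+~",
        function ($m) use ($wordSet) {
            $lower = strtolower($m[0]);
            return isset($wordSet[$lower]) ? $lower : ucfirst($lower);
        },
        $text
    );
    // Sentence case: the first character is always uppercase.
    return ucfirst($result);
}

// In-memory stand-in for the dictionary file (assumed sample data).
$wordSet = array_fill_keys(array('the', 'quick', 'fox', 'met', 'in'), true);
echo sentence_case_unknown('the quick fox met chris in paris', $wordSet), "\n";
// prints: The quick fox met Chris in Paris
```

As the edit below notes, a general dictionary such as /usr/dict/words may also list capitalized entries, in which case lowercasing every entry would hide the proper nouns; a separate proper-names list avoids that.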

Edit:

Using Jay's answer below, I came up with a workable solution. My first problem was that my words dictionary contained both capitalized and non-capitalized words, so I found a proper names dictionary to check against using a regex callback. It's not perfect, but it gets it right most of the time.

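function title_case( $string ) {
    $exceptions = array();
    $fp = @fopen( THEME_DIR . "/_/inc/propernames", "r" );
    if ( $fp ) {
        while ( ( $buffer = fgets( $fp ) ) !== false ) {
            $name = trim( $buffer );
            if ( $name !== '' ) {
                // escape regex metacharacters so each name is matched literally
                array_push( $exceptions, preg_quote( $name, '~' ) );
            }
        }
        // only close a handle that was actually opened
        fclose( $fp );
    }

    $content = strtolower( $string );
    if ( $exceptions ) {
        // group the alternation so \b applies to every name, not just the first and last
        $pattern = '~\b(?:' . implode( '|', $exceptions ) . ')\b~i';
        $content = preg_replace_callback( $pattern, 'regex_callback', $content );
    }

    return ucfirst( $content );
}

function regex_callback( $data ) {
    // only capitalize matches longer than three characters
    if ( strlen( $data[0] ) > 3 ) {
        return ucfirst( strtolower( $data[0] ) );
    }
    return $data[0];
}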
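
One detail worth checking in any pattern built by joining names with |: alternation has the lowest precedence in a regex, so in `\bname1|name2|name3\b` the leading `\b` belongs only to the first name and the trailing `\b` only to the last. The sketch below compares an ungrouped and a grouped pattern. The name list and sample sentence are made up for illustration and stand in for the propernames file.

```php
<?php
// Hypothetical name list standing in for the propernames file.
$names = array('anna', 'mark', 'chris');
$text  = 'a remarkable day with mark';

// Capitalize whatever the pattern matched.
$capitalize = function ($m) {
    return ucfirst(strtolower($m[0]));
};

// Ungrouped: \b binds only to the first and last alternatives,
// so "mark" can match in the middle of another word.
$ungrouped = '~\b' . implode('|', $names) . '\b~i';

// Grouped: the non-capturing group makes \b apply to every alternative.
$grouped = '~\b(?:' . implode('|', $names) . ')\b~i';

echo preg_replace_callback($ungrouped, $capitalize, $text), "\n";
// prints: a reMarkable day with Mark
echo preg_replace_callback($grouped, $capitalize, $text), "\n";
// prints: a remarkable day with Mark
```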

Comments (1)

罗罗贝儿 2024-12-19 06:45:12

The simplest way to do this with a regex is as follows:

  1. Convert your text so that every word starts with an uppercase letter: $content = ucwords($original_content);
  2. Using the array of words from your dictionary, build a regex by imploding the words with the pipe character |, wrapping the alternation in a non-capturing group between word-boundary markers, and adding delimiters and the case-insensitive flag. You end up with ~\b(?:word1|word2|word3)\b~i (with your full list, obviously). Without the group, \b would apply only to the first and last words.
  3. Create a function that lowercases the matched value with strtolower, to be used with preg_replace_callback.

A working demo:

function regex_callback($data) {
    return strtolower($data[0]);
}

$original_content = 'hello my name is jay gilford';
$words = array('hello', 'my', 'name', 'is');

$content = ucwords($original_content);
$pattern = '~\b(?:' . implode('|', $words) . ')\b~i';

$content = preg_replace_callback($pattern, 'regex_callback', $content);

echo $content;

You could also run strtolower on the content first for consistency, since ucwords leaves the remaining letters of each word unchanged. The above code outputs hello my name is Jay Gilford
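
The optional strtolower step matters when the input has mixed case. As a minimal sketch, the steps above can be wrapped into a single function that normalizes case first and escapes each word before building the pattern. The function name `lowercase_known_words` and the mixed-case sample input are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical wrapper around the three steps; $words is the list of
// words that should end up lowercase.
function lowercase_known_words(string $text, array $words): string {
    // Normalize first, so input such as "GILFORD" becomes "Gilford".
    $content = ucwords(strtolower($text));
    if (!$words) {
        return $content;
    }
    // Escape regex metacharacters in each dictionary word.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '~');
    }, $words);
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';
    // Lowercase every whole-word match from the dictionary.
    return preg_replace_callback($pattern, function ($m) {
        return strtolower($m[0]);
    }, $content);
}

$words = array('hello', 'my', 'name', 'is');
echo lowercase_known_words('HELLO my name IS jay GILFORD', $words), "\n";
// prints: hello my name is Jay Gilford
```

With a very large dictionary, a single alternation can become slow or exceed the regex engine's limits, so matching each word and looking it up in an array may be a more practical design.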
