PHP: sentence-case a string with capitalized proper nouns using a dictionary of known words?

Posted on 2024-12-12 06:45:12

I need to check each word in a string against a dictionary of words (a .txt file) and capitalize any word that is not found.

I'm trying to split the string into an array of words and check each one against the Unix /usr/dict/words dictionary. If the word is found, it gets lcfirst($word); if there is no match, it gets ucfirst($word).

The dictionary is opened and read into an array using fgetcsv (I also tried using fgets and exploding on the end-of-line character).

```php
function wnd_title_case( $string ) {
    $file = fopen( "/users/chris/sites/wp-dev/trunk/core/words.txt", "rb" );
    while ( !feof( $file ) ) {
        $line_of_text = fgetcsv( $file );
        $exceptions = array( $line_of_text );
    }
    fclose( $file );

    $delimiters = array( " ", "-", "O'" );
    foreach ( $delimiters as $delimiter ) {
        $words = explode( $delimiter, $string );
        $newwords = array();
        foreach ( $words as $word ) {
            if ( in_array( strtoupper( $word ), $exceptions ) ) {
                // check exceptions list for any words that should be lower case
                $word = lcfirst( $word );
            } elseif ( !in_array( $word, $exceptions ) ) {
                // everything else capitalized
                $word = ucfirst( $word );
            }
            array_push( $newwords, $word );
        }
        $string = join( $delimiter, $newwords );
    }
    $string = ucfirst( $string );
    return $string;
}
```

I have verified that the file gets opened.
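Even when the file opens, the loop in wnd_title_case() reassigns $exceptions on every pass, and fgetcsv() returns each line as an array. As a result, $exceptions ends up holding only a single nested value from the final read. That is consistent with every word being capitalized. The following is a minimal sketch of loading a one-word-per-line dictionary instead. It assumes PHP 7+, and load_word_list() is a hypothetical helper name. It reads the whole file with file() and flips it into a lookup table:

```php
<?php
// Minimal sketch, assuming a plain-text dictionary with one word per line.
// load_word_list() is a hypothetical helper name, not part of PHP.
function load_word_list(string $path): array
{
    // FILE_IGNORE_NEW_LINES drops the trailing newline; FILE_SKIP_EMPTY_LINES drops blank lines
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read dictionary: $path");
    }
    // trim() removes any leftover "\r" or spaces; array_flip() turns words into keys
    return array_flip(array_map('trim', $lines));
}

// Demo with a throwaway file rather than a real dictionary
$tmp = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($tmp, "hello\r\nmy\n\nname\n");
$known = load_word_list($tmp);
unlink($tmp);

var_dump(isset($known['name'])); // bool(true)
var_dump(isset($known['jay']));  // bool(false)
```

Lookups against $known are case-sensitive, so normalize each word (for example with strtolower()) before calling isset($known[$word]). Keying the array by word avoids the linear scan that in_array() performs on every call, which adds up for a file the size of /usr/dict/words.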

Desired output: a sentence-case title string with proper nouns capitalized.
Current output: a title string with every word capitalized.
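The following is a minimal sketch of the word-by-word approach described in the question. It assumes the dictionary is already in memory as an array of common words, that words are separated by single spaces, and it uses the hypothetical function name sentence_case_by_dictionary(). The sketch lowercases the whole string, keeps known words lowercase, capitalizes everything else, and finally capitalizes the first character:

```php
<?php
// Minimal sketch of the word-by-word approach (not the asker's code).
// Assumptions: $dictionary holds common words (any case); the function name
// sentence_case_by_dictionary() is hypothetical; words are separated by single spaces.
function sentence_case_by_dictionary(string $string, array $dictionary): string
{
    // Flip the list so each word becomes a key: isset() is then a constant-time lookup
    $known = array_flip(array_map('strtolower', $dictionary));

    $words = explode(' ', strtolower($string));
    foreach ($words as $i => $word) {
        // Known word: leave it lowercase; unknown word: treat it as a proper noun
        $words[$i] = isset($known[$word]) ? $word : ucfirst($word);
    }

    // Sentence case: always capitalize the very first character
    return ucfirst(implode(' ', $words));
}

echo sentence_case_by_dictionary('hello my name is jay gilford', ['hello', 'my', 'name', 'is']), "\n";
// Hello my name is Jay Gilford
```

This only splits on spaces, so punctuation attached to a word (e.g. "gilford,") will not match a dictionary entry. The hyphen and O' handling from wnd_title_case() is also left out.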

Edit:

Using Jay's answer below, I came up with a workable solution. My first problem was that my words dictionary contained both capitalized and non-capitalized words, so I found a dictionary of proper names to check against using a regex callback. It's not perfect, but it gets it right most of the time.

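```php
function title_case( $string ) {
    $exceptions = array();
    $fp = @fopen( THEME_DIR . "/_/inc/propernames", "r" );
    if ( $fp ) {
        while ( ( $buffer = fgets( $fp ) ) !== false ) {
            $name = trim( $buffer );
            if ( $name !== '' ) {
                // Escape regex metacharacters in each name before building the pattern
                $exceptions[] = preg_quote( $name, '~' );
            }
        }
        // Only close the handle if fopen() actually succeeded
        fclose( $fp );
    }

    $content = strtolower( $string );
    if ( $exceptions ) {
        // Group the alternation so \b applies to every name, not just the first and last
        $pattern = '~\b(?:' . implode( '|', $exceptions ) . ')\b~i';
        $content = preg_replace_callback( $pattern, 'regex_callback', $content );
    }

    return ucfirst( $content );
}

// Capitalize matched names longer than 3 characters; leave short matches lowercase
function regex_callback( $data ) {
    if ( strlen( $data[0] ) > 3 ) {
        return ucfirst( strtolower( $data[0] ) );
    }
    return $data[0];
}
```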
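The Edit notes that the words dictionary mixes capitalized and lowercase entries. One possible alternative to a separate proper-names file is to derive the proper-noun list from the mixed-case dictionary itself. This rests on an assumption: that proper nouns appear in the word list only in capitalized form, which is typical of Unix word lists. Under that assumption, a word counts as a proper noun if it appears capitalized and never in lowercase. The helper name split_dictionary() is hypothetical:

```php
<?php
// Hypothetical helper: derive proper nouns from a mixed-case word list.
// Assumption: common words are stored lowercase, proper nouns capitalized.
function split_dictionary(array $entries): array
{
    $lowercase = [];
    $capitalized = [];
    foreach ($entries as $entry) {
        $entry = trim($entry);
        if ($entry === '') {
            continue; // skip blank lines
        }
        if ($entry === strtolower($entry)) {
            $lowercase[$entry] = true;                 // e.g. "jay"
        } else {
            $capitalized[strtolower($entry)] = $entry; // e.g. "jay" => "Jay"
        }
    }
    // A proper noun is a capitalized entry with no lowercase twin
    $proper = array_diff_key($capitalized, $lowercase);
    return [array_keys($lowercase), array_values($proper)];
}

[$common, $proper] = split_dictionary(['hello', 'jay', 'Jay', 'Gilford', 'name', '']);
echo implode(', ', $proper), "\n"; // Gilford
```

The trade-off is that names that are also common words (here "jay", the bird) stay lowercase, so this heuristic is no more "perfect" than the proper-names file.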

Answer by 罗罗贝儿 (2024-12-19 06:45:12)

The simplest way to do this with a regex is as follows:

1. Convert your text so that every word starts with an uppercase letter: $content = ucwords($original_content);
2. Using your array of dictionary words, build a regex by imploding all the words with a pipe character |, wrapping the alternation in a non-capturing group, and surrounding it with word-boundary markers and delimiters followed by the case-insensitive flag. You end up with ~\b(?:word1|word2|word3)\b~i (with your much larger list, of course).
3. Create a callback function that lowercases the matched value with strtolower, and pass it to preg_replace_callback.

Here is a working demo:

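```php
function regex_callback($data) {
    // Lowercase whichever dictionary word was matched
    return strtolower($data[0]);
}

$original_content = 'hello my name is jay gilford';
$words = array('hello', 'my', 'name', 'is');

// Capitalize every word first; dictionary words are lowercased again below
$content = ucwords($original_content);
// (?:...) makes \b apply to every alternative, not just the first and last
$pattern = '~\b(?:' . implode('|', $words) . ')\b~i';

$content = preg_replace_callback($pattern, 'regex_callback', $content);

echo $content;
```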

You could also optionally call strtolower on the content first, for consistency (ucwords(strtolower($original_content))), so that words typed in capitals are normalized as well. The above code outputs: hello my name is Jay Gilford
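Why the group matters: in a regex, | has the lowest precedence. An ungrouped pattern such as ~\bword1|word2|word3\b~i therefore means "\bword1" OR "word2" OR "word3\b", and the middle words can match inside longer words. The hypothetical input below compares the ungrouped pattern with one wrapped in a non-capturing group (?:...). It uses an arrow function, which assumes PHP 7.4+:

```php
<?php
// Hypothetical input chosen to show the difference; arrow functions need PHP 7.4+.
$words = ['hello', 'my', 'is'];
$text  = 'Hello Myers, This Is Jay';

// Callback: lowercase whatever was matched
$lower = fn (array $m) => strtolower($m[0]);

// Ungrouped: parsed as "\bhello" OR "my" OR "is\b"
$ungrouped = '~\b' . implode('|', $words) . '\b~i';
// Grouped: every alternative must be a whole word
$grouped = '~\b(?:' . implode('|', $words) . ')\b~i';

$a = preg_replace_callback($ungrouped, $lower, $text);
$b = preg_replace_callback($grouped, $lower, $text);

echo $a, "\n"; // hello myers, This is Jay  ("My" inside "Myers" was lowercased)
echo $b, "\n"; // hello Myers, This is Jay
```

With a real dictionary, it is also worth passing each word through preg_quote($word, '~') before imploding, so that entries containing characters like "." or "'" cannot change the meaning of the pattern.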
