PHP: sentence-case a string, capitalizing proper nouns, using a dictionary of known words?
I need to search a string of words against a dictionary of words (a .txt file) and capitalize any word that is not found.
I'm trying to split the string into an array of words and check each one against the Unix /usr/dict/words dictionary. If the word is found, it gets lcfirst($word); if there is no match, it gets ucfirst($word).
The dictionary is opened and read into an array using fgetcsv (I also tried fgets and exploding on the end-of-line character).
function wnd_title_case( $string ) {
    $file = fopen( "/users/chris/sites/wp-dev/trunk/core/words.txt", "rb" );
    while ( !feof( $file ) ) {
        $line_of_text = fgetcsv( $file );
        $exceptions = array( $line_of_text );
    }
    fclose( $file );
    $delimiters = array(" ", "-", "O'");
    foreach ( $delimiters as $delimiter ) {
        $words = explode( $delimiter, $string );
        $newwords = array();
        foreach ($words as $word) {
            if ( in_array( strtoupper( $word ), $exceptions ) ) {
                // check exceptions list for any words that should be lower case
                $word = lcfirst( $word );
            } elseif ( !in_array( $word, $exceptions ) ) {
                // everything else capitalized
                $word = ucfirst( $word );
            }
            array_push( $newwords, $word );
        }
        $string = join( $delimiter, $newwords );
    }
    $string = ucfirst( $string );
    return $string;
}
I have verified that the file gets opened.
The desired output: a sentence-case title string with proper nouns capitalized.
The current output: a title string with every word capitalized.
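The word-by-word lookup described above can be sketched as follows. This is a hedged sketch, not a drop-in fix: the function name sentence_case_by_dictionary is hypothetical, the dictionary is assumed to be passed in as an array of words (loading it with file() and one word per line is an assumption about the file format), and only spaces are treated as delimiters. Two differences from the code above matter. First, every dictionary word is kept; the loop above reassigns $exceptions = array( $line_of_text ) on each pass, so only the last line survives, wrapped in a nested array. Second, words are compared in lowercase instead of through strtoupper(), so the first in_array() test can actually succeed.

```php
<?php
// Minimal sketch (hypothetical helper, not the asker's code): sentence-case a
// string, keeping words found in the dictionary lowercase and capitalizing
// unknown words, which are assumed to be proper nouns.

function sentence_case_by_dictionary(string $string, array $dictionary): string
{
    // Use the dictionary entries as array keys for fast isset() lookups.
    // Entries are used as-is, so a capitalized entry (e.g. a proper name)
    // never matches the lowercased words below.
    $known = array_flip($dictionary);

    // Split on single spaces only; hyphens and "O'" are not handled here.
    $words = explode(' ', strtolower($string));
    foreach ($words as $i => $word) {
        if (!isset($known[$word])) {
            // Not a known lowercase word: treat it as a proper noun.
            $words[$i] = ucfirst($word);
        }
    }

    // Sentence case: the first word is always capitalized.
    return ucfirst(implode(' ', $words));
}

// Assumption: in practice the list might come from
// file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).
$dictionary = ['hello', 'my', 'name', 'is'];
echo sentence_case_by_dictionary('HELLO MY NAME IS JAY GILFORD', $dictionary), "\n";
// prints: Hello my name is Jay Gilford
```

Because lookups are isset() checks on array keys, this stays fast even with a dictionary the size of /usr/dict/words, whereas in_array() scans the whole list for every word.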
Edit:
Using Jay's answer below, I came up with a workable solution. My first problem was that my words dictionary contained both capitalized and non-capitalized words, so I found a proper-names dictionary to check against, using a regex callback. It's not perfect, but it gets it right most of the time.
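function title_case( $string ) {
    // Load the proper-names list, one name per line, skipping blank lines.
    $exceptions = array();
    $fp = @fopen( THEME_DIR . "/_/inc/propernames", "r" );
    if ( $fp ) {
        while ( ( $buffer = fgets( $fp ) ) !== false ) {
            $name = trim( $buffer );
            if ( $name !== '' ) {
                // Escape regex metacharacters; "~" is the pattern delimiter.
                $exceptions[] = preg_quote( $name, '~' );
            }
        }
        // Only close the handle if fopen() succeeded.
        fclose( $fp );
    }
    // Lowercase everything, then capitalize any proper name that matches.
    $content = strtolower( $string );
    // Group the alternation so \b applies to every name, not just the first and last.
    $pattern = '~\b(?:' . implode( '|', $exceptions ) . ')\b~i';
    $content = preg_replace_callback( $pattern, 'regex_callback', $content );
    // Sentence case: capitalize the first character.
    return ucfirst( $content );
}
function regex_callback( $data ) {
    // Capitalize matched names longer than three characters;
    // shorter matches are returned unchanged (lowercase).
    if ( strlen( $data[0] ) > 3 )
        return ucfirst( strtolower( $data[0] ) );
    else
        return $data[0];
}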
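The pattern in this edit is built by joining the names with | between \b markers. One detail worth checking with this kind of pattern: in PCRE, alternation has the lowest precedence, so a \b written outside an ungrouped a|b|c attaches only to the first and last alternatives. A minimal check, using two made-up names (ann and bob are illustrative only):

```php
<?php
// Alternation binds loosest in PCRE, so '~\bann|bob\b~i' is parsed as
// (\bann) OR (bob\b): neither name is required to be a whole word.
$ungrouped = '~\bann|bob\b~i';
// A non-capturing group makes both \b markers apply to every name.
$grouped = '~\b(?:ann|bob)\b~i';

// "annex" starts with "ann" and "kabob" ends with "bob", but neither
// contains either name as a whole word.
foreach (['annex', 'kabob', 'Ann', 'Bob'] as $text) {
    printf(
        "%-6s ungrouped=%d grouped=%d\n",
        $text,
        preg_match($ungrouped, $text),
        preg_match($grouped, $text)
    );
}
// "annex" and "kabob" match only the ungrouped pattern;
// "Ann" and "Bob" match both (the i flag ignores case).
```

If any entry in the list can contain regex metacharacters (a period or apostrophe, for instance), escaping each one with preg_quote($name, '~') before joining keeps the pattern valid.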
Answer:
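The simplest way to do this with regex is the following.
First, capitalize every word:
$content = ucwords($original_content);
Then build a pattern by imploding all the words in your dictionary array with a pipe character |, and surrounding the result with word-boundary markers and delimiters followed by the case-insensitive flag, so you end up with ~\b(?:word1|word2|word3)\b~i (obviously with your large list). The (?: ) group makes the \b markers apply to every word rather than only to the first and last alternatives. Finally, lowercase every match, for example with preg_replace_callback. An example of a working demo is this:
You could also optionally run strtolower on the content first, for consistency. The above code outputs: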
hello my name is Jay Gilford
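A minimal sketch of the steps described above, assuming the dictionary is already an in-memory array of lowercase words. The function name capitalize_unknown_words is hypothetical, and the input is lowercased first as suggested, so the output does not depend on the original casing.

```php
<?php
// Sketch of the ucwords + preg_replace_callback approach, assuming $words is
// an array of lowercase dictionary words (hypothetical helper name).

function capitalize_unknown_words(string $original_content, array $words): string
{
    // Optional normalization, then capitalize the first letter of every word.
    $content = ucwords(strtolower($original_content));

    // Escape each dictionary word and group the alternation so that the
    // \b word-boundary markers apply to every word, not just the first/last.
    $escaped = array_map(function ($w) { return preg_quote($w, '~'); }, $words);
    $pattern = '~\b(?:' . implode('|', $escaped) . ')\b~i';

    // Lowercase every dictionary word; words not in the list keep the
    // capital letter that ucwords() gave them.
    return preg_replace_callback($pattern, function (array $m) {
        return strtolower($m[0]);
    }, $content);
}

echo capitalize_unknown_words('hello my name is jay gilford', ['hello', 'my', 'name', 'is']), "\n";
// prints: hello my name is Jay Gilford
```

Unlike the sentence case the question asks for, this leaves the first word lowercase when it is in the dictionary; wrapping the result in ucfirst() would capitalize it.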