str_word_count

(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)

str_word_count - Return information about words used in a string

Description

str_word_count(string $string, int $format = 0, ?string $characters = null): array|int

str_word_count() counts the words inside string. If the optional format argument is omitted (or is 0), the return value is an integer representing the number of words found. If format is 1 or 2, the return value is an array whose contents depend on format. The possible values for format are listed below.

For the purpose of this function, a "word" is defined as a locale-dependent string containing alphabetic characters, which may also contain, but not start with, the characters "'" and "-". Note that multibyte locales are not supported.
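A quick check of this definition, written as a sketch that assumes PHP 8 (whose startup locale is always "C") and a UTF-8 encoded script: apostrophes and hyphens inside a word are kept, while the two bytes that make up a UTF-8 letter such as "é" are not letters in that locale, so they split the word.

```php
<?php
// Apostrophes and hyphens inside a word belong to the word.
print_r(str_word_count("It's a well-known fact", 1));
// words: It's, a, well-known, fact

// "ï" and "é" are two bytes each in UTF-8, and neither byte is a letter
// in the "C" locale, so the words are cut at those characters.
print_r(str_word_count("naïve café", 1));
// words: na, ve, caf
```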

Parameters

string

The input string.

format

Specifies the return value of this function. The currently supported values are:

0: returns the number of words found
1: returns an array containing all the words found inside string
2: returns an associative array, where the key is the numeric position (byte offset) of the word inside string and the value is the word itself

characters

A list of additional characters that will be considered part of a word
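The sketch below, using a made-up sample string, extends the word definition with digits through characters. The list is read byte by byte and also accepts a range written as a..b, the same syntax trim() uses.

```php
<?php
$text = "R2-D2 and C-3PO";

// Digits are not letters, so add them as extra word characters.
// "0..9" is a range covering every byte from "0" to "9".
print_r(str_word_count($text, 1, '0..9'));
// words: R2-D2, and, C-3PO

// Listing the digits one by one has the same effect.
var_dump(str_word_count($text, 0, '0123456789')); // int(3)
```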

Return Values

Returns an array or an integer, depending on the format chosen.

Changelog

Version	Description
8.0.0	`characters` is now nullable.

Examples

Example #1 A str_word_count() example

<?php

$str = "Salut l'ami, vous
        avez          une b3lle mine !";

print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));

echo str_word_count($str);

?>

The above example will output:

Array
(
    [0] => Salut
    [1] => l'ami
    [2] => vous
    [3] => avez
    [4] => une
    [5] => b
    [6] => lle
    [7] => mine
)

Array
(
    [0] => Salut
    [6] => l'ami
    [13] => vous
    [27] => avez
    [41] => une
    [45] => b
    [47] => lle
    [51] => mine
)

Array
(
    [0] => Salut
    [1] => l'ami
    [2] => vous
    [3] => avez
    [4] => une
    [5] => b3lle
    [6] => mine
)

8

See Also

explode() - Split a string by a string
preg_split() - Split string by a regular expression
count_chars() - Return information about characters used in a string
substr_count() - Count the number of substring occurrences

User Contributed Notes 30 notes

cito at wikatu dot com

13 years ago

<?php

/***
 * This simple UTF-8 word count function (it only counts)
 * is a bit faster than the one using preg_match_all, and
 * about 10x slower than the built-in str_word_count().
 *
 * If you need the hyphen or other code points as word characters,
 * just put them into the [brackets], like [^\p{L}\p{N}\'\-]
 * Because of the u modifier, the pattern must be valid UTF-8.
 * utf8_encode() can convert a Latin-1 pattern, but it is deprecated
 * as of PHP 8.2, so prefer mb_convert_encoding().
 **/

// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
  return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
?>

splogamurugan at gmail dot com

16 years ago

You can also specify a range of characters in the character list (the characters parameter), using the a..b syntax:

<?php

$str = "Hello fri3nd, you're
       looking          good today!
       look1234ing";

print_r(str_word_count($str, 1, '0..3'));

?>

will give the following result, because only the digits 0 to 3 count as word characters:

Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )

Adeel Khan

17 years ago

<?php

/**
 * Returns the number of words in a string.
 * As far as I have tested, it is very accurate.
 * The string can contain HTML,
 * but you should remove scripts, styles and comments first:
 *
 *    $search = array(
 *      '@<script[^>]*?>.*?</script>@si',
 *      '@<style[^>]*?>.*?</style>@si',
 *      '@<![\s\S]*?--[ \t\n\r]*>@'
 *    );
 *    $html = preg_replace($search, '', $html);
 *
 */

function word_count($html) {

  # strip all html tags
  $wc = strip_tags($html);

  # replace everything except word characters and common punctuation
  # with a space
  $pattern = '#[^\w\'".!?;,\\\\/:&@-]+#';
  $wc = trim(preg_replace($pattern, " ", $wc));

  # remove 'words' that consist only of punctuation
  $wc = trim(preg_replace('#(?<!\S)[\'".!?;,\\\\/:&@-]+(?!\S)#', " ", $wc));

  # remove superfluous whitespace
  $wc = preg_replace("/\s\s+/", " ", $wc);

  # split string into an array of words
  $wc = explode(" ", $wc);

  # remove empty elements
  $wc = array_filter($wc);

  # return the number of words
  return count($wc);

}

?>

manrash at gmail dot com

16 years ago

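For Spanish speakers, a suitable character list may be the following (the upper-case letters are needed too, or a word such as "ÁRBOL" is split):

<?php

// The list is compared byte by byte, so this file and $text
// must use the same encoding (UTF-8 here).
$characterMap = 'áéíóúüñÁÉÍÓÚÜÑ';

$text = 'El niño comió una canción';

$count = str_word_count($text, 0, $characterMap); // 5

?>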
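If the text is UTF-8, another approach is to convert it to ISO-8859-1 first, so that every Spanish letter is a single byte and the character list matches whole letters instead of byte fragments. This is only a sketch, assuming the mbstring extension is available; str_word_count_es() is a hypothetical name, and characters that do not exist in ISO-8859-1 (emoji, for example) are replaced by "?" during the conversion.

```php
<?php
// Hypothetical helper: count words in UTF-8 Spanish text.
// Assumes the mbstring extension and a UTF-8 encoded script.
function str_word_count_es(string $utf8Text): int
{
    // ISO-8859-1 stores each Spanish letter in a single byte.
    $latin1 = mb_convert_encoding($utf8Text, 'ISO-8859-1', 'UTF-8');

    // Convert the extra letters too, so the mask holds the same bytes.
    $extra = mb_convert_encoding('áéíóúüñÁÉÍÓÚÜÑ', 'ISO-8859-1', 'UTF-8');

    return str_word_count($latin1, 0, $extra);
}

echo str_word_count_es('¿Qué tal, señor?'), "\n"; // 3
```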

uri at speedy dot net

12 years ago

Here is a word-count function that supports UTF-8 and Hebrew. I tried other functions, but they didn't work. Note that in Hebrew, '"' and '\'' can be used inside words, so they are not separators. This function is not perfect: I would prefer something like the function we use in JavaScript, which treats every character except [a-zA-Zא-ת0-9_\'\"] as a separator, but I don't know how to do that in PHP.

I removed some of the separators that don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underscore.

This is a fix to my previous post on this page: I found that my function returned an incorrect result for an empty string. I corrected it, and I'm also attaching another function, my_strlen.

<?php 

function count_words($string) {
    // Return the number of words in a string.
    $string= str_replace("&#039;", "'", $string);
    $t= array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string= str_replace($t, " ", $string);
    $string= trim(preg_replace("/\s+/", " ", $string));
    $num= 0;
    if (my_strlen($string)>0) {
        $word_array= explode(" ", $string);
        $num= count($word_array);
    }
    return $num;
}

function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}

?>
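The JavaScript-style rule described above can be expressed in PHP with preg_split() and the u modifier. The sketch below is one possible version: count_words_hebrew() is a hypothetical name, the Hebrew range used is alef (U+05D0) to tav (U+05EA), and vowel points (niqqud) fall outside that range, so pointed text would need them added to the class.

```php
<?php
// Hypothetical helper: every character outside a-z, A-Z, the Hebrew
// letters alef (U+05D0) to tav (U+05EA), 0-9, "_", "'" and '"' is a
// separator. The u modifier makes PCRE match UTF-8 characters, not bytes.
function count_words_hebrew(string $string): int
{
    $words = preg_split(
        '/[^a-zA-Z\x{05D0}-\x{05EA}0-9_\'"]+/u',
        $string,
        -1,
        PREG_SPLIT_NO_EMPTY // no empty pieces at the start or end
    );

    // preg_split() returns false on failure, for example on invalid UTF-8.
    return $words === false ? 0 : count($words);
}

echo count_words_hebrew('צה"ל הודיע היום'), "\n"; // 3
```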

brettNOSPAM at olwm dot NO_SPAM dot com

22 years ago

This example may not be pretty, but it has proved accurate for me:

<?php

// count the words in $body (the text to analyse)
$words_to_count = strip_tags($body);

$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";

$words_to_count = preg_replace($pattern, " ", $words_to_count);

$words_to_count = trim($words_to_count);

// explode() on an empty string returns one element, so handle that case
$total_words = ($words_to_count === '') ? 0 : count(explode(" ", $words_to_count));

?>

Hope I didn't miss any punctuation. ;-)

php dot net at salagir dot com

7 years ago

This function doesn't handle accented characters, even with a locale that has them.
<?php
echo str_word_count("Is working"); // =2

setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expects 2, got 3.
?>

Cito's solution counts the empty piece left after trailing punctuation as a word, so it isn't a good workaround.
<?php
function str_word_count_utf8($str) {
      return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
echo str_word_count_utf8("Is wôrking"); //=2
echo str_word_count_utf8("Not wôrking."); //=3
?>

My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
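Note that \W+ also splits on apostrophes, so "l'ami" counts as two words. If you also need the array formats, one option is preg_match_all() with PREG_OFFSET_CAPTURE. The function below is a hypothetical sketch for PHP 8 (it uses the array|int return type and match): its definition of a word, letters that may contain but not start or end with ' or -, is an assumption meant to resemble str_word_count(), and its format 2 keys are byte offsets like those of the built-in.

```php
<?php
// Hypothetical UTF-8 counterpart of str_word_count() (PHP 8+).
// Assumed word definition: letters (\p{L}) that may contain, but not
// start or end with, an apostrophe or a hyphen.
function str_word_count_unicode(string $str, int $format = 0): array|int
{
    // With PREG_OFFSET_CAPTURE each match is [word, byte offset].
    $found = preg_match_all('/\p{L}+(?:[\'-]\p{L}+)*/u', $str, $m, PREG_OFFSET_CAPTURE);
    if ($found === false) { // e.g. invalid UTF-8
        $found = 0;
        $m = [[]];
    }

    return match ($format) {
        0 => $found,
        1 => array_column($m[0], 0),    // list of words
        2 => array_column($m[0], 0, 1), // byte offset => word
        default => throw new ValueError('format must be 0, 1 or 2'),
    };
}

echo str_word_count_unicode("Not wôrking."), "\n"; // 2
print_r(str_word_count_unicode("café au lait", 2)); // keys 0, 6, 9
```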

dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64)

14 years ago

To count words after converting an MS Word document to plain text with antiword, you can use this function:

<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text); // remove these chars (you can specify more)
    $text = preg_replace('/-{2,}/', '', $text); // remove 2 or more dashes in a row
    $text = trim(preg_replace('/\s+/', ' ', $text)); // collapse extra spaces, including those left by the removals above
    $len = strlen($text);
    
    if (0 === $len) {
        return 0;
    }
    
    $words = 1;
    
    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }
    
    return $words;
}
?>

It strips the pipe (|) characters, which antiword uses to format tables in its plain-text output, removes runs of two or more dashes (also used in tables), and then counts the words.

Counting words with explode() and then count() is not a good idea for huge texts, because it uses a lot of memory to store the text again as an array. This is why I'm using while () { ... } to walk the string.
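If memory is the concern, substr_count() can also count the separating spaces without building an array or looping in PHP code. A sketch along those lines, with a similar clean-up and the hypothetical name count_words_antiword():

```php
<?php
// Hypothetical variant of the antiword word counter: clean up the text,
// then let substr_count() count the single spaces between words.
function count_words_antiword(string $text): int
{
    $text = str_replace('|', '', $text);             // antiword table borders
    $text = preg_replace('/-{2,}/', '', $text);      // runs of dashes in tables
    $text = trim(preg_replace('/\s+/', ' ', $text)); // one space between words

    // n words are separated by n - 1 single spaces.
    return $text === '' ? 0 : substr_count($text, ' ') + 1;
}

echo count_words_antiword("| one | two --- three |\n| four |"), "\n"; // 4
```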

brettz9 - see yahoo

15 years ago

Words also cannot end in a hyphen unless the hyphen is included in the character list (charlist)...

charliefrancis at gmail dot com

16 years ago

Hi, this is the first time I have posted on the PHP manual. I hope some of you will like this little function I wrote.

It returns the string cut to a given character limit while keeping whole words, with an ellipsis appended when the text is cut. It breaks out of the foreach loop as soon as the remaining words fit within the limit, and the list of extra word characters can be edited. Note that the kept words are rejoined with single spaces, so punctuation between them is dropped.

<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
    if( strlen( $text ) > $limit ) {
        $words = str_word_count( $text, 2, $chars );
        $words = array_reverse( $words, TRUE );
        foreach( $words as $length => $word ) {
            if( $length + strlen( $word ) >= $limit ) {
                array_shift( $words );
            } else {
                break;
            }
        }
        $words = array_reverse( $words );
        $text = implode( " ", $words ) . '&hellip;';
    }
    return $text;
}

$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str ) . "\n";
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str ) . "\n";
?>

1: Hello this is a list of words&hellip;
2: Hello this is a list of words
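Because format 2 returns byte offsets, the original text can also be cut with substr() instead of being rebuilt with implode(), which keeps the punctuation between words. A sketch using the same limit rule (a word is kept only if it ends before $limit); word_limiter_keep_text() is a hypothetical name:

```php
<?php
// Hypothetical variant of word_limiter(): cut the original text right
// after the last word that ends before $limit, instead of rejoining words.
function word_limiter_keep_text(string $text, int $limit = 30, string $chars = '0123456789'): string
{
    if (strlen($text) <= $limit) {
        return $text;
    }

    $end = 0; // byte offset just past the last word that fits
    foreach (str_word_count($text, 2, $chars) as $offset => $word) {
        $wordEnd = $offset + strlen($word);
        if ($wordEnd >= $limit) {
            break; // words come in order, so no later word can fit
        }
        $end = $wordEnd;
    }

    return substr($text, 0, $end) . '&hellip;';
}

echo word_limiter_keep_text("Hello, this is a list of words, that is too long"), "\n";
// Hello, this is a list of&hellip;
```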