
str_word_count

(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)

str_word_count - Counts the number of words used in a string

Description

str_word_count(string $string, int $format = 0, ?string $characters = null): array|int

str_word_count() counts the number of words in string. If the optional format argument is not specified, the return value is an int giving the number of words found. If format is specified, the return value is an array whose contents depend on format. The possible values for format are listed below.

For this function, what counts as a word depends on the current locale: a word is a string of alphabetic characters, which may also contain, but not begin with, "'" and "-". Note that multibyte locales are not supported.
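As a rough illustration of the word definition above (a sketch, assuming the default "C" locale and plain ASCII input; the sample sentence and variable names are made up), apostrophes and hyphens inside a word do not split it, while a digit ends the word:

```php
<?php
// Hypothetical sample input; any ASCII sentence would do.
$sentence = "Don't over-think version2 of it";

// Format 1 returns the list of words found.
$words = str_word_count($sentence, 1);

// "Don't" and "over-think" stay whole; "version2" is cut at the digit.
print_r($words);
```

Passing '2' (or a range such as '0..9') in the characters argument would keep "version2" in one piece.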

Parameters

string

The input string.

format

Specifies the return value of this function. The currently supported values are:

  • 0: returns the number of words found
  • 1: returns an array containing all the words found inside string
  • 2: returns an associative array, where the key is the numeric position of the word inside string and the value is the word itself

characters

A list of additional characters that will be treated as word characters

Return values

Returns an array or an int, depending on the chosen format.
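A minimal sketch of how the two return shapes can be consumed (the sample text and variable names are hypothetical). It assumes single-byte text, in which case the keys returned by format 2 can be passed straight to substr():

```php
<?php
// Hypothetical sample text, used only for illustration.
$text = "The quick brown fox";

// Format 2: each key is the offset of the word within $text.
$positions = str_word_count($text, 2);

foreach ($positions as $offset => $word) {
    // Reading the string at the reported offset yields the same word back.
    printf("%2d: %s\n", $offset, substr($text, $offset, strlen($word)));
}

// Format 0 (the default) returns a plain integer instead of an array.
$total = str_word_count($text);
echo $total, "\n";
```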

Changelog

Version Description
8.0.0 characters is now nullable.

Examples

Example #1 A str_word_count() example

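<?php

// The runs of whitespace inside the string are significant: they determine
// the offsets reported by format 2 in the output.
$str = "Salut l'ami, vous
         avez          une b3lle mine !";

print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));

echo str_word_count($str);

?>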

The above example will output:

Array
(
    [0] => Salut
    [1] => l'ami
    [2] => vous
    [3] => avez
    [4] => une
    [5] => b
    [6] => lle
    [7] => mine
)

Array
(
    [0] => Salut
    [6] => l'ami
    [13] => vous
    [27] => avez
    [41] => une
    [45] => b
    [47] => lle
    [51] => mine
)

Array
(
    [0] => Salut
    [1] => l'ami
    [2] => vous
    [3] => avez
    [4] => une
    [5] => b3lle
    [6] => mine
)

8

See also

  • explode() - Split a string by a string
  • preg_split() - Split a string by a regular expression
  • count_chars() - Return information about characters used in a string
  • substr_count() - Count the number of substring occurrences


User Contributed Notes 30 notes

cito at wikatu dot com
13 years ago
<?php

/***
* This simple utf-8 word count function (it only counts)
* is a bit faster than the one with preg_match_all,
* about 10x slower than the built-in str_word_count
*
* If you need the hyphen or other code points as word-characters
* just put them into the [brackets] like [^\p{L}\p{N}\'\-]
* If the pattern contains utf-8, utf8_encode() the pattern,
* as it is expected to be valid utf-8 (using the u modifier).
**/

// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str));
}
?>
splogamurugan at gmail dot com
16 years ago
We can also specify a range of values for charlist (the characters parameter).

<?php
$str = "Hello fri3nd, you're
looking good today!
look1234ing";
print_r(str_word_count($str, 1, '0..3'));
?>

will give the result as

Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )
Adeel Khan
17 years ago
<?php

/**
* Returns the number of words in a string.
* As far as I have tested, it is very accurate.
* The string can have HTML in it,
* but you should do something like this first:
*
* $search = array(
* '@<script[^>]*?>.*?</script>@si',
* '@<style[^>]*?>.*?</style>@siU',
* '@<![\s\S]*?--[ \t\n\r]*>@'
* );
* $html = preg_replace($search, '', $html);
*
*/

function word_count($html) {

# strip all html tags
$wc = strip_tags($html);

# remove 'words' that don't consist of alphanumerical characters or punctuation
$pattern = "#[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]+#";
$wc = trim(preg_replace($pattern, " ", $wc));

# remove one-letter 'words' that consist only of punctuation
$wc = trim(preg_replace("#\s*[(\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]\s*#", " ", $wc));

# remove superfluous whitespace
$wc = preg_replace("/\s\s+/", " ", $wc);

# split string into an array of words
$wc = explode(" ", $wc);

# remove empty elements
$wc = array_filter($wc);

# return the number of words
return count($wc);

}

?>
manrash at gmail dot com
16 years ago
For Spanish speakers, a valid character map may be:

<?php
$characterMap = 'áéíóúüñ';

$count = str_word_count($text, 0, $characterMap);
?>
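A slightly fuller sketch of the same idea, under two assumptions that are not in the note above: the script file and the text are both UTF-8 encoded, and upper-case accented letters should count too. As far as I can tell, the character list is applied byte by byte, so this works because every byte of the listed letters is allowed, which also means some other characters sharing those bytes may slip through. The sample sentence is made up:

```php
<?php
// Assumption: this file and $text are both UTF-8 encoded.
$text = "Él niño comió"; // hypothetical sample

// Lower- and upper-case accented letters used in Spanish.
$characterMap = 'áéíóúüñÁÉÍÓÚÜÑ';

$count = str_word_count($text, 0, $characterMap);
$words = str_word_count($text, 1, $characterMap);

echo $count, "\n";
print_r($words);
```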
uri at speedy dot net
12 years ago
Here is a count words function which supports UTF-8 and Hebrew. I tried other functions but they don't work. Notice that in Hebrew, '"' and '\'' can be used in words, so they are not separators. This function is not perfect, I would prefer a function we are using in JavaScript which considers all characters except [a-zA-Zא-ת0-9_\'\"] as separators, but I don't know how to do it in PHP.

I removed some of the separators which don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underline.

This is a fix to my previous post on this page - I found out that my function returned an incorrect result for an empty string. I corrected it and I'm also attaching another function - my_strlen.

<?php

function count_words($string) {
    // Return the number of words in a string.
    $string = str_replace("&#039;", "'", $string);
    $t = array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string = str_replace($t, " ", $string);
    $string = trim(preg_replace("/\s+/", " ", $string));
    $num = 0;
    if (my_strlen($string) > 0) {
        $word_array = explode(" ", $string);
        $num = count($word_array);
    }
    return $num;
}

function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}

?>
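The JavaScript-style rule mentioned in this note (everything except [a-zA-Zא-ת0-9_\'\"] is a separator) can probably be expressed in PHP with preg_split() and the u modifier. The following is only a sketch: the function name count_words_regex is hypothetical, and it assumes both the script file and the input are valid UTF-8.

```php
<?php
// Hypothetical helper name; assumes $string is valid UTF-8.
function count_words_regex(string $string): int
{
    // Anything that is not a Latin letter, a Hebrew letter (א-ת), a digit,
    // an underscore, an apostrophe or a double quote acts as a separator.
    $words = preg_split(
        '/[^a-zA-Zא-ת0-9_\'"]+/u',
        $string,
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    return $words === false ? 0 : count($words);
}

echo count_words_regex('שלום עולם hello'), "\n";
```

PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or trailing separators would otherwise produce, so an empty string counts as zero words.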
brettNOSPAM at olwm dot NO_SPAM dot com
22 years ago
This example may not be pretty, but it proves accurate:

<?php
//count words
$words_to_count = strip_tags($body);
$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";
$words_to_count = preg_replace ($pattern, " ", $words_to_count);
$words_to_count = trim($words_to_count);
$total_words = count(explode(" ",$words_to_count));
?>

Hope I didn't miss any punctuation. ;-)
php dot net at salagir dot com
7 years ago
This function doesn't handle accented characters, even under a locale that uses them.
<?php
echo str_word_count("Is working"); // = 2

setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expected 2, got 3
?>

Cito's solution treats punctuation as words and thus isn't a good workaround.
<?php
function str_word_count_utf8($str) {
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str));
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Not wôrking."); // = 3
?>

My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
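One caveat with splitting on \W+ is that the apostrophe is a non-word character, so "l'ami" would count as two words. A possible variation, offered only as a sketch (the function name count_words_unicode is hypothetical, and UTF-8 input is assumed), counts runs of letters, digits and apostrophes with preg_match_all() instead of splitting:

```php
<?php
// Hypothetical helper; assumes UTF-8 input.
function count_words_unicode(string $str): int
{
    // Count runs of letters, digits and apostrophes instead of splitting,
    // so leading or trailing punctuation never yields an empty "word".
    $n = preg_match_all('/[\p{L}\p{N}\']+/u', $str);

    // preg_match_all() returns false on failure, e.g. for invalid UTF-8.
    return $n === false ? 0 : $n;
}

echo count_words_unicode("Not wôrking."), "\n";
echo count_words_unicode("Salut l'ami"), "\n";
```

Note that a stray apostrophe standing on its own would still be counted as a word by this pattern.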
dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64)
14 years ago
To count words after converting an MS Word document to plain text with antiword, you can use this function:

<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text); // remove these chars (you can specify more)
    $text = trim(preg_replace('/\s+/', ' ', $text)); // remove extra spaces
    $text = preg_replace('/-{2,}/', '', $text); // remove 2 or more dashes in a row
    $len = strlen($text);

    if (0 === $len) {
        return 0;
    }

    $words = 1;

    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }

    return $words;
}
?>

It strips the pipe "|" characters, which antiword uses to format tables in its plain-text output, removes runs of two or more dashes (also used in tables), then counts the words.

Counting words with explode() and then count() is not a good idea for huge texts, because it uses a lot of memory to store the text a second time as an array. This is why I'm using while () { ... } to walk the string.
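The same "walk the string instead of building an array" idea can also be sketched with strspn() and strcspn(), which skip over whitespace and non-whitespace runs without first normalising the text. This is an illustration only: the function name count_words_scan is hypothetical, and a "word" here simply means any run of non-whitespace bytes.

```php
<?php
// Hypothetical helper name. A "word" is any run of non-whitespace bytes.
function count_words_scan(string $text): int
{
    $whitespace = " \t\r\n";
    $length = strlen($text);
    $pos = 0;
    $count = 0;

    while ($pos < $length) {
        // Skip the run of whitespace starting at $pos.
        $pos += strspn($text, $whitespace, $pos);

        // Measure the run of non-whitespace that follows.
        $wordLength = strcspn($text, $whitespace, $pos);
        if ($wordLength > 0) {
            $count++;
            $pos += $wordLength;
        }
    }

    return $count;
}

echo count_words_scan("  one two   three "), "\n";
```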
brettz9 - see yahoo
15 years ago
Words also cannot end in a hyphen unless allowed by the charlist...
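A small sketch of that behaviour, at least for a hyphen at the very end of the input (the sample string and variable names are made up):

```php
<?php
// Hypothetical sample: the input ends in a hyphen.
$input = "rock-";

// Default: the trailing hyphen is dropped from the word.
$default = str_word_count($input, 1);

// With '-' passed in the character list, the hyphen is kept.
$allowed = str_word_count($input, 1, '-');

print_r($default);
print_r($allowed);
```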
charliefrancis at gmail dot com
16 years ago
Hi, this is the first time I have posted on the PHP manual. I hope some of you will like this little function I wrote.

It returns a string trimmed to a certain character limit while still retaining whole words.
It breaks out of the foreach loop once it has found a string short enough to display, and the character list can be edited.

<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
    if( strlen( $text ) > $limit ) {
        // Format 2 keys are the starting positions of each word.
        $words = str_word_count( $text, 2, $chars );
        $words = array_reverse( $words, TRUE );
        foreach( $words as $length => $word ) {
            // Drop trailing words until the remaining text fits the limit.
            if( $length + strlen( $word ) >= $limit ) {
                array_shift( $words );
            } else {
                break;
            }
        }
        $words = array_reverse( $words );
        $text = implode( " ", $words ) . '&hellip;';
    }
    return $text;
}

$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str );
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str );
?>

1: Hello this is a list of words&hellip;
2: Hello this is a list of words