How to count words in MySQL / regular expression replacer?




How can I, in a MySQL query, have the same behaviour as the Regex.Replace function (for instance in .NET/C#)?



I need this because, like many people, I would like to count the number of words in a field. However, I'm not satisfied with the following answer (given several times on this site):


SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 FROM table



Because it doesn't give correct results when there is more than one space between two words.
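
To make the failure concrete, here is a minimal Python sketch that mirrors the arithmetic of the query above. The function name `naive_word_count` is made up for illustration, and the sketch assumes ASCII text, where MySQL's byte-based LENGTH and Python's `len` agree.

```python
def naive_word_count(name: str) -> int:
    """Mirror of LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1."""
    # This counts space characters, not gaps between words, then adds one.
    return len(name) - len(name.replace(" ", "")) + 1


single = naive_word_count("hello world")   # one space between the words
double = naive_word_count("hello  world")  # two spaces between the words
print(single, double)  # prints "2 3"
```

Both inputs contain two words, but the second is reported as three because every extra space is counted as another gap.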



By the way, I think the Regex.Replace function may be interesting in its own right, so all good ideas are welcome!





Out of curiosity, why count the words in a field?
– Peter
Nov 18 '09 at 11:34




4 Answers



REGEXP_REPLACE is available as a MySQL user-defined function (UDF).



Word counting: If you control the data going into the database, you can collapse repeated whitespace before insert. Also, if you need the word count often, you can compute it once in your code and store it in the database.
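
As a rough illustration of the application-side approach, the sketch below normalises whitespace and computes a word count before the row would be written. It is written in Python purely as an example. The helper names are hypothetical, and a "word" is assumed to mean any run of non-whitespace characters.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    # str.split() with no argument splits on runs of whitespace
    # and discards leading/trailing empty pieces.
    return " ".join(text.split())


def count_words(text: str) -> int:
    """Count runs of non-whitespace characters."""
    return len(text.split())


raw = "  counting   words\tin  MySQL "
clean = normalize_whitespace(raw)
words = count_words(raw)
print(repr(clean), words)  # prints "'counting words in MySQL' 4"
```

The cleaned text and the count could then be stored together (for example in a separate integer column), so that queries never have to recompute the count.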





sorry to accept so late...
– PierrOz
Feb 26 '10 at 8:30



UPDATE: I have now added a separate answer for MySQL 8.0+, which should be used in preference. (I have retained this answer in case you are constrained to an earlier version.)



Almost a duplicate of this question but this answer will address the use case of counting words based on the advanced version of the custom regular expression replacer from this blog post.



Demo



Rextester online demo



For the sample text, this gives a count of 61 - the same as all online word counters I've tried (e.g. https://wordcounter.net/).



SQL (excluding function code for brevity):


SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.',           -- Replace the first character from the chunk
                               '',             -- Replace with nothing (i.e. remove the character)
                               TRUE,           -- Greedy matching
                               1,              -- Minimum match length
                               0,              -- No maximum match length
                               1,              -- Minimum sub-match length
                               0               -- No maximum sub-match length
                   ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;



The answer is no, you cannot have the same behaviour in MySQL.



But I recommend you check out this earlier question on the subject, which links to a UDF that supposedly enables some of this functionality.



MySQL 8.0 now provides a decent REGEXP_REPLACE function, which makes this much simpler:



SQL


SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(REGEXP_REPLACE(
           txt,
           '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
           '$1'))                       -- Discard the first whitespace character and retain the rest
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `Word count`
FROM tbl;
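
For anyone who wants to check the arithmetic outside the database, the same gap-counting logic can be sketched in Python. This is only an illustration and not part of the query: `word_count_by_gaps` is a hypothetical name, and Python's `\s` is assumed to be close enough to `[[:space:]]` for ordinary text.

```python
import re


def word_count_by_gaps(txt: str) -> int:
    """Count words as (number of whitespace runs) + 1, adjusted at the edges."""
    # Drop exactly one character from each run of whitespace, keeping the rest,
    # so the length difference equals the number of runs.
    shortened = re.sub(r"\s(\s*)", r"\1", txt)
    gaps = len(txt) - len(shortened)
    count = gaps + 1
    if re.search(r"^\s", txt):  # a leading run is not a gap between words
        count -= 1
    if re.search(r"\s$", txt):  # neither is a trailing run
        count -= 1
    return count


print(word_count_by_gaps("one  two   three"))    # prints "3"
print(word_count_by_gaps("  lead and trail  "))  # prints "3"
```

One edge case to be aware of: the sketch returns 1 for an empty string, and the same arithmetic suggests the SQL would as well, so empty values may need separate handling.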



Demo



DB-Fiddle online demo





