
Author Remove any non-word
Karl Groves

2006-11-19, 7:57 pm

Before I write my own, I'm wondering if anyone has something already
written that removes everything not a word from a string. By "not a word",
I mean HTML, special characters, and punctuation. Digits allowed.

TIA

--
Karl Groves
www.karlcore.com
Chris F.A. Johnson

2006-11-19, 7:57 pm

On 2006-11-12, Karl Groves wrote:
> Before I write my own, I'm wondering if anyone has something already
> written that removes everything not a word from a string. By "not a word",
> I mean HTML, special characters, and punctuation. Digits allowed.


Presuming a *nix OS, with bash or ksh93, to remove all characters
that are not letters or numbers:

newstring=${string//[!a-zA-Z0-9]/}
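
As a quick illustration of that expansion, here is a minimal sketch assuming bash; the sample value of `string` is made up for the example:

```shell
#!/bin/bash
# Sample input; this value is hypothetical, for illustration only.
string='Hello, World! 42 times...'

# ${var//pattern/} deletes every match of the glob pattern.
# [!a-zA-Z0-9] matches any single character that is not a letter or digit.
newstring=${string//[!a-zA-Z0-9]/}

printf '%s\n' "$newstring"
```

Note that spaces are removed as well, so the remaining words run together.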

If you want to remove all tags, as well, use sed:

printf "%s\n" "$string" | sed -e 's/<[^>]*>//g' -e 's/[^a-zA-Z0-9]//g'
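
If the words should stay separated, the sed approach could be wrapped in a small function along these lines. This is only a sketch: the name `strip_nonword` is hypothetical, and keeping spaces is an assumption about what is wanted, not something the one-liner above does.

```shell
#!/bin/sh
# strip_nonword is a hypothetical helper name, not from the thread.
strip_nonword() {
    # First expression: drop anything that looks like an HTML tag (<...>).
    # Second expression: drop every character that is not a letter,
    # a digit, or a space (assumption: spaces are kept as word separators).
    printf '%s\n' "$1" | sed -e 's/<[^>]*>//g' -e 's/[^a-zA-Z0-9 ]//g'
}

strip_nonword '<p>Hello, <b>World</b>! 42 times.</p>'
```

HTML entities are not decoded by this: for something like `&amp;`, only the `&` and `;` are removed and the letters stay behind.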


--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)