Regex in ASP (Dreamweaver forum archive, August 2004)
| budgy2 2004-08-10, 7:14 am |
| Hi there,
I am using the following code to strip the HTML from an XML call. I then
want to remove all of the white space and put the correct formatting in. I have
tried this with a replace statement using Chr(??), but it still does not
complete the task (there is a commented-out statement below that I used). So
now I am trying to use a regex to do it, but I am not sure how to write the
syntax. I have done some searching online, and the item pasted below was the
best example I could find!
Please help! I have to get this finished today!
<SCRIPT RUNAT=SERVER LANGUAGE=VBSCRIPT>
function CI_StripHTML(strtext)
on error resume next
arysplit=split(strtext,"<")
if len(arysplit(0))>0 then j=1 else j=0
for i=j to ubound(arysplit)
if instr(arysplit(i),">") then
arysplit(i)=mid(arysplit(i),instr(arysplit(i),">")+1)
else
arysplit(i)="<" & arysplit(i)
end if
next
strOutput = join(arysplit, "")
strOutput = mid(strOutput, 2-j)
strOutput = replace(strOutput,"&gt;",">")
strOutput = replace(strOutput,"&lt;","<")
CI_StripHTML = strOutput
End Function
</SCRIPT>
<% dim strPlain
strPlain=CI_StripHTML(xmlHTTP2.ResponseText)
%>
<%
Function SplitAdv(strPlain)
Dim objRE
Set objRE = new RegExp
' Set up our RegExp object
objRE.IgnoreCase = true
objRE.Global = true
objRE.Pattern = ",(?=([^']*'[^']*')*(?![^']*'))"
' .Replace swaps each matched comma for Chr(8), the backspace
' character, which is extremely unlikely to appear in any string.
' (In VBScript "\b" is just a backslash followed by a b, not Chr(8).)
' The line is then split into an array on Chr(8).
SplitAdv = Split(objRE.Replace(strPlain, Chr(8)), Chr(8))
End Function
%>
<%'
strTEXT=replace(replace(replace(replace(replace(replace(replace(replace(CI_StripHTML(strPlain),Chr(13), "^"), Chr(32), " "), Chr(9), ""), Chr(62), ""), Chr(20), "!$"), Chr(10), "!$"), "!$", ""), Chr(160), "") %>
<textarea readonly name="ex_plaintext" cols="100" rows="20"
id="textarea2"><%=strPlain%></textarea>
| |
| Julian Roberts 2004-08-10, 12:14 pm |
| Sounds to me like you're going about this the wrong way. If you're getting
an XML string as a response, you'd be better off using the XMLDOM to walk
through the nodes.
Something like:
set xmlDoc = createObject("MSXML2.DOMDocument")
xmlDoc.async = False
xmlDoc.loadxml(xmlHTTP2.ResponseText)
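A slightly fuller sketch of the same idea, assuming the response really is
well-formed XML (an ordinary HTML page often is not, in which case loadXML
returns False). xmlHTTP2 is the request object from the original post, and
writing the result out with Response.Write is only for illustration:

```vbscript
Dim xmlDoc
Set xmlDoc = Server.CreateObject("MSXML2.DOMDocument")
xmlDoc.async = False

If xmlDoc.loadXML(xmlHTTP2.ResponseText) Then
    ' .text returns the text of the root element and all of its descendants
    Response.Write Server.HTMLEncode(xmlDoc.documentElement.text)
Else
    ' The parser rejected the input; show why
    Response.Write "Parse error: " & Server.HTMLEncode(xmlDoc.parseError.reason)
End If
```

If the parse fails, the page is probably plain HTML rather than XML, and a
string-based approach is likely the more practical route.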
--
Jules
http://www.charon.co.uk/charoncart
Charon Cart 3
Shopping Cart Extension for Dreamweaver MX/MX 2004
| |
| budgy2 2004-08-10, 12:14 pm |
| Jules,
this is my XML call
dim xmlHTTP, xmlHTTP2
Set xmlHTTP = Server.CreateObject("Microsoft.XMLHTTP")
xmlHTTP.Open "GET", "http://ald.thinksmartmarketing.co.uk/index.asp" & _
"?LOWGFX=FALSE&EMAIL=TRUE", False
xmlHTTP.Send
Any ideas how I can do what you suggested with it? I only stumbled across the
code, so I don't understand it 100%, and the only replace function I had ever
used before yesterday was the Charon Formatting extension!
Can you help?
BTW, bought your cart last week. Very impressed.
Regards
Matthew.
| |
| Julian Roberts 2004-08-10, 12:14 pm |
| What do you actually want to do with the response?
> BTW, bought your cart last week. Very impressed.
Cool :)
--
Jules
| |
| budgy2 2004-08-10, 12:14 pm |
| Right, sorry for the long post!
This is the code I am using:
<% dim strPlain
strPlain=CI_StripHTML(xmlHTTP2.ResponseText)
%>
<%
Function SplitAdv(strPlain)
Dim objRE
Set objRE = new RegExp
' Set up our RegExp object
objRE.IgnoreCase = true
objRE.Global = true
objRE.Pattern = ",(?=([^']*'[^']*')*(?![^']*'))"
' .Replace swaps each matched comma for Chr(8), the backspace
' character, which is extremely unlikely to appear in any string.
' (In VBScript "\b" is just a backslash followed by a b, not Chr(8).)
' The line is then split into an array on Chr(8).
SplitAdv = Split(objRE.Replace(strPlain, Chr(8)), Chr(8))
End Function
%>
<%strTEXT=replace(replace(replace(replace(replace(replace(replace(replace(CI_StripHTML(strPlain),Chr(13), "^"), Chr(32), " "), Chr(9), ""), Chr(62), ""), Chr(20), "!$"), Chr(10), "!$"), "!$", ""), Chr(160), "") %>
The output still has a load of space in it and looks like this:
^^^ALD - 12/02/2004^^^^^^//^^^^ ^ ^ ^ ^ ^ ^ ^ ^ ^
eInsight^ ^ ^ Dear "NAME", welcome to this weeks edition of
eInsight^ ^^^ ^ ^ Headlines^ ^ ^ ^ ^
^ ^^^^ ^ ^^^^ ^Smart launches two new
convertibleshttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=171^ ^
^Smart has launched two new models with the unveiling of the roadster Brabus
and roadster coupé Brabus...^ ^^^^ ^ ^ ^^^^ ^BMW cyber sales success with
Manheim Auctionshttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=183^ ^
^BMW has teamed up with Manheim Auctions to roll out a series of interactive
auctions to its 150-strong retailer network, enabling them to purchase quality
used cars online from the comfort of their own office...^ ^^^^ ^ ^^^^^^^^
^^^ Model Update^^^^^^^^^^^^^ ^Lexus boosts spec of IS 200/300 range and
raises phttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=164^ ^ ^Lexus
boosts spec of IS 200/300 range and raises prices...^ ^^^^^^^^^^ ^New IS200
to spawn two-door
coupehttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=165^ ^ ^New IS200
to spawn two-door coupe...^ ^^^^^^^^^^ ^Even more luxury as new long
wheelbase Phaeton is http://ALD.thinksmartmarketing.co.u....asp?PageID=166^
^ ^Even more luxury as new long wheelbase Phaeton is launched...^ ^^^^^^^^^^
^Deliveries of new Rover 45 to start in
Mayhttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=167^ ^ ^Deliveries
of new Rover 45 to start in May...^ ^^^^^^^^^^ ^MG ZS gets new
lookhttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=168^ ^ ^A bold new
front and rear exterior appearance (read more)...^ ^^^^^^^^^^ ^Faster Audi RS
6 Plus Avant arrives in Britain
thihttp://ALD.thinksmartmarketing.co.uk/index.asp?PageID=169^ ^ ^Faster Audi
RS 6 Plus Avant
That is just a snippet!
I have tried all of the Chr codes to get rid of the space, but it won't go!
This was my first post this morning. The formatting needs to be done for a
plain-text email tool I have to integrate with for a customer!
| |
| Julian Roberts 2004-08-10, 11:17 pm |
| I see. Or rather I don't see. As I don't have a solution. I can imagine it's
a nightmare to try and extract the text only from a HTML page.
--
Jules
| |
| darrel 2004-08-10, 11:17 pm |
| > I see. Or rather I don't see. As I don't have a solution. I can imagine
> it's a nightmare to try and extract the text only from a HTML page.
I'm confused about that too. It'd be much easier to just remove the HTML tags
instead of the other way around.
-Darrel
| |
| darrel 2004-08-10, 11:17 pm |
| > I am using the following code to strip the HTML from an XML call.
Are you wanting to take XML data and have it formatted as HTML on a page? If
so, then use XSLT for that.
Or are you wanting to take XML data and spit it out as plain text? If so,
then just use a regex to remove all of the tags. (or, actually, you could
use XSLT too, but have it preserve white space and line breaks...ie, have it
output text wrapped in PRE tags or something...)
As for white space, I think we had this conversation yesterday...you need to
be more specific with us as to what SPECIFIC white space you are looking to
remove.
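One way to answer that question is to list the character codes that are
actually left in the string. A rough sketch, where ListOddCharCodes is a
made-up helper name and strTEXT is assumed to hold the cleaned-up text from
the original post:

```vbscript
' Hypothetical helper: returns the distinct character codes of any
' whitespace-like characters (codes 0 to 32, plus 160) found in strIn.
Function ListOddCharCodes(strIn)
    Dim i, code, seen
    Set seen = Server.CreateObject("Scripting.Dictionary")
    For i = 1 To Len(strIn)
        code = AscW(Mid(strIn, i, 1))
        If (code >= 0 And code <= 32) Or code = 160 Then
            If Not seen.Exists(code) Then seen.Add code, True
        End If
    Next
    ListOddCharCodes = Join(seen.Keys, ", ")
End Function

' Usage: shows which codes still need a Replace of their own
Response.Write ListOddCharCodes(strTEXT)
```

Code 32 is an ordinary space, 9 a tab, 10 and 13 the line-break characters,
and 160 a non-breaking space.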
Actually, it'd be best if you could show us a bit of your source data, and
an example of how you want the final output formatted. That would be a huge
help for us to fully comprehend what you are trying to accomplish.
-Darrel
| |
| budgy2 2004-08-10, 11:17 pm |
| Guys,
The reason for having the HTML is that the page is generated through a CMS - an
HTML version is needed alongside the plain text. Thus I am calling
index.asp?EMAIL=TRUE and index.asp?EMAIL=TRUE&LOWGFX=TRUE respectively, before
stripping the HTML on the plain-text version.
The Chr replacements do a certain amount of the work, but it's just the spaces
that are left that are the issue!
At my wits' end! Must be a way of doing it!
Regards
M
P.S. Thanks for all your help so far.
| |
| darrel 2004-08-10, 11:17 pm |
| > The reason for having the HTML is the page is generated through a CMS -
> an HTML version is needed alongside the plain text. Thus I am calling
> index.asp?EMAIL=TRUE and index.asp?EMAIL=TRUE&LOWGFX=TRUE respectively.
> Before stripping the HTML on the plain text version.
OK.
> The Chr do a certain amount of the work, but it's just the spaces that are
> left that are the issue!
WHICH SPACES! ;o)
Read my other reply to this thread. I think you can approach this from a
different direction.
-Darrel
| |
| darrel 2004-08-10, 11:17 pm |
| Actually, I think I know what you want to do now.
It sounds like you want to take HTML-formatted content, remove all the HTML,
but preserve line breaks so that it is, in essence, a plain-text email,
correct?
If so, then I think regex is the solution. Get the text and assign it to a
string. Then perform a few regex replaces on it:
- find all </p>, </h*> tags and replace with VBCRLF & VBCRLF
(this will make sure headers and paragraphs have enough space to delimit
them in the plain text...you might want to replace other tags at this point
too)
- find all HTML tags that are left and replace them with nothing
- find all instances of 2 or more spaces and replace with one space
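The three steps above could be sketched in VBScript along these lines. This
is only a sketch: ReplaceRE and HtmlToPlainText are made-up helper names, the
<br> handling and the final tidy-up are extra assumptions, and the tag
patterns are deliberately simple (they will not cope with ">" inside
attribute values, and the contents of script or style blocks are left in).

```vbscript
' Hypothetical helper: one global, case-insensitive regex replace.
Function ReplaceRE(strIn, strPattern, strReplacement)
    Dim objRE
    Set objRE = New RegExp
    objRE.Global = True
    objRE.IgnoreCase = True
    objRE.Pattern = strPattern
    ReplaceRE = objRE.Replace(strIn, strReplacement)
End Function

Function HtmlToPlainText(strHtml)
    Dim s
    ' Flatten the source's own line breaks, tabs and non-breaking spaces to
    ' ordinary spaces; the only line breaks kept are the ones added below.
    s = ReplaceRE(strHtml, "[\r\n\t]", " ")
    s = Replace(s, ChrW(160), " ")
    s = Replace(s, "&nbsp;", " ", 1, -1, vbTextCompare)
    ' Step 1: closing paragraph and heading tags become a blank line.
    s = ReplaceRE(s, "</(p|h[1-6])\s*>", vbCrLf & vbCrLf)
    ' Assumed extra: <br> becomes a single line break.
    s = ReplaceRE(s, "<br\s*/?>", vbCrLf)
    ' Step 2: remove every tag that is left.
    s = ReplaceRE(s, "<[^>]*>", "")
    ' Step 3: collapse runs of two or more spaces into one.
    s = ReplaceRE(s, " {2,}", " ")
    ' Assumed tidy-up: drop spaces around line breaks, cap blank lines at one.
    s = ReplaceRE(s, " ?\r\n ?", vbCrLf)
    s = ReplaceRE(s, "(\r\n){3,}", vbCrLf & vbCrLf)
    HtmlToPlainText = Trim(s)
End Function

' Usage, with xmlHTTP2 being the request object from the original post:
strPlain = HtmlToPlainText(xmlHTTP2.ResponseText)
```

Doing the whitespace flattening first means the "^" placeholder for Chr(13)
is not needed at all.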
-Darrel
| |
| budgy2 2004-08-10, 11:17 pm |
| Those that are not removed by the replace statement I have already written. See thread.
What's your idea?
| |
| darrel 2004-08-10, 11:17 pm |
| > Those that are not removed by the replace statement I have already
> written. See thread.
Budgy...are you reading this thread via the web forums? You seem to be out
of sync with the replies. (If you are, I'd strongly suggest you switch to
an actual news reader instead of using the web forum...which is horrifically
slow and cumbersome.)
As for which spaces, you're not being specific enough. All spaces? If so,
that'd just leave you with one giant, REALLY long word. I assume you want to
remove all instances of 2 or more contiguous spaces, correct? Also, I'm not
sure where all of the "^^^^^^^^^^^^^^^^^" characters are coming from. I
assume you want to remove those too? Or are those supposed to be in there?
Again, the EASIEST way for us to help you is for you to show us what a bit
of the source HTML looks like and a sample of how you want it to look in the
end.
Read the reply I posted at 3:36. That outlines my idea. Here's the copy of
that:
========================================================
Actually, I think I know what you want to do now.
It sounds like you want to take HTML-formatted content, remove all the HTML,
but preserve line breaks so that it is, in essence, a plain-text email,
correct?
If so, then I think regex is the solution. Get the text and assign it to a
string. Then perform a few regex replaces on it:
- find all </p>, </h*> tags and replace with VBCRLF & VBCRLF
(this will make sure headers and paragraphs have enough space to delimit
them in the plain text...you might want to replace other tags at this point
too)
- find all HTML tags that are left and replace them with nothing
- find all instances of 2 or more spaces and replace with one space
-Darrel
| |
| Copyright 2003 - 2008 forum4designers.com |