Home > Archive > Microsoft XML > May 2004 > XSLT - Transforming an HTM Table
XSLT - Transforming an HTM Table
Hi all,
I am trying to transform an HTM file that has already been parsed and
converted (so it is well-formed XML) into my own XML output. The trouble I
am having is with the <TABLE> element: it is linear in my source document,
but I need the XSLT to transform it into a hierarchical structure in XML.
Example - HTM (well-formed XML)
<TABLE CLASS="MAIN">
<TR><TD><B>A - First</B></TD></TR>
<TR><TD>1</TD><TD>Number 1</TD></TR>
<TR><TD>2</TD><TD>Number 2</TD></TR>
<TR><TD>3</TD><TD>Number 3</TD></TR>
<TR><TD><B>A1 - Child of First</B></TD></TR>
<TR><TD>4</TD><TD>Number 4</TD></TR>
<TR><TD>5</TD><TD>Number 5</TD></TR>
<TR><TD><B>End A1</B></TD></TR>
<TR><TD>6</TD><TD>Number 6</TD></TR>
<TR><TD><B>End A</B></TD></TR>
<TR><TD><B>B - Second</B></TD></TR>
<TR><TD>7</TD><TD>Number 7</TD></TR>
<TR><TD><B>B1 - Child of Second</B></TD></TR>
<TR><TD>8</TD><TD>Number 8</TD></TR>
<TR><TD>9</TD><TD>Number 9</TD></TR>
<TR><TD><B>End B1</B></TD></TR>
<TR><TD><B>B2 - Child of Second</B></TD></TR>
<TR><TD>10</TD><TD>Number 10</TD></TR>
<TR><TD><B>B2a - Child of Child of Second</B></TD></TR>
<TR><TD>11</TD><TD>Number 11</TD></TR>
<TR><TD><B>End B2a</B></TD></TR>
<TR><TD>12</TD><TD>Number 12</TD></TR>
<TR><TD><B>End B2</B></TD></TR>
<TR><TD>13</TD><TD>Number 13</TD></TR>
<TR><TD><B>End B</B></TD></TR>
</TABLE>
The output I require in XML is similar to:
<MAIN>
<SEQUENCE ID="A">
<NUMBER ID="1">Number 1</NUMBER>
<NUMBER ID="2">Number 2</NUMBER>
<NUMBER ID="3">Number 3</NUMBER>
<SEQUENCE ID="A1">
<NUMBER ID="4">Number 4</NUMBER>
<NUMBER ID="5">Number 5</NUMBER>
</SEQUENCE>
<NUMBER ID="6">Number 6</NUMBER>
</SEQUENCE>
<SEQUENCE ID="B">
<NUMBER ID="7">Number 7</NUMBER>
<SEQUENCE ID="B1">
<NUMBER ID="8">Number 8</NUMBER>
<NUMBER ID="9">Number 9</NUMBER>
</SEQUENCE>
<SEQUENCE ID="B2">
<NUMBER ID="10">Number 10</NUMBER>
<SEQUENCE ID="B2a">
<NUMBER ID="11">Number 11</NUMBER>
</SEQUENCE>
<NUMBER ID="12">Number 12</NUMBER>
</SEQUENCE>
<NUMBER ID="13">Number 13</NUMBER>
</SEQUENCE>
</MAIN>
As you can see, the sequences are nested according to the start and "End"
rows in the HTM table. Can anyone provide XSL that will perform this
transformation?
Many thanks in advance,
b0yce
This is the stylesheet I have currently:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <xsl:element name="MAIN">
      <xsl:apply-templates select="//TABLE[@CLASS='MAIN']" />
    </xsl:element>
  </xsl:template>
  <xsl:template match="TABLE">
    <!-- Only pick out Rows that are starting top-level Sequences -->
    <xsl:apply-templates select="TR[count(TD)=1][not(contains(TD/B, 'End'))][substring(TD/B, 2, 1) = ' ']" />
  </xsl:template>
  <xsl:template match="TR[count(TD)=1][not(contains(TD/B, 'End'))]">
    <!-- Match any beginning of Sequence -->
    <xsl:element name="SEQ">
      <xsl:variable name="SeqID"><xsl:value-of select="substring-before(TD/B, ' ')" /></xsl:variable>
      <xsl:attribute name="ID"><xsl:value-of select="$SeqID" /></xsl:attribute>
      <!-- Position, among the following rows, of this Sequence's End row -->
      <xsl:variable name="LastChild">
        <xsl:for-each select="following::TR">
          <xsl:if test="TD/B = concat('End ', $SeqID)">
            <xsl:value-of select="position()" />
          </xsl:if>
        </xsl:for-each>
      </xsl:variable>
      <xsl:for-each select="following::TR[position() &lt; $LastChild]">
        <xsl:choose>
          <xsl:when test="count(TD) = 1">
            <!-- Sub Sequence -->
            <xsl:apply-templates select="self::TR[count(TD)=1][not(contains(TD/B, 'End'))]" />
          </xsl:when>
          <xsl:otherwise>
            <!-- Child -->
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
and it produces:
<?xml version="1.0" encoding="UTF-16"?>
<MAIN>
<SEQ ID="A">
<SEQ ID="A1"/>
</SEQ>
<SEQ ID="B">
<SEQ ID="B1"/>
<SEQ ID="B2">
<SEQ ID="B2a"/>
</SEQ>
<SEQ ID="B2a"/> <!-- This shouldn't be here! -->
</SEQ>
</MAIN>
As you can see, I have an extra Sequence B2a in the wrong place (marked
above). Can anyone help with looping through the table correctly?
Cheers again!
I managed to find a solution to my problem.
It is based on the following assumptions (which are *always* true for the
source table):
a) A group starts with a letter followed by some text.
b) A child group is named incrementally, so the child of A is A1, then A1a,
then A1a1, etc. The depth is indicated by the length of the group ID.
c) A group is finished by "End " followed by its group ID.
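To make assumptions (a) to (c) concrete, here is a minimal sketch (not part of the solution itself) that only classifies the marker rows of the sample table. It assumes the same TABLE/TR/TD/B layout shown in the first post and the `CLASS="MAIN"` attribute. The words "start", "end" and "depth" in the text output are just illustrative labels.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Marker rows are the rows with exactly one cell -->
    <xsl:for-each select="//TABLE[@CLASS='MAIN']/TR[count(TD)=1]">
      <xsl:variable name="label" select="string(TD/B)"/>
      <xsl:choose>
        <!-- Assumption (c): "End " followed by the group ID closes a group -->
        <xsl:when test="starts-with($label, 'End ')">
          <xsl:value-of select="concat('end ', substring-after($label, 'End '))"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Assumption (a): the ID is the text before the first space -->
          <xsl:variable name="id" select="substring-before($label, ' ')"/>
          <!-- Assumption (b): the ID length gives the nesting depth -->
          <xsl:value-of select="concat('start ', $id, ' depth ', string-length($id))"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample table the first three lines of output should be `start A depth 1`, `start A1 depth 2` and `end A1`.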
Thanks to anyone who looked into this. I am posting it as a reference for
anyone who wants to loop recursively through something similar. It may be
useful, even though it is quite specific to my needs.
The logic above should work to any depth, so you can nest as deep as you
want.
b0yce
XSL Stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <xsl:element name="MAIN">
      <xsl:apply-templates select="//TABLE[@CLASS='MAIN']"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="TABLE">
    <!-- Only pick out Rows that are starting top-level Sequences -->
    <xsl:apply-templates select="TR[count(TD)=1][not(contains(TD/B, 'End'))][substring(TD/B, 2, 1) = ' ']"/>
  </xsl:template>
  <!-- Start of Sequence Match Template -->
  <xsl:template match="TR[count(TD)=1][not(contains(TD/B, 'End'))]">
    <!-- Match any beginning of Sequence -->
    <xsl:element name="SEQ">
      <!-- Get Sequence Identifier -->
      <xsl:variable name="SeqID" select="substring-before(TD/B, ' ')"/>
      <!-- Get the Position of Last Child -->
      <xsl:variable name="LastChild">
        <xsl:for-each select="following::TR">
          <xsl:if test="TD/B = concat('End ', $SeqID)">
            <xsl:value-of select="position()"/>
          </xsl:if>
        </xsl:for-each>
      </xsl:variable>
      <!-- Set the Attribute of the Sequence -->
      <xsl:attribute name="ID"><xsl:value-of select="$SeqID"/></xsl:attribute>
      <!-- Loop through all rows up to this Sequence's End row -->
      <xsl:for-each select="following::TR[position() &lt; $LastChild]">
        <!-- Decide if it's a Subsequence or a Child -->
        <xsl:choose>
          <xsl:when test="count(TD) = 1">
            <!-- Sub Sequence: only direct children, whose ID is one character longer -->
            <xsl:if test="string-length(substring-before(TD/B, ' ')) - string-length($SeqID) = 1">
              <!-- Apply the Template only if it is a Start Sequence -->
              <xsl:apply-templates select="self::TR[count(TD)=1][not(contains(TD/B, 'End'))]"/>
            </xsl:if>
          </xsl:when>
          <xsl:otherwise>
            <!-- Child -->
            <!-- Get the Number of Start Sequences Remaining -->
            <xsl:variable name="NextStartCount"
                select="count(following::TR[TD/B][not(contains(TD/B, 'End'))])"/>
            <!-- Get the Number of End Sequences Remaining -->
            <xsl:variable name="NextEndCount"
                select="count(following::TR[TD/B][contains(TD/B, 'End')])"/>
            <!-- Now check if Field is in correct sequence -->
            <!-- End - Start will equal length of Sequence Identifier -->
            <xsl:if test="($NextEndCount - $NextStartCount) = string-length($SeqID)">
              <!-- It is a child of this Sequence -->
              <xsl:element name="FIELD">
                <xsl:attribute name="ID"><xsl:value-of select="TD[1]"/></xsl:attribute>
                <xsl:value-of select="TD[2]"/>
              </xsl:element>
            </xsl:if>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
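For comparison, here is a sketch of a different technique, often called sibling recursion, which may suit tables where assumption (b) does not hold. It is not the stylesheet above, just an alternative under the same input layout. It relies only on each group being closed by an "End " row carrying the same ID, and it walks the rows one at a time instead of counting ID lengths. The mode name `walk` is arbitrary, and the output element names SEQ and FIELD are copied from the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <MAIN>
      <xsl:apply-templates select="//TABLE[@CLASS='MAIN']"/>
    </MAIN>
  </xsl:template>
  <xsl:template match="TABLE">
    <!-- Start the walk at the first row only -->
    <xsl:apply-templates select="TR[1]" mode="walk"/>
  </xsl:template>
  <xsl:template match="TR" mode="walk">
    <xsl:choose>
      <xsl:when test="starts-with(TD/B, 'End ')">
        <!-- End row: stop here, which closes the enclosing SEQ -->
      </xsl:when>
      <xsl:when test="TD/B">
        <!-- Start row: open a group and walk its contents -->
        <xsl:variable name="id" select="substring-before(TD/B, ' ')"/>
        <SEQ ID="{$id}">
          <xsl:apply-templates select="following-sibling::TR[1]" mode="walk"/>
        </SEQ>
        <!-- Resume with the row after this group's own End row -->
        <xsl:apply-templates mode="walk"
            select="following-sibling::TR[TD/B = concat('End ', $id)][1]/following-sibling::TR[1]"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Data row: emit it, then move on to the next row -->
        <FIELD ID="{TD[1]}"><xsl:value-of select="TD[2]"/></FIELD>
        <xsl:apply-templates select="following-sibling::TR[1]" mode="walk"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Each row is visited exactly once, so a grandchild such as B2a cannot be emitted twice. The trade-off is that the recursion gets about as deep as the longest run of rows, which could be a problem for very long tables in processors with a limited recursion depth.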