Web Design Forum

character encoding?
 

T.J.





Old Post  01-22-05 - 12:21 AM  
I upgraded my hosting the other day, and now when validating
my pages, get the following warning,

"The character encoding specified in the HTTP header (iso-8859-1)
is different from the value in the <meta> element"

Is this something to do with my host, and should I change the
encoding to iso-8859-1?
(The pages still validate OK)
TIA.




Re: character encoding?
 

Eric Jarvis





Old Post  01-22-05 - 12:21 AM  
T.J. no1@home.invalid wrote:
>
> I upgraded my hosting the other day, and now when validating
> my pages, get the following warning,
>
> "The character encoding specified in the HTTP header (iso-8859-1)
> is different from the value in the <meta> element"
>
> Is this something to do with my host, and should I change the
> encoding to iso-8859-1?
> (The pages still validate OK)
> TIA.
>

Do you have a good reason for using a different encoding? If so, ask your
web hosts how you go about changing the HTTP header. In general, for
single-language sites in English (and most Latin languages) iso-8859-1 is the
standard encoding and the best one to use. However, you'll need to check
that you aren't using any characters that don't fit with it.

--
eric
www.ericjarvis.co.uk
"live fast, die only if strictly necessary"


Re: character encoding?
 

T.J.





Old Post  01-22-05 - 04:17 AM  
"Eric Jarvis" <web@ericjarvis.co.uk> wrote in message
news:MPG.1c5b9608e32b612b98dc0e@news.individual.net...
> T.J. no1@home.invalid wrote: 
>
> Do you have a good reason for using a different encoding? If so, ask your
> web hosts how you go about changing the HTTP header. In general, for
> single-language sites in English (and most Latin languages) iso-8859-1 is the
> standard encoding and the best one to use. However, you'll need to check
> that you aren't using any characters that don't fit with it.
>

I have a page which is very roughly translated into French, German,
Spanish, Italian and Portuguese.
The Portuguese page at
http://www.sim64.co.uk/pt.html
is giving an error when I try to validate it, but I just can't see where
the error is. It might be something blindingly obvious, but I can't see it.
Before I upgraded my hosting it validated fine.
I was using utf-8 then, but it looks like my host is overriding this.
Could this be what is causing the problem?




Re: character encoding?
 

Eric Jarvis





Old Post  01-22-05 - 09:15 AM  
T.J. no1@home.invalid wrote:
>
> "Eric Jarvis" <web@ericjarvis.co.uk> wrote in message
> news:MPG.1c5b9608e32b612b98dc0e@news.individual.net... 
>
> I have a page which is very roughly translated into French, German,
> Spanish, Italian and Portuguese.
> The Portuguese page at
> http://www.sim64.co.uk/pt.html
> is giving an error when I try to validate it, but I just can't see where
> the error is. It might be something blindingly obvious, but I can't see it.
> Before I upgraded my hosting it validated fine.
> I was using utf-8 then, but it looks like my host is overriding this.
> Could this be what is causing the problem?
>

Yes. Tell your hosts that you need the HTTP headers to define the encoding
as utf-8. Ask them for instructions on how you can change it or, if
that's not an option, for them to change it themselves.
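
For anyone who wants to see the mismatch the validator is warning about, here is a minimal Python sketch that pulls the charset out of a Content-Type header value and out of a <meta> tag and compares them. The function names and the sample strings are made up for illustration; they are not taken from T.J.'s server or page, and the regular expressions are deliberately simple rather than a full HTML parser.

```python
import re

def charset_from_content_type(value):
    """Return the lowercased charset parameter found in the text, or None."""
    match = re.search(r'charset\s*=\s*["\']?([\w.:-]+)', value, re.IGNORECASE)
    return match.group(1).lower() if match else None

def charset_from_meta(html):
    """Return the charset of the first <meta ...> tag that declares one, or None."""
    for tag in re.findall(r'<meta\b[^>]*>', html, re.IGNORECASE):
        charset = charset_from_content_type(tag)
        if charset:
            return charset
    return None

# Hypothetical values resembling the situation described in this thread.
header = "text/html; charset=iso-8859-1"
page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

http_charset = charset_from_content_type(header)
meta_charset = charset_from_meta(page)
mismatch = http_charset != meta_charset
print(http_charset, meta_charset, mismatch)
```

When the two values differ, the HTTP header normally wins, which is why the host's setting matters more than the <meta> element.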

I can't see what the error is on the Portuguese page. I'd just cut and
paste in the dtd from one of the pages that validates and if that works
just assume it to be a subtle typo somewhere.

--
eric
www.ericjarvis.co.uk
"live fast, die only if strictly necessary"


Re: character encoding?
 

Norman L. DeForest





Old Post  01-22-05 - 09:15 AM  
On Sat, 22 Jan 2005, T.J. wrote:

> "Eric Jarvis" <web@ericjarvis.co.uk> wrote in message
> news:MPG.1c5b9608e32b612b98dc0e@news.individual.net... 
>
> I have a page which is very roughly translated into French, German,
> Spanish, Italian and Portuguese.
> The Portuguese page at
> http://www.sim64.co.uk/pt.html
> is giving an error when I try to validate it, but I just can't see where
> the error is. It might be something blindingly obvious, but I can't see it.

If you use a Unicode/UTF-8-supporting text editor, the problem character
may be invisible to you.  (I just tried looking at a copy of that file
with EditPad Lite and that character is invisible when it is the
first thing in the file.)  The first three bytes in that file (immediately
ahead of the <!DOCTYPE...> declaration) are hexadecimal EF, BB, BF which
is the UTF-8 encoding for character 65279 (hexadecimal FEFF), a zero-width
no-break space (which is used by some software to determine byte order but
interpreted in ISO-8859-1 as the three characters 'ï', '»', and '¿'). Try
deleting those three bytes with some other software that is *not*
UTF-8-aware and see if the file now validates.
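
The two readings of those three bytes can be checked with a few lines of Python. This is only a sketch using the standard codecs; the variable names are arbitrary.

```python
# The three bytes found at the start of the file.
bom_bytes = bytes([0xEF, 0xBB, 0xBF])

# Read as ISO-8859-1, every byte is its own character.
as_latin1 = bom_bytes.decode("iso-8859-1")

# Read as UTF-8, the three bytes together form one character.
as_utf8 = bom_bytes.decode("utf-8")

print(len(as_latin1), [hex(ord(ch)) for ch in as_latin1])
print(len(as_utf8), hex(ord(as_utf8)))
```

The first line reports three characters (the visible junk the validator complains about) and the second reports a single character, U+FEFF.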

> Before I upgraded my hosting it validated fine.
> I was using utf-8 then, but it looks like my host is overriding this.
> Could this be what is causing the problem?

It could be.  The first three bytes may now be messing up the
DOCTYPE declaration as the DOCTYPE is no longer the first thing in the
file when the initial reading of the file assumes that the character
encoding is ISO-8859-1:

ï»¿<!DOCTYPE ....
^^^ <--- try getting rid of these.
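
If no byte-level editor is to hand, the same deletion can be done in Python. This is a rough sketch: strip_utf8_bom is a hypothetical helper name, and the sample bytes only imitate the start of the page.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading EF BB BF, leaving other content untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Sample bytes imitating the start of the page, with the three extra bytes in front.
sample = codecs.BOM_UTF8 + b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
cleaned = strip_utf8_bom(sample)

print(sample[:12])
print(cleaned[:12])
```

To apply it to a real file you would read the file in binary mode, pass the bytes through the function and write the result back, again in binary mode so that nothing re-adds the bytes.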

--
Norman De Forest             http://www.chebucto.ns.ca/~af380/Profile.html
af380@chebucto.ns.ca            [=||=]            (A Speech Friendly Site)
My Usenet 2005 calendar:    http://www.chebucto.ns.ca/~af380/Year-2005.txt
For explanation:   http://www.chebucto.ns.ca/~af380/Links.Books.html#TandD



Re: character encoding?
 

Norman L. DeForest





Old Post  01-22-05 - 09:15 AM  
[more info, see below]

On Sat, 22 Jan 2005, Norman L. DeForest wrote:

> On Sat, 22 Jan 2005, T.J. wrote:
> 
>
> If you use a Unicode/UTF-8-supporting text editor, the problem character
> may be invisible to you.  (I just tried looking at a copy of that file
> with EditPad Lite and that character is invisible when it is the
> first thing in the file.)  The first three bytes in that file (immediately
> ahead of the <!DOCTYPE...> declaration) are hexadecimal EF, BB, BF which
> is the UTF-8 encoding for character 65279 (hexadecimal FEFF), a zero-width
> no-break space (which is used by some software to determine byte order but
> interpreted in ISO-8859-1 as the three characters 'ï', '»', and '¿'). Try
> deleting those three bytes with some other software that is *not*
> UTF-8-aware and see if the file now validates.
> 
>
> It could be.  The first three bytes may now be messing up the
> DOCTYPE declaration as the DOCTYPE is no longer the first thing in the
> file when the initial reading of the file assumes that the character
> encoding is ISO-8859-1:
>
> ï»¿<!DOCTYPE ....
> ^^^ <--- try getting rid of these.

Testing a copy of your page in my temp directory:

http://validator.w3.org/check?uri=h... />
p%2Fpt.htm

[snip]
: This page is not Valid [15] HTML 4.01 Transitional!
:
:    Below are the results of attempting to parse this document with an
:    SGML parser.
:     1. Line 1, column 0: character "ï" not allowed in prolog
:        »¿<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
:        [16] ✉
[snip]

Testing a copy of your page with the first three bytes removed:

http://validator.w3.org/check?uri=h.../>
p%2Fpt4.htm

[snip]
: This Page Is Valid [13] HTML 4.01 Transitional!
[snip]

--
Norman De Forest             http://www.chebucto.ns.ca/~af380/Profile.html
af380@chebucto.ns.ca            [=||=]            (A Speech Friendly Site)
My Usenet 2005 calendar:    http://www.chebucto.ns.ca/~af380/Year-2005.txt
For explanation:   http://www.chebucto.ns.ca/~af380/Links.Books.html#TandD



Re: character encoding?
 

T.J.





Old Post  01-22-05 - 05:15 PM  
"Norman L. DeForest" <af380@chebucto.ns.ca> wrote in message
news:Pine.GSO.3.95.iB1.0.1050122034557.908A-100000@halifax.chebucto.ns.ca...


<snip>

>
> Testing a copy of your page in my temp directory:
>
> http://validator.w3.org/check?uri=h...>
emp%2Fpt.htm
>
> [snip]
> : This page is not Valid [15] HTML 4.01 Transitional!
> :
> :    Below are the results of attempting to parse this document with an
> :    SGML parser.
> :     1. Line 1, column 0: character "ï" not allowed in prolog
> :        »¿<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> :        [16] ✉
> [snip]
>
> Testing a copy of your page with the first three bytes removed:
>
> http://validator.w3.org/check?uri=h...
emp%2Fpt4.htm
>
> [snip]
> : This Page Is Valid [13] HTML 4.01 Transitional!
> [snip]
>

Thank you,
I had seen that was the problem, but where did those first three
bytes come from and why can't I see them?
I use Notepad, and all I had to do was copy and paste the whole code
without making any changes and it validates; re-load the existing page
and the problem comes back.




Re: character encoding?
 

Norman L. DeForest





Old Post  01-22-05 - 05:15 PM  
On Sat, 22 Jan 2005, T.J. wrote:

> "Norman L. DeForest" <af380@chebucto.ns.ca> wrote in message
> news:Pine.GSO.3.95.iB1.0.1050122034557.908A-100000@halifax.chebucto.ns.ca...
[snip]
> Thank you,
> I had seen that was the problem, but where did those first three
> bytes come from and why can't I see them?
> I use Notepad, and all I had to do was copy and paste the whole code
> without making any changes and it validates; re-load the existing page
> and the problem comes back.

The three bytes, hexadecimal EF (decimal 239, 'ï'),
                 hexadecimal BB (decimal 187, '»'),
                 hexadecimal BF (decimal 191, '¿'),
are the UTF-8 encoding for a single character.
In binary with the character bits underlined:
   11101111  10111011  10111111
       ^^^^    ^^^^^^    ^^^^^^
encodes the character hexadecimal FEFF (decimal 65279):
       1111111011111111
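
The bit-picking shown above can be written out as a small Python sketch. decode_three_byte_utf8 is a made-up name, and the function only handles the three-byte form used here; it makes no attempt at the extra validity checks a real decoder performs.

```python
def decode_three_byte_utf8(b1, b2, b3):
    """Combine the payload bits of a 1110xxxx 10xxxxxx 10xxxxxx sequence."""
    if (b1 & 0xF0) != 0xE0 or (b2 & 0xC0) != 0x80 or (b3 & 0xC0) != 0x80:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Low 4 bits of the first byte, then low 6 bits of each following byte.
    return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)

code_point = decode_three_byte_utf8(0xEF, 0xBB, 0xBF)
print(hex(code_point), code_point, format(code_point, "016b"))
```

The masks 0x0F and 0x3F select exactly the underlined bits: four from the first byte and six from each of the other two, sixteen bits in all.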

That character is listed in the Unicode chart at:
"Arabic Presentation Forms-B" (characters FE70 to FEFF):
http://www.unicode.org/charts/PDF/UFE70.pdf

It is a space character with *zero* width.  From what I have read (if I
understood it correctly), it has two functions:

1. It is the zero-width no-break space: placed between two characters, it
stops a line from being broken at that point without adding any visible
gap.  (Despite sitting at the end of the Arabic Presentation Forms-B
block, U+FEFF is not the character that controls how Arabic letters
join; that job belongs to U+200C and U+200D, the zero-width non-joiner
and joiner.)

2. It is also used in software that processes the UTF-16 encoding of Unicode
(storing each character 0000 to FFFF (0 to 65535) as a two-byte word)
to indicate the byte order in each word.  (Intel CPUs store the least
significant byte of a word first and Macs store the most significant
byte first).  Since hex. FFFE (65534) is guaranteed to be invalid,
FEFF can be used to flag the byte order used by the software.  This way
documents can be shared between Macs and Windows machines and each can
detect when the byte-order needs reversing for that processor.
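
The byte-order use (function 2) can be sketched in Python as follows. utf16_byte_order is a hypothetical helper, and the sketch looks only at the two UTF-16 marks; it ignores UTF-32, whose little-endian mark also begins with FF FE.

```python
import codecs

def utf16_byte_order(data: bytes):
    """Guess the byte order of UTF-16 data from its first two bytes."""
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE: least significant byte first
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF: most significant byte first
        return "big-endian"
    return None                                # no mark, order unknown

# The same one-letter text written in each byte order, with the mark in front.
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

print(little.hex(), utf16_byte_order(little))
print(big.hex(), utf16_byte_order(big))
```

A reader that sees FF FE knows the writer stored the low byte first and can swap bytes if its own processor works the other way round.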

I can only guess but I suspect that some Unicode-aware software of yours
is adding the leading byte-order indicator to any document it fetches and
stores as Unicode or UTF-8 (function 2 above).  When you view the text
with Unicode-aware software, the character is displayed *with zero width*
(function 1 above) so it is effectively invisible.
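
How software can add the mark on saving and hide it on loading can be imitated with Python's "utf-8-sig" codec. This is only an illustration of the mechanism; it does not claim to show what any particular editor or browser in this thread actually does.

```python
text = "<!DOCTYPE ...>"

# "utf-8-sig" writes EF BB BF in front of the text; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# Decoding as plain UTF-8 keeps the mark as an invisible first character;
# decoding as "utf-8-sig" silently drops it.
kept = with_bom.decode("utf-8")
dropped = with_bom.decode("utf-8-sig")

print(with_bom[:3].hex(), len(kept), len(dropped))
```

A program that saves like the first line and loads like the last one would show a page that looks unchanged while its bytes differ, which matches the symptom of copy and paste producing a file that validates.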

What software are you using to fetch the page?  Would I be correct in
assuming it is Internet Explorer?[1]

If you use something like wget instead, would that fetch the correct
page?  If so, IE may be the problem.  Try the Windows version at:
"GNU wget - GNU Project - Free Software Foundation (FSF)"
http://www.gnu.org/software/wget/wget.html

The problem could also be a cache somewhere that has stored the original
buggy page.  Try clearing your browser cache to see if that's the problem.
Your ISP may also have cached the old page (but fetching it once again
with Lynx shows that it is now correct).  I also thought that if you had
your preferred encoding set to UTF-8 in your browser, the server may have
been adding the extra bytes but setting Lynx to specify UTF-8 as the
preferred document encoding didn't get the extra bytes so the problem is
more likely to be with your software.  That's where wget can be useful.

To view the file without the zero-width space being invisible, you could
try something that is *not* Unicode-aware.  (Sometimes dumber software is
better for a job than alleged "smart" software.)  The LIST command in 4DOS
(formerly shareware, now freeware) can view the file as text or as
hexadecimal bytes.  4DOS has been Tip Number One on my computer tips
page since long before it became freeware:
http://www.chebucto.ns.ca/~af380/Tips.html#Tip001
It's a replacement for COMMAND.COM that rivals Unix shells for
scripting and command-line capabilities.
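
Where no such viewer is installed, a few lines of Python give a comparable hexadecimal view. hex_dump is a made-up helper and the layout (offset, hex bytes, printable ASCII) is just one common convention, not a copy of what LIST displays.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as lines of offset, hex values and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Anything outside printable ASCII is shown as a dot.
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append(f"{offset:08X}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)

# Sample bytes imitating the start of the problem file.
sample = b"\xef\xbb\xbf<!DOCTYPE HTML"
dump = hex_dump(sample)
print(dump)
```

For a real file, read it in binary mode and pass the first few dozen bytes to the function; a leading EF BB BF shows up at once.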

An editor I use when I want to be able to see *everything* in a file
is NTED, my patched version of Tiny EDitor from PC Magazine.  Pressing
Alt-V allows me to see carriage-returns, line-feeds, and tabs.  Alt-B
toggles the appearance of characters 0 and 255 so they too can be
distinguished from spaces (when I'm using the default PC character set):
http://www.chebucto.ns.ca/~af380/Tips.html#TinyEd
It's only about 3000 bytes in size.  It has a limit of 64KB for file size
but larger files can always be cut into pieces.  (Look for "They Slice,
They Dice..." on my Computer Tips page.)

To view text in iso-8859-1 or CP1252, I load a CP1252 font into my
VGA card.  The font is available on my Tips page in two forms, one with
the control characters displaying as in the standard PC character set and
one that uses miniature diagonal "^A", "^B", "^C", etc. figures instead:
http://www.chebucto.ns.ca/~af380/Tips.html#Tip019
(The control-code version also displays the ISO-8859-1 non-breaking space
character (A0 hexadecimal, 160 decimal) as a miniature diagonal "BL" for
"blank" so it can be distinguished from an ASCII space, character 32
(20 hex.).)

For more on Unicode and UTF-8 encoding, see:

rfc2279 -- UTF-8
http://www.cis.ohio-state.edu/htbin/rfc/rfc2279.html

"Unicode Code Charts (PDF Version)"
http://www.unicode.org/charts/


Norman "who prefers DOS utilities[2] over Windows ones" De Forest

[1] My ISP's newsletter has had an advisory against both Internet Explorer
and Outlook/Outlook Express for some time and recommends using
alternatives to both for security reasons:
http://beacon.chebucto.info/news.shtml
[2] or Unix utilities ported to DOS:
ftp://garbo.uwasa.fi/pc/unix
--
Norman De Forest             http://www.chebucto.ns.ca/~af380/Profile.html
af380@chebucto.ns.ca            [=||=]            (A Speech Friendly Site)
My Usenet 2005 calendar:    http://www.chebucto.ns.ca/~af380/Year-2005.txt
For explanation:   http://www.chebucto.ns.ca/~af380/Links.Books.html#TandD



Re: character encoding?
 

T.J.





Old Post  01-23-05 - 05:25 PM  
"Norman L. DeForest" <af380@chebucto.ns.ca> wrote in message
news:Pine.GSO.3.95.iB1.0.1050122074856.15459A-100000@halifax.chebucto.ns.ca...
>
> On Sat, 22 Jan 2005, T.J. wrote:
> 
> [snip]
>
> The three bytes, hexadecimal EF (decimal 239, 'ï'),
>                 hexadecimal BB (decimal 187, '»'),
>                 hexadecimal BF (decimal 191, '¿'),
> are the UTF-8 encoding for a single character.
> In binary with the character bits underlined:
>    11101111  10111011  10111111
>        ^^^^    ^^^^^^    ^^^^^^
> encodes the character hexadecimal FEFF (decimal 65279):
>        1111111011111111

<snipped>

Thank you for all the info,
Unfortunately it's a bit wasted on me as I don't understand
90% of it.
I appreciate your trying to help though, and the problem is solved
now.
Thanks again.




Re: character encoding?
 

Eric Jarvis





Old Post  01-24-05 - 12:20 AM  
T.J. no1@home.invalid wrote:
>
> "Norman L. DeForest" <af380@chebucto.ns.ca> wrote in message
> news:Pine.GSO.3.95.iB1.0.1050122074856.15459A-100000@halifax.chebucto.ns.ca...
> 
>
> Thank you for all the info,
> Unfortunately it's a bit wasted on me as I don't understand
> 90% of it.
>

I got most of it and learned a lot, so thanks from me too, Norman. I
learned a few things that will come in handy.

--
eric
www.ericjarvis.co.uk
"live fast, die only if strictly necessary"


All times are GMT.