Posted: Sun Jan 24, 2010 11:10 pm Post subject: Encoding for ".xls" files
I am asking this question here because I would like to give a webmaster an answer that he can use to make his website more Mac/NeoOffice-friendly.
Bajaflora.org serves data from the San Diego Natural History Museum Herbarium. Not all of the site is currently publicly available, but I hope that will change soon.
If I ask it to download some data, I get an ".xls" file. I am not sure if this is really a ".xls" file or in some other format with an ".xls" extension. See attached.
It appears to be in UTF-8, because if I open it in TextMate, the non-ASCII characters are interpreted correctly (e.g., San Pedro Mártir). But if I open it in NeoOffice, the non-ASCII characters are "all kerflooey".
What I need to know is: whose problem is this, and can it be easily solved (for example, with a Byte Order Mark in the file, or a different extension)? I would like to tell the webmaster of Bajaflora.org "please do the following, so that the downloaded files can be easily opened on platforms other than Office/Windows."
Posted: Sun Jan 24, 2010 11:38 pm Post subject: Re: Encoding for ".xls" files
alanterra wrote:
If I ask it to download some data, I get an ".xls" file. I am not sure if this is really a ".xls" file or in some other format with an ".xls" extension. See attached.
It appears to be in UTF-8, because if I open it in TextMate, the non-ASCII characters are interpreted correctly (e.g., San Pedro Mártir). But if I open it in NeoOffice, the non-ASCII characters are "all kerflooey".
Your file is definitely not an Excel file. Instead, it is a UTF-8 encoded partial HTML file. Specifically, it contains an HTML table of data and cells are encoded in UTF-8.
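One quick way to tell what a ".xls" download really contains is to look at its first few bytes rather than its extension. The sketch below is illustrative only and is not how anyone in this thread checked the file: a legacy binary .xls is an OLE2 compound file starting with the bytes D0 CF 11 E0 A1 B1 1A E1, .xlsx/.ods files are ZIP containers starting with "PK\x03\x04", and an HTML fragment like this one starts with "<" (possibly after a UTF-8 BOM or whitespace). The function name `sniff_spreadsheet` is hypothetical.

```python
import sys
from pathlib import Path

# Leading "magic" bytes of formats a ".xls" download might really contain.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy binary .xls)
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP container (.xlsx, .ods)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet(data: bytes) -> str:
    """Guess what a downloaded '.xls' file actually contains from its first bytes."""
    if data.startswith(OLE2_MAGIC):
        return "binary Excel (OLE2)"
    if data.startswith(ZIP_MAGIC):
        return "ZIP container (e.g. .xlsx or .ods)"
    # Skip an optional UTF-8 BOM and leading whitespace before looking for markup.
    head = data.removeprefix(UTF8_BOM).lstrip()
    if head.startswith(b"<"):
        return "HTML (or other markup)"
    return "unknown (possibly plain text or CSV)"


if __name__ == "__main__":
    # Usage: python sniff.py export.xls [more files...]
    for name in sys.argv[1:]:
        print(name, "->", sniff_spreadsheet(Path(name).read_bytes()))
```

Note that the OLE2 signature is shared by other old Office formats (.doc, .ppt), so a match only says "binary Office container", not specifically a spreadsheet.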
To open this file in Safari or Firefox, all you need to do is change the file extension to .html and add the missing HTML tags at the beginning and end of the file.
A renamed and edited version of your file is attached.
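For anyone who wants to do the same conversion on many downloads, here is a minimal sketch, assuming the download is a bare UTF-8 HTML fragment (such as a single table) like the one described above. It wraps the fragment in html/head/body tags, declares UTF-8 in the head, and writes a copy with a .html extension. The names `wrap_fragment` and `convert`, and the default title, are hypothetical; the attached file in this thread was edited by hand.

```python
import html
from pathlib import Path

# Skeleton of a complete HTML document that declares its character set up front.
TEMPLATE = (
    "<!DOCTYPE html>\n"
    "<html>\n<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "<title>{title}</title>\n"
    "</head>\n<body>\n{body}\n</body>\n</html>\n"
)


def wrap_fragment(fragment: str, title: str = "Data export") -> str:
    """Wrap a bare HTML fragment (e.g. a <table>) in a complete UTF-8 document."""
    # str.format only fills placeholders in TEMPLATE; braces inside the
    # fragment itself are inserted verbatim and never interpreted.
    return TEMPLATE.format(title=html.escape(title), body=fragment)


def convert(src: Path) -> Path:
    """Read a UTF-8 '.xls' fragment and write a complete '.html' file beside it."""
    fragment = src.read_text(encoding="utf-8-sig")  # also tolerates a leading BOM
    dst = src.with_suffix(".html")
    dst.write_text(wrap_fragment(fragment.strip()), encoding="utf-8")
    return dst
```

Calling `convert(Path("export.xls"))` would leave the original file alone and produce `export.html` next to it.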
Posted: Mon Jan 25, 2010 12:09 am Post subject: Re: Encoding for ".xls" files
alanterra wrote:
If I ask it to download some data, I get an ".xls" file. I am not sure if this is really a ".xls" file or in some other format with an ".xls" extension. See attached.
Right, it's part of an HTML document (an HTML table) with an .xls extension.
alanterra wrote:
What I need to know is: whose problem is this, and can it be easily solved (for example, with a Byte Order Mark in the file, or a different extension)? I would like to tell the webmaster of Bajaflora.org "please do the following, so that the downloaded files can be easily opened on platforms other than Office/Windows."
My guess is that there's a bug in the underlying OOo HTML import code where it doesn't do any character set detection and always assumes the character set is something other than UTF-8.
Luckily, it's easy enough to fix: just add a meta tag with charset information to the beginning of the HTML fragment. I added one and the document then imported just fine. (Trinity really does not like me trying to post the content of the meta tag, even without the angle brackets... grr, I've spent the last 20 minutes trying to do so, with absolutely no success.)
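Since the exact tag could not be posted above, here is a sketch of that minimal fix as a post-processing step, assuming the standard HTML form `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">` (the exact tag used in this post is not known). The export code on Bajaflora.org isn't shown in this thread, so the function `add_charset_meta` is a hypothetical stand-in for whatever generates the file. It keeps the .xls name and content, drops any UTF-8 BOM so the tag comes first, and skips files that already mention a charset near the start.

```python
# Standard HTML declaration of the UTF-8 character set (assumed form of the tag).
META_UTF8 = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
UTF8_BOM = b"\xef\xbb\xbf"


def add_charset_meta(data: bytes) -> bytes:
    """Prepend a UTF-8 charset <meta> tag to an HTML fragment unless one is present."""
    body = data.removeprefix(UTF8_BOM)  # drop any BOM so the tag comes first
    if b"charset" in body[:1024].lower():
        return data  # already declares a character set; leave untouched
    return META_UTF8 + body
```

Running it twice is harmless, since the second call finds the tag it added the first time.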
Posted: Mon Jan 25, 2010 7:12 am Post subject: Re: Encoding for ".xls" files
sardisson wrote:
My guess is that there's a bug in the underlying OOo HTML import code where it doesn't do any character set detection and always assumes the character set is something other than UTF-8.
I don't think you can blame NeoOffice's underlying OpenOffice.org code for this. HTML files have a very standard way to specify what the character set is. Our website's pages use that standard way, but the attached file did not have any of the standard HTML "html", "head", or "body" tags. In other words, the HTML was very incomplete.
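A webmaster could check exported files for that standard declaration before serving them. The sketch below is an approximation, not a full HTML encoding sniffer: it looks for a `<meta ... charset=...>` tag in the first 1024 bytes (the window HTML parsers prescan for such a tag) and matches both the `<meta charset="...">` and older `http-equiv="Content-Type"` forms with a regular expression. The name `declared_charset` is hypothetical.

```python
import re
from typing import Optional

# Matches both <meta charset="utf-8"> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> forms.
CHARSET_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def declared_charset(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset declared in a <meta> tag near the start of an HTML file, or None."""
    match = CHARSET_RE.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

A bare table like the one in this thread would return `None`, which is exactly the case where an importer has to guess the encoding.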