Joined: Oct 31, 2005 Posts: 56 Location: Victoria BC Canada
Posted: Sun Nov 25, 2007 4:22 am Post subject: successful mixing NeoO and KompoZer HTML code
I originally posted this to the KompoZer (Open Source WYSIWYG app) forum and am cross-posting it here for anyone interested.
I've discovered that the HTML format of OpenOffice (or NeoOffice for Mac, which is what I use) imports nicely into KompoZer. I particularly like that the hyperlinks I have set up from a footnote or endnote superscript, which jump to the proper note or reference, are preserved by the HTML "Save As..." (not "Export", which in OOo/NeoO only gives the option to generate an XHTML file). The saved-as file is HTML 4.0 and has inline CSS styles.
When the OOo generated HTML file is opened in KompoZer, one can then proceed to manipulate it (for example pull in the margins, which by default will run to the end of the browser window). This is much better than saving text from one's word processor and then having to do a bunch of formatting; the extended characters are already coded properly, as well as the code for hyperlinking footnotes. Therefore for someone like me who wishes to convert a word processing document to HTML and have all the formatting intact (and not have to struggle with the nasty code that Word either puts on the clipboard or inserts in a generated HTML file), this is a real time-saver.
And, here is possibly the best part: the OOo/NeoO file's inline cascading style sheet (even after making changes in KompoZer) exports nicely to an external style sheet using CaScadeS's stylesheet exporting option.
And here is a small catch: all the code from OOo/NeoO, and any modifications subsequently made in KompoZer, validates except for one single bit in the footnote linking, an attribute called "sdfixed" that the validator says is "proprietary" (I found no useful information on Google). Nonetheless the hyperlink anchors work for me in both directions in Firefox, Opera and Safari. I don't have a Windows machine to test that function in IE, so I'd like to ask somebody to test the file for me in MS IE on a Win box and see if the footnote hyperlink works. There is only one such link in the sample file, a superscripted number 1 immediately after the word "Culture" in the document's head title. It is at:
http://members.shaw.ca/vicjoe/pub/culture-no_explan_power.html
Feel free to look at the source code, too. If the footnote hyperlink works in IE, I'll be a happy camper and this tip may be hugely useful to others as well, as this obviates the need to learn about CSS, at least for formatted/converted word processing documents. Note that all the "special characters" like curly quotes and em dashes came through, too. Nice!
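For anyone who wants the page to validate cleanly, here is a minimal sketch of stripping the "sdfixed" attribute with a small Python script. It assumes the attribute sits inside a tag either bare (SDFIXED) or with a value, in any letter case; the function name strip_sdfixed and the sample markup in the usage note are made up for illustration. I have not checked whether removing the attribute changes how NeoOffice treats the footnote if the file is opened there again.

```python
import re

# Matches an opening tag such as <A ...>. Simplification: assumes no
# attribute value contains a literal ">".
_TAG = re.compile(r"<[A-Za-z][^>]*>")

# Matches the attribute itself, bare (SDFIXED) or with a value
# (sdfixed="..."), in any letter case. Assumption: no other attribute
# value contains the word "sdfixed" surrounded by whitespace.
_ATTR = re.compile(
    r"""\s+sdfixed(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?(?=[\s>/])""",
    re.IGNORECASE,
)


def strip_sdfixed(html: str) -> str:
    """Return html with every sdfixed attribute removed from its tags.

    Text outside of tags is left untouched.
    """
    return _TAG.sub(lambda m: _ATTR.sub("", m.group(0)), html)
```

Usage would be something like strip_sdfixed('<A HREF="#sdfootnote1sym" SDFIXED>1</A>'), which gives back the same anchor without the attribute. The anchor names and the link target are not touched, so the footnote jump should keep working.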
Not completely related, but I've often used Neo's Writer/Web to clean up HTML I've gotten from other sources (e.g., exported from MS Office), which has saved tons of time compared to pulling out the chaff manually....
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
Posted: Sun Nov 25, 2007 12:39 pm Post subject: cleaning MS Office HTML junk
Quote:
Not completely related, but I've often used Neo's Writer/Web to clean up HTML I've gotten from other sources (e.g., exported from MS Office), which has saved tons of time compared to pulling out the chaff manually
That's interesting; I gather what you are saying is that Neo's Writer/Web isn't simply acting as a browser when it opens a file from somewhere. Rather, it is re-coding it and in the process getting rid of 'mso-normal' and like junk? I'll have to try it next time I get a file someone has saved in HTML format from Office. One more reason to use OOo/Neo and dump MS Orifice.
It doesn't surprise me that OOo is rewriting the HTML. I believe the HTML gets translated into the internal document format in memory and is then saved back to HTML through a filter.
Posted: Thu Nov 29, 2007 3:29 am Post subject: OOo HTML generator invokes IE7 failure
I ran into a problem with my OOo/NeoO saved HTML further refined in KompoZer, namely that some of the CSS attributes, most grievously the margin specification, were being ignored by MS IE 7 (and possibly IE 6).
What I discovered was, though my test file displayed generally okay in 10 major browsers, it always looked awful in IE, especially in that the margins would run to the far right edge of the browser window.
Since MS browsers still constitute 65% of overall use, I was dismayed to say the least. To make a long story short (I hunted around to find out what IE was doing, er, differently, and found lots on that, but that is another story), what it came down to is that the DOCTYPE generated by the OOo HTML generator was:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
That is not close enough for IE, which kicks into a particularly unforgiving "quirks mode" and ignores some CSS attributes.
So, to use my little OOo-NeoO/KompoZer trick and expect it to display properly in IE, I determined that one must change the DOCTYPE to this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
(assuming one wants to use 4.01 Transitional).
I don't know about other DOCTYPEs, but I'd assume the same, that it has to be rendered character for character as set forth by the W3C.
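Rather than editing the DOCTYPE by hand in every saved file, the swap can be scripted. What follows is only a sketch in Python, not part of OOo or KompoZer: the function name fix_doctype is my own invention, and it assumes the file has at most one DOCTYPE declaration with no ">" inside it.

```python
import re

# The full HTML 4.01 Transitional DOCTYPE, including the DTD URL.
HTML401_TRANSITIONAL = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '    "http://www.w3.org/TR/html4/loose.dtd">'
)

# Any existing DOCTYPE declaration, in any letter case.
_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def fix_doctype(html: str) -> str:
    """Replace the first DOCTYPE with the 4.01 Transitional one.

    If the document has no DOCTYPE at all, one is prepended.
    """
    if _DOCTYPE.search(html):
        # A function is used as the replacement so the new text is
        # inserted literally, with no backslash or group processing.
        return _DOCTYPE.sub(lambda m: HTML401_TRANSITIONAL, html, count=1)
    return HTML401_TRANSITIONAL + "\n" + html
```

To apply it, read the saved HTML file into a string, pass it through fix_doctype, and write the result back out. The rest of the file is left exactly as OOo/NeoO and KompoZer produced it.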
Of course IE will recognize a completely non-standard DOCTYPE generated by any MS Office app, but that is to be expected. For example, Word 9 generates this gem:
xmlns="http://www.w3.org/TR/REC-html40">
Conclusion: with the above DOCTYPE correction, a combo of OOo/NeoO "Save As..." HTML and refinement in KompoZer works fairly well. Once this has all been incorporated into a workflow, it saves a lot of time (particularly since at this stage a number of the coding features of KompoZer's CSS editor, CaScadeS, do not work in the Mac version).