Posted: Thu Nov 04, 2004 3:17 am Post subject: Arabic
Howdy,
I downloaded NeoOffice/J yesterday and started using it right away on my PowerBook G4 (400 MHz) running 10.2.8, with custom Arabic keyboards installed.
When writing in Arabic in the document, the letters wouldn't join. I tried Riwaj, TITUS Cyberbit, Arabic Typesetting, the PakType fonts, and a host of others, including the Devalipi fonts. Funny thing is, only the Devalipi letters would join properly.
I saved to a PDF to check, and had the same issue.
HOWEVER, I "saved as" to HTML and got a notice saying certain characters weren't processed, but an HTML file was output. When I opened that in Firefox, well whaddayaknow: the letters for all of the fonts were now joined properly, except the Devalipi ones!
Any input as to what this issue is, and how I might reliably write in Arabic?
Thanks!
The problem with the letters not being joined is that (AFAIK) NeoJ uses native Mac OS X font-rendering routines, which seem not to support joining Arabic characters in Windows TTFs. Alternatively, I have seen an explanation somewhere which says Windows TTFs are missing some "resource" or marker to enable Apple's routines to properly join them. The exact cause, or which argument is correct, is beyond my knowledge. [Edit: there's a bit more info at this thread in Apple Discussions.]
Using Apple's Arabic fonts (Geeza Pro, DecoType Naskh, Baghdad, etc.), NeoJ has no problem properly joining the letters (at least on 10.3.x).
(OOo X11 does join the letters in the Windows TTFs because it uses FreeType for rendering instead; I imagine something similar is true for Firefox--its cross-platform/Unix background enables it to render Windows Arabic TTFs properly (although Firefox only seems to support Geeza Pro and not any of the other Mac Arabic fonts!). Aside from these two apps, I've yet to find a Mac app which properly joins the letters in Windows TTFs--and these two do it because they're only "partially" native. The price of a nice, OS X-integrated native app is apparently loss of compatibility with Windows Arabic TTFs.)
Shalkar wrote:
HOWEVER, I "saved as" to HTML and got a notice saying certain characters weren't processed, but an HTML file was output.
This is because the default charset for HTML export is set to ISO-8859-1 (at least if the default OS language/locale is US English), and of course Arabic requires ISO-8859-6 (or the DOS or Windows codepages) or UTF-8. In most cases the export will be successful and be rendered by most browsers, but I did once have to change to UTF-8 to make a few of the Turkish letters in an otherwise English document display.
The default charset can be changed in Tools>Options>Load/Save>HTML Compatibility.
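To see why the export complains, here is a minimal Python sketch (not anything NeoJ itself runs) that checks whether a string can be represented in a given charset. The sample word and the helper name `can_encode` are made up for illustration; the charset names are the standard Python codec names.

```python
# Sketch: which charsets can hold a short Arabic string?
# The sample text and helper name are illustrative assumptions.
text = "\u0633\u0644\u0627\u0645"  # seen, lam, alif, meem

def can_encode(s, charset):
    """Return True if every character in s exists in the given charset."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for charset in ("iso-8859-1", "iso-8859-6", "cp1256", "utf-8"):
    print(charset, can_encode(text, charset))
```

ISO-8859-1 has no slots for Arabic letters, so an exporter using it has to drop or escape them, while ISO-8859-6, the Windows Arabic codepage (cp1256), and UTF-8 can all represent the text directly. The same check explains the Turkish case: a letter such as "ğ" is not in ISO-8859-1 either.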
I'm not terribly familiar with the Devalipi fonts (I recognize the name, but don't have them--are they free/shareware?); it's possible that they have a "Mac only" encoding that NeoJ can handle since it uses Apple's routines but Firefox cannot. What happens when you load the page in Safari? (You might want to try both with the current page and after re-exporting it with the proper charset selected.)
Hope this helps a little. We have some other members here who, unlike me, are native speakers and are likely more familiar with the quirks of Arabic under Mac OS X....
Smokey
Last edited by sardisson on Thu Nov 04, 2004 4:31 pm; edited 1 time in total
Thanks Smokey for the detailed explanation. I was struggling to explain this.
FYI, I did notice, however, that I introduced a bug in "patch-3" that causes some beginning-form Arabic characters to get swapped with the space immediately preceding them. I have fixed this particular bug and the fix will be in the next Neo/J Alpha 2 patch.
Thanks for the feedback. The HTML bit makes sense. So does the font-rendering routines section. I would use the Mac Arabic fonts but they do not have certain characters required in the language I work in, Kazak (similar to Uighur and Kirghiz), as far as I know.
I am also working with OO 1.1.2, using XFree86 and OroborOSX v 0.9, but not Apple's X11 since that doesn't work with 10.2.8. I haven't been successful, though, in getting Arabic fonts to work correctly with OO. I'll head over to those forums to cast for help.
The Devalipi fonts are part of the shareware apps found at www.devalipi.com
I will try using Safari and get back to you. (I trash my test files rather quickly otherwise I'd have hundreds in a short period of time).
Where else would you recommend I read more about charsets? I am just getting familiar with Unicode, and this phrase keeps popping up. Are there NeoOffice/J-specific sites I could try?
I would use the Mac Arabic fonts but they do not have certain characters required in the language I work in, Kazak (similar to Uighur and Kirghiz), as far as I know.
Check Geeza Pro if you haven't already, using either the Character Palette or something like Unicode Font Info or UnicodeChecker. I know it includes some more obscure Arabic letters/hamza combos than the rest of the Arabic fonts, so it has a larger set in general, but I don't know what's needed for Kazakh.
Shalkar wrote:
I am also working with OO 1.1.2, using XFree86 and OroborOSX v 0.9, but not Apple's X11 since that doesn't work with 10.2.8. I haven't been successful, though, in getting Arabic fonts to work correctly with OO.
It shouldn't be OOo that's the problem--at least, it certainly works under 10.3 and Apple X11 (with a number of little quirks, mostly keyboard shortcuts and some deadkeys not working). I spent some time during the testing cycle making sure Arabic worked to at least a useable level.
Shalkar wrote:
The Devalipi fonts are part of the shareware apps found at www.devalipi.com
Ah, yes, the Arabic Genie folks. Now I remember. Well, I was unable to get those fonts to work correctly in NeoJ (or anywhere outside of their own Devalipi Pro application)--not even in other Carbon apps. Actually, text I wrote in Devalipi Pro and exported to HTML displayed properly in Safari and Firefox, but that's because DP exported the file as "MacRoman" encoding and referenced the DevalipiA font. They explicitly call the positions where the appropriate forms of the letters are in their font (if you were to add a space in the middle of a word in TextEdit and then load again in Safari, there would be medial forms on either side of the space).
The problem with the Devalipi fonts is that they are based on some pre-Unicode, non-standard encoding. Basically, all the glyphs are in the wrong positions. Pre-Unicode fonts were limited to 256 glyphs/characters, so when someone wanted a font for a non-Roman script on any OS, they (generally) "replaced" all the Roman glyphs with the ones needed for their script. Sometimes they followed an existing encoding standard--aka charset when we arrive at the internet--(there were ASMO encodings that were early Arabic standards), but more often than not they followed some logic only they knew.
[A charset is roughly synonymous with encoding and is the group of characters needed by a particular script (Roman, Arabic, Cyrillic, etc.) and the order in which they appear in a font and the decimal or hex code for the character that gets inserted into a document on a "behind-the-scenes" level.]
For example, "A" is at position 65 (decimal) in a standard Roman font (pre/post-Unicode)--for sake of argument, the 65th character in the font. When making an Arabic font, someone might put the alif in at 65, the baa' at 66, and so forth. When something written in this font was sent to another computer, the Arabic text would be Roman gibberish without the special font. For all intents and purposes, it was not an Arabic text.
ISO-8859-6, the international standard charset/encoding that predated Unicode, and on which Apple's Arabic fonts were based, assigned alif position 199 in a font (because this encoding also included the ASCII Roman characters in their normal positions--bilingual/bi-script fonts!). The problem with the ISO-8859 encodings is that they only allowed two scripts at a time, Western European Roman (missing chars for Polish, for instance) and the non-Roman script. One couldn't easily write in Arabic and Hebrew, for instance, in the same document.
[On the Mac one could, by switching keyboard layouts (which switched scripts and fonts and put an invisible marker in the document saying 199 in this section is alif but in this other section it means hebrew-character-x), but the document couldn't be exchanged widely.]
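A quick way to see the "same position, different letter" problem is to decode one byte under several charsets. This is only an illustrative Python sketch; Cyrillic (ISO-8859-5) stands in for the second non-Roman script here purely as an assumption for the example, since the point is the same for any pair of 8-bit encodings.

```python
import unicodedata

# One byte with value 199, plus plain ASCII "A" at position 65.
raw_199 = bytes([199])
raw_65 = bytes([65])

for charset in ("iso-8859-1", "iso-8859-5", "iso-8859-6"):
    ch = raw_199.decode(charset)
    # Same stored byte, three different characters depending on the charset.
    print(charset, "199 ->", "U+%04X" % ord(ch), unicodedata.name(ch))
    # Position 65 is "A" in all of them, which is what makes these
    # encodings bilingual (Roman plus one other script).
    print(charset, " 65 ->", raw_65.decode(charset))
```

Without a marker saying which charset applies, a program reading byte 199 has no way to know which of those letters was meant.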
Enter Unicode, which put each character in "every" language and script in a separate position in the encoding table and thus the fonts. No more possible confusion. Alif is 1575 (U+0627); no other character can have spot 1575.
Under Mac OS X, Apple's letter-joining routines expect letters to be in their Unicode positions in order for the OS to do the contextual analysis and pick the right form, etc. That's why even the old ISO-8859-6-based Arabic fonts no longer work; they don't have a 1575 (which is what an Arabic keyboard layout looks for when one presses "h"). One can still "type" the Arabic letters from the old font using a Roman keyboard, but the OS doesn't think it's Arabic, doesn't do contextual analysis, and doesn't join the letters. It's like the guy who made the Arabic font by putting alif in position 65 in the old, old days--the text is essentially gibberish.
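As a rough illustration of "one character, several forms", the Python sketch below looks up alif's Unicode position and then shows that the four contextual shapes of the letter beh, which Unicode also carries as legacy "presentation form" code points, all map back to the single base letter. This only demonstrates the character/glyph distinction; it is not how Apple's routines are implemented.

```python
import unicodedata

alif = "\u0627"
# Decimal position, hex position, and official name of alif.
print(ord(alif), "U+%04X" % ord(alif), unicodedata.name(alif))

beh = "\u0628"
# Legacy presentation-form code points for the four shapes of beh.
beh_forms = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}
for label, shape in beh_forms.items():
    # NFKC compatibility normalization folds each shape back to the base letter.
    base = unicodedata.normalize("NFKC", shape)
    print(label, "U+%04X" % ord(shape), "->", "U+%04X" % ord(base), base == beh)
```

A document should store only the base letter; choosing the isolated, initial, medial, or final glyph is the job of the rendering layer, and it can only do that job if the text is encoded at the expected Unicode positions.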
In DevalipiA, alif is 71! If they had designed their fonts (and input system) to follow ISO-8859-6, there would at least have been a migration path (Text Encoding Converter) which could make the texts into useable Arabic on which the OS can perform contextual analysis and display the correct letter forms....
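The difference between the two situations can be sketched in Python. Text in a standard legacy charset converts to Unicode with a stock codec, whereas a non-standard font encoding needs a hand-built table. The table below is hypothetical: only the "alif at 71" entry comes from this thread, and a real table would have to cover every position in the font (including the separate positions for each letter form).

```python
# Case 1: standard legacy charset. A stock converter handles it.
legacy = bytes([0xC7, 0xE4])           # alif, lam as ISO-8859-6 bytes
text = legacy.decode("iso-8859-6")     # now real Unicode Arabic
utf8 = text.encode("utf-8")            # ready for a Unicode-aware app

# Case 2: non-standard font encoding. Someone must write the table by hand.
# HYPOTHETICAL_FONT_MAP is an assumption for illustration; only 71 -> alif
# is taken from the discussion above.
HYPOTHETICAL_FONT_MAP = {71: "\u0627"}

def remap(data, table):
    """Translate font-position bytes to Unicode, marking unknown positions."""
    return "".join(table.get(b, "\ufffd") for b in data)

print(text, utf8)
print(remap(bytes([71, 72]), HYPOTHETICAL_FONT_MAP))
```

Unknown positions come out as U+FFFD (the replacement character), which makes gaps in the hand-built table easy to spot.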
Unfortunately, on the long road to Unicode (and multi-lingual computing before that), these non-standard fonts, and applications based around them, gained a lot of traction. Microsoft used to make people buy a special version of Windows and a special version of Word to work in Arabic. Apple was a bit better, with WorldScript on the system level that every application could link to and gain multilingual, multi-script support, and then only the Arabic Language Kit for end-users to add the Arabic-specific pieces (i.e., to "turn on" the latent Arabic support); ALK was $100 but it lasted, no new purchase needed, from System 7 until Mac OS 9 included the language kits by default with the OS for free. But these were still obstacles of varying degrees, and some companies thought they could do better for cheaper by re-inventing the wheel in a non-standard way.... (Sorry, this became a pet peeve of mine in the 90s.)
Shalkar wrote:
Where else would you recommend I read more about charsets? I am just getting familiar with Unicode, and this phrase keeps popping up. Are there NeoOffice/J-specific sites I could try?
Oops. My explanation got a little long-winded and went into a fair amount of that. Aside from reading the OOo/NeoJ helpfiles, if they have anything, there's nothing too specific to NeoJ/OOo--they support documents in Unicode and some of the legacy ISO encodings/charsets. But there are a couple of sites for more general info: Alan Wood's Unicode Resources on Unicode in general (although he's also a Mac user) and Tom Gewecke's Unleash Your Multilingual Mac; Tom's the dean of multilingual Mac knowledge these days. And of course the original, Knut Vikør's The Arabic Mac, which is not quite as up-to-date as it once was.
Going back to the Windows TTFs, there are two avenues where we need to lobby in order to get them working on Mac OS X: Apple for full OpenType support and more Arabic glyphs (Mac OS X feedback) and the makers of the fonts for AAT instructions in addition to their OpenType ones. I'm not sure how effective it will be, particularly regarding the fontmakers, but otherwise we're stuck with Apple's five fonts.
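For anyone who wants to check which kind of shaping instructions a given font carries, here is a minimal Python sketch that reads the table directory of a plain .ttf/.otf file and looks for the AAT tables ("mort"/"morx") versus the OpenType layout tables ("GSUB"/"GPOS"). The function names are made up for this example, and it assumes a single-font sfnt file; it does not handle .ttc collections or Mac suitcase/.dfont files.

```python
import struct

AAT_TABLES = {"mort", "morx"}             # Apple Advanced Typography shaping
OPENTYPE_LAYOUT_TABLES = {"GSUB", "GPOS"}  # OpenType layout

def table_tags(font_bytes):
    """Return the set of table tags in an sfnt (TrueType/OpenType) directory."""
    # Header: uint32 version, uint16 numTables, then three uint16 search fields.
    _version, num_tables = struct.unpack(">IH", font_bytes[:6])
    tags = set()
    for i in range(num_tables):
        start = 12 + 16 * i  # 12-byte header, 16 bytes per table record
        tags.add(font_bytes[start:start + 4].decode("latin-1"))
    return tags

def shaping_support(font_bytes):
    """Report whether AAT and/or OpenType layout tables are present."""
    tags = table_tags(font_bytes)
    return {
        "aat": bool(tags & AAT_TABLES),
        "opentype": bool(tags & OPENTYPE_LAYOUT_TABLES),
    }
```

Usage would be along the lines of reading the file in binary mode and calling `shaping_support(data)`. A Windows Arabic TTF would be expected to report OpenType layout only, which fits the joining failure described in this thread, though presence of a table says nothing about how complete its contents are.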
Hope this was helpful and not too confusing!
Smokey
Thank you so much. I have been searching and searching for info, reading posts like this, then realized no one had my questions. Your answer helps more than anything I've read the last months in helping me understand.
I shall start the lobbying!
One last question: can we not re-encode these TTF fonts using VOLT, or pfaedit, or even Fontographer (I'd mention FontLab but I can't afford that baby)? Where can I go to discuss these questions?
Thank you so much. I have been searching and searching for info, reading posts like this, then realized no one had my questions. Your answer helps more than anything I've read the last months in helping me understand.
Glad my vast store of arcane and "useless" knowledge has been helpful!
Shalkar wrote:
One last question: can we not re-encode these TTF fonts using VOLT, or pfaedit, or even Fontographer (I'd mention FontLab but I can't afford that baby)? Where can I go to discuss these questions?
In theory, yes. In practice, it might be more difficult than it seems. At the end of the thread in Apple Discussions I mentioned in my first post, the conclusion seems to be that pfaedit (now FontForge) currently won't install on the Mac and that it's pretty difficult to add the needed AAT tables (so the newly re-encoded fonts will have their letters joined by the OS routines). Fontographer apparently only runs in OS 9/Classic on the Mac, which means it hasn't been updated in a while, so while it can probably re-encode the fonts to Unicode, it might not be able to do AAT tables either.... Creating/re-encoding fonts is unfortunately beyond my realm of knowledge, so who knows for sure. It seems that FontForge does have several mailing lists, so that's one place to go: http://fontforge.sourceforge.net/#Mail.
Shalkar wrote:
Again, thank you.
My pleasure. Please keep me posted if you find any solutions.
Posted: Sun Nov 07, 2004 2:56 pm Post subject: Re: Arabic Fonts
sardisson wrote:
In theory, yes. In practice, it might be more difficult than it seems. At the end of the thread in Apple Discussions I mentioned in my first post, the conclusion seems to be that pfaedit (now FontForge) currently won't install on the Mac and that it's pretty difficult to add the needed AAT tables (so the newly re-encoded fonts will have their letters joined by the OS routines). Fontographer apparently only runs in OS 9/Classic on the Mac, which means it hasn't been updated in a while, so while it can probably re-encode the fonts to Unicode, it might not be able to do AAT tables either.... Creating/re-encoding fonts is unfortunately beyond my realm of knowledge, so who knows for sure. It seems that FontForge does have several mailing lists, so that's one place to go: http://fontforge.sourceforge.net/#Mail.
Smokey
Although I just skimmed over the discussions here, and I am NOT a fonts guru, I thought it worth mentioning free font tools that Apple supplies - maybe something in there will be of use.
I would like an Arabic translation of NeoOffice, please. OpenOffice already exists in Arabic, so please include it in NeoOffice.
As I mentioned in the Hebrew language thread, it doesn't seem like the Arabic and Hebrew translations/localizations exist in the official OOo source that is used to build NeoOffice/J. I will PM M-Rick, who planned to try to merge the unofficial Arabic OOo Linux files with NeoJ, and see if he can report how he did it if he was successful.
What really needs to be done, though, is for the people who did the Arabic translation to "donate" it to OOo and get it checked in to the OOo CVS so it will be available to everyone....
Joined: May 25, 2003 Posts: 4752 Location: Santa Barbara, CA
Posted: Thu Nov 11, 2004 9:22 pm Post subject:
Also though, don't think that translations need to be within OpenOffice.org itself to be incorporated.
OpenOffice.org has its own restrictions as to what can and what can't be incorporated into mainline OOo, particularly licensing restrictions and copyright assignment limitations. NeoOffice, OTOH, is full GPL. Many of the dictionaries and a couple localizations are GPL licensed only and aren't able to be incorporated into OOo, but we can incorporate them without problem into Neo/J.
So if there's even a GPL-licensed one, we can use it. All of the Mac OS X Hebrew work I know of, at least, is not OOo mainline and I've never been able to find any source code or the like. I'm not sure about the 'unofficial' Arabic thread mentioned, but if it's GPL we can put it in.
There are Hebrew and Arabic localizations in the OOo 1.1.2 codebase. However, when I was about to release Neo/J 1.1 Alpha 1, I found that all of the new localizations made the Neo/J download almost 200 MB!
Since I am always bending and twisting the Neo/J binaries to fit within the various disk and bandwidth limitations on my download sites, I trimmed out many of the localizations that were added in OOo 1.1.2 that were not in OOo 1.0.3.
In an ideal world, I could release downloads that have all localizations and help files in all OOo-supported languages. However, this takes an astronomical amount of disk space and bandwidth.
Instead, I have to figure out how to build "language packs" or build separate binaries for each language group (e.g. "Neo/J Asia edition", "Neo/J Europe, North American, and South American edition", etc.). The problem is that this will require an order of magnitude increase in both my time spent doing releases as well as the same increase in bandwidth usage.
If anyone has any simpler ideas, they would be very much appreciated.
Posted: Thu Nov 11, 2004 10:08 pm Post subject:
Ah, OK, apologies Patrick. I was just pulling the licensing rabbit out of the hat when in fact we were dealing with an aardvark...
This was exactly the same issue I had when making the OOo X11 1.0.x "localizer"...including *everything* for each language was just too durned large. I eventually axed the help and managed to get things down to a reasonable size.
One of the great strengths of what Patrick's done with Neo/J is make it "self-localizing" in that it can translate itself without running lots of extra installers. This makes it really a world-class Mac OS X citizen (no pun intended) with one installer whereas traditional OpenOffice.org has one installer per language (!). Needless to say, mirror sites couldn't host 20+ 100 MB images for each language. OOo localization is, as a result, an utter mess.
Previously my mind was going down the line of making a single "English only" installer (still can edit foreign languages, just English interface and help) and then a second "any language" installer that would cover every foreign language we can think of. While the "any language" installer would probably be an order of magnitude larger, it would be the kind of thing that people who really need localization could download and the kind of thing that could be the main installer for CD/DVD distros. I'm not sure if that line of thinking is amenable, though.
I've generally found in the last few years that folks who want a foreign language and only a foreign language are more than willing to support a local build themselves or pay for a third party to do the bundling, install directions, and support in their own language.
For better or for worse, the main languages of Neo and OOo are English and German due to the size of the user and developer communities in those languages respectively.
Wow, I didn't realize they were in the codebase already. Mea maxima culpa. (At least partly. Every clue I found on the OOo website indicated that there were partial translations in progress, only available in "third-party" builds, and I gave up trying to find them in "ViewCVS" which apparently is OOo's way of browsing CVS via the web....)
It sounds like we need a "language summit" now, too, to help come up with better solutions to this problem.
A few preliminary questions and then I'll start a new thread (or Ed or Patrick could split the last few posts off from the main body of this (fonts-related) "Arabic" thread):
1. What UI languages does NeoJ currently ship with?
2. NeoJ only ships with English help, right?
3. And all the dictionaries are included in the NeoJ package, but only the one that matches the system language at first NeoJ launch is enabled, because the OOo code sucks up too many CPU cycles or something with all the dictionaries enabled?