Welcome to NeoOffice developer notes and announcements
This website is an archive and is no longer active
NeoOffice announcements have moved to the NeoOffice News website


Adding lists of terms to OSX spell-checker
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

PostPosted: Tue Mar 17, 2009 11:56 am    Post subject: Adding lists of terms to OSX spell-checker

Now that NeoOffice uses (in many cases) the OSX spell-checker, I thought you might find useful a perl script I came across that will take a text file of one-word-per-line as input and turn it into a format that you can paste straight into your ~/Library/Spelling/<language code> file.

After a restart, all those new terms will be recognised by the spell-checker alongside the ones you've already added.

Code:
#!/usr/bin/perl -w

# This script reads a list of strings (one per line) from STDIN
# or from the files supplied as command-line arguments
# and outputs those strings to STDOUT separated by zero (NUL) bytes.
# Cameron Hayne (macdev@hayne.net) June 2005

# cl format is ./dictify input.file > output.file where input.file has one word per line
# paste contents of output.file into ~/Library/Spelling/en_GB - TextWrangler etc. show the invisibles

my $zerobyte = pack("B8", 0);    # a single zero (NUL) byte
while (<>)
{
    chomp();                     # strip the trailing newline
    print "$_$zerobyte";         # emit the word followed by a NUL
}

I called it dictify and added the last two comment lines to clarify how to use it with word list files.

Save it with whatever name you like, and make it executable.
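As an alternative to pasting by hand, here is a minimal sketch that appends new words straight onto the dictionary file and skips any that are already there. It assumes, as described above, that the file is simply a run of NUL-separated words, and that the word list uses the same encoding as the dictionary. The name dictmerge and the merge_words helper are made up for illustration; they are not part of the original script. Back up the Spelling file before trying it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): given the existing
# NUL-separated contents of a Spelling file and a list of new words,
# return the bytes that should be appended to the file.
sub merge_words {
    my ($existing, @new_words) = @_;
    my %seen = map { $_ => 1 } grep { length } split /\0/, $existing;
    my @fresh;
    for my $word (@new_words) {
        $word =~ s/[\r\n]+\z//;      # strip line endings
        next unless length $word;    # skip blank lines
        next if $seen{$word}++;      # skip words already present
        push @fresh, $word;
    }
    return '' unless @fresh;
    # If the existing contents do not end in NUL, add one so the last old
    # word and the first new word are not fused together.
    my $prefix = (length $existing && substr($existing, -1) ne "\0") ? "\0" : '';
    return $prefix . join '', map { "$_\0" } @fresh;
}

# Usage: perl dictmerge wordlist.txt ~/Library/Spelling/en_GB
if (@ARGV == 2) {
    my ($list, $dict) = @ARGV;
    my $existing = '';
    if (open my $in, '<', $dict) {
        local $/;                    # slurp the whole file
        $existing = <$in>;
        $existing = '' unless defined $existing;
        close $in;
    }
    open my $words, '<', $list or die "Cannot read $list: $!";
    my @new = <$words>;
    close $words;
    open my $out, '>>', $dict or die "Cannot append to $dict: $!";
    print {$out} merge_words($existing, @new);
    close $out or die "Cannot close $dict: $!";
}
```

A restart is still needed afterwards for the new words to be picked up.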

Hope this is useful to someone; it certainly has been to me. I'm migrating a friend from a Dell to a MacBook, and he's a bit dyslexic. We have something like 2500 Buddhist names and terms in a text file that we can import into his new spell checker!

- padmavyuha
Back to top
James3359
The Merovingian


Joined: Jul 05, 2005
Posts: 685
Location: North West England

PostPosted: Tue Mar 17, 2009 1:08 pm    Post subject:

Smokey has recently been updating the Wiki pages on this subject. I wonder if it would be worth adding this, or a link to this, on the Exporting Word Lists page.
Back to top
sardisson
Town Crier


Joined: Feb 01, 2004
Posts: 4588

PostPosted: Tue Mar 17, 2009 3:57 pm    Post subject:

James3359 wrote:
Smokey has recently been updating the Wiki pages on this subject. I wonder if it would be worth adding this, or a link to this, on the Exporting Word Lists page.


Yeah, let's add the perl script to that page. (It would be awesome if the script could also be made to parse standard.dic directly, so that it could act on both one-word-per-line wordlists and on someone's exported standard.dic; one less step for migrations.)

Smokey

_________________
"[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
Back to top
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

PostPosted: Tue Mar 17, 2009 11:52 pm    Post subject:

Beyond me, I'm afraid - just the messenger :)
Back to top
Markk
Operator


Joined: Mar 15, 2007
Posts: 43
Location: Wisconsin US

PostPosted: Wed Mar 18, 2009 6:14 pm    Post subject:

I can do it, no problem, except that I have no words in standard.dic. If any of you have a suitable file, reply with it and I'll add the code. It will only be a couple of lines, I think, depending on what I see in the file.
Back to top
James3359
The Merovingian


Joined: Jul 05, 2005
Posts: 685
Location: North West England

PostPosted: Thu Mar 19, 2009 3:35 am    Post subject:

Here's my standard.dic file. There are not many words in it, so I hope it will do.
Back to top
Markk
Operator


Joined: Mar 15, 2007
Posts: 43
Location: Wisconsin US

PostPosted: Thu Mar 19, 2009 8:52 am    Post subject: Working on it.

James3359 wrote:
Here's my standard.dic file. There are not many words in it, so I hope it will do.


A little nastier in format than I thought, but look back tomorrow.
Mark

Update - The standard.dic file seems to be in 16-bit Unicode? I hope so. This will be quite ad hoc. Looking at a couple of standard.dic files, they do not seem to be consistent, so there is probably a rule I don't understand. I think I might just make it output a one-word-per-line list so it can be fed into the other program. I will be away over the weekend, though.
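One way to test the 16-bit guess is to look at the raw bytes at the start of the file. In 16-bit Unicode text, each plain ASCII letter is stored with a 00 byte next to it, so a pattern like "57 00 42 00" would support the guess, while letters packed back to back would not. The sketch below is only a diagnostic; the name peekdic and the hex_bytes helper are made up for illustration, and nothing here assumes what standard.dic actually contains.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format raw bytes as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Usage: perl peekdic standard.dic
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot read $file: $!";
    my $count = read $fh, my $buf, 64;    # first 64 bytes are enough for a look
    die "Cannot read $file: $!" unless defined $count;
    close $fh;
    print hex_bytes($buf), "\n";
}
```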
Back to top
Markk
Operator


Joined: Mar 15, 2007
Posts: 43
Location: Wisconsin US

PostPosted: Mon Mar 23, 2009 9:45 am    Post subject: Code for standard.dic and others

OK, here is code that takes words on individual lines, words separated by whitespace, or words in standard.dic format and outputs a list separated by nulls that can be added to ~/Library/Spelling wordlists.

Code:

#!/usr/bin/perl -w

use strict;

# This script 'split_to_dict'
# 1. Reads standard input or a list of files specified on the command line
#    line by line in text mode, so it will automatically account for
#    unicode double byte where it (and perl) can.
# 2. It splits the lines into strings based on whitespace or
#    null (zero) characters
# 3. It removes all control characters from the strings and
# 4. Outputs the strings to STDOUT separated by zero (null) characters.
#
# Usage: perl split_to_dict inputfile1 inputfile2 > targetfile
#
# The inputfiles could be standard.dic OSX or Open Office dict or a list
# of words one per line or whitespace separated.
#
# The targetfile is suitable for pasting into ~/Library/Spelling/ dictionaries:
# cat targetfile >> ~/Library/Spelling/targetDictionary
#
# should do it where targetDictionary is "en" or "en_GB" or whatever.
# based on ideas from Cameron Hayne (macdev@hayne.net) June 2005
# version 1 Mark Kaehny March 2009
#
# released under the same license as the standard perl distribution:
# GPL version 2 or later (See the Free Software Foundation Website) or
# Artistic license version 2.
#

my $line;
my $word;

while ($line = <>) {
    # split on whitespace or NULL (0 valued) character
    foreach $word (split(/[\s\x00]/, $line)) {
        next if $word =~ /WBSWG6/; # skip standard.dic header
                                   # add manually if needed.
        $word =~ s/[\cA-\cZ]//g; # junk all control chars (i.e. 1-26 ascii)
        print $word, "\x00" if length $word; # add null & skip blank words (but keep a literal "0")
    }
}



Copy this and save it somewhere as split_to_dict and use as directed. I tested it with the given standard.dic and it worked for me. I am using Early Access 3, though. I did need to actually restart to get the words to be active. Clerestory and Colourant. Hmm... have to use those words somewhere.
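To check the result before restarting, the output can be read back and listed one word per line. This is a minimal sketch that assumes the target file is just NUL-separated words, as the script above produces; the name listdict and the words_from_dict helper are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split NUL-separated dictionary contents into words,
# dropping any empty entries.
sub words_from_dict {
    my ($contents) = @_;
    return grep { length } split /\0/, $contents;
}

# Usage: perl listdict targetfile
if (@ARGV) {
    local $/;    # slurp mode: each read returns one whole file
    while (defined(my $contents = <>)) {
        print "$_\n" for words_from_dict($contents);
    }
}
```

Running it on targetfile should show each imported word on its own line; anything garbled there would also end up garbled in the spell-checker.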
Back to top
ovvldc
Captain Naiobi


Joined: Sep 13, 2004
Posts: 2352
Location: Zürich, CH

PostPosted: Mon Mar 23, 2009 10:52 am    Post subject:

I was thinking about making a little droplet app with Platypus, but then I cannot enter a destination filename.

Still, it will be very useful until Patrick gets the spellchecker learning API running at some point.

best wishes,
Oscar

_________________
"What do you think of Western Civilization?"
"I think it would be a good idea!"
- Mohandas Karamchand Gandhi
Back to top
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

PostPosted: Mon Mar 23, 2009 10:52 am    Post subject:

Smart! Thanks a lot for doing this...

- padmavyuha
Back to top
pluby
The Architect


Joined: Jun 16, 2003
Posts: 11949

PostPosted: Mon Mar 23, 2009 11:05 am    Post subject:

ovvldc wrote:
Still, it will be very useful until Patrick gets the spellchecker learning API running at some point.


That is the problem: there is no public API in Mac OS X's spellchecker service to register a learned word. In other words, Apple has spliced this functionality into their native spellchecking dialog using private functions.

Below is the public API. Note that Apple only allows applications to unlearn existing learned words:

http://developer.apple.com/documentation/Cocoa/Reference/ApplicationKit/Classes/NSSpellChecker_Class/Reference/Reference.html

Patrick
Back to top
sardisson
Town Crier


Joined: Feb 01, 2004
Posts: 4588

PostPosted: Mon Mar 23, 2009 12:07 pm    Post subject:

Huh, I had thought that learnWord: was made public on 10.5, but I guess I misremembered. :(

The spelling API is leaps and bounds better in 10.5, but unfortunately that's not saying much, since the API was essentially useless to non-Spelling-panel applications before.

Smokey

_________________
"[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
Back to top
sardisson
Town Crier


Joined: Feb 01, 2004
Posts: 4588

PostPosted: Mon Mar 23, 2009 12:09 pm    Post subject: Re: Code for standard.dic and others

Markk wrote:
OK, here is code that takes words on individual lines, words separated by whitespace, or words in standard.dic format and outputs a list separated by nulls that can be added to ~/Library/Spelling wordlists.

Thanks so much for doing that :)

I'll get it into the wiki later today if no one else has already done so.

Smokey

_________________
"[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
Back to top
sardisson
Town Crier


Joined: Feb 01, 2004
Posts: 4588

PostPosted: Mon Mar 23, 2009 3:38 pm    Post subject:

I've updated the article with Markk's new script.

Questions:

1) should we put the "Importing Words into the Mac OS X User Dictionary" section first on the page and move "“Exporting” the User Dictionary" to the end?

2) Should we remove the original Hayne script yoxi found? What about the "Completely Manual Method" or the links to other dictionary-writing apps?

Smokey

_________________
"[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
Back to top
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

PostPosted: Mon Mar 23, 2009 3:49 pm    Post subject:

2) might as well, it's been well superseded (the perl script, that is).
Back to top
All times are GMT - 7 Hours