yoxi Cipher
Joined: Sep 07, 2004 Posts: 1799 Location: Dawlish, Devon
Posted: Tue Mar 17, 2009 11:56 am Post subject: Adding lists of terms to OSX spell-checker |
Now that NeoOffice uses the OS X spell-checker (in many cases), I thought you might find useful a Perl script I came across: it takes a text file with one word per line as input and turns it into a format you can paste straight into your ~/Library/Spelling/<language code> file.
After a restart, all those new terms will be recognised by the spell-checker alongside the ones you've already added.
Code: | #!/usr/bin/perl -w
# This script reads a list of strings (one per line) from STDIN
# or from the files supplied as command-line arguments
# and outputs those strings to STDOUT separated by zero bytes.
# Cameron Hayne (macdev@hayne.net) June 2005
# Usage: ./dictify input.file > output.file, where input.file has one word per line.
# Paste the contents of output.file into ~/Library/Spelling/en_GB
# (TextWrangler etc. can show the invisible characters).
my $zerobyte = pack("B8", 0);
while (<>) {
    chomp();
    print "$_$zerobyte";
} |
I called it dictify and added the last two comment lines to clarify how to use it with word-list files.
Save it under whatever name you like, and make it executable.
Hope this is useful to someone - it certainly has been to me. I'm migrating a friend from a Dell to a MacBook, and he's a bit dyslexic - we have something like 2500 Buddhist names and terms in a text file that we can now import into his new spell-checker!
- padmavyuha |
James3359 The Merovingian
Joined: Jul 05, 2005 Posts: 685 Location: North West England
Posted: Tue Mar 17, 2009 1:08 pm Post subject: |
Smokey has recently been updating the Wiki pages on this subject. I wonder if it would be worth adding this, or a link to this, on the Exporting Word Lists page. |
sardisson Town Crier
Joined: Feb 01, 2004 Posts: 4588
Posted: Tue Mar 17, 2009 3:57 pm Post subject: |
James3359 wrote: | Smokey has recently been updating the Wiki pages on this subject. I wonder if it would be worth adding this, or a link to this, on the Exporting Word Lists page. |
Yeah, let's add the perl script to that page. (It would be awesome if the script could also be made to parse standard.dic directly, so that it could act on both one-word-per-line wordlists and on someone's exported standard.dic; one less step for migrations.)
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
yoxi Cipher
Joined: Sep 07, 2004 Posts: 1799 Location: Dawlish, Devon
Posted: Tue Mar 17, 2009 11:52 pm Post subject: |
Beyond me, I'm afraid - just the messenger |
Markk Operator
Joined: Mar 15, 2007 Posts: 43 Location: Wisconsin US
Posted: Wed Mar 18, 2009 6:14 pm Post subject: |
I can do it, no problem, except I have no words in standard.dic. If any of you has a file that would be good, reply with it and I'll add in the code. It should only be a couple of lines, I think, depending on what I see in the file. |
James3359 The Merovingian
Joined: Jul 05, 2005 Posts: 685 Location: North West England
Posted: Thu Mar 19, 2009 3:35 am Post subject: |
Here's my standard.dic file. There are not many words in it, so I hope it will do. |
Markk Operator
Joined: Mar 15, 2007 Posts: 43 Location: Wisconsin US
Posted: Thu Mar 19, 2009 8:52 am Post subject: Working on it. |
James3359 wrote: | Here's my standard.dic file. There are not many words in it, so I hope it will do. |
The format is a little nastier than I thought, but check back tomorrow.
Mark
Update - The standard.dic file seems to be in 16-bit Unicode (UTF-16)? I hope so. This will be quite ad hoc. Looking at a couple of standard.dic files, they don't seem to be consistent, so there is probably a rule I don't understand. I think I might just make it output a one-word-per-line list so it could be fed into the other program. I will be gone over the weekend, though. |
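In case it helps with the encoding question: a UTF-16 file usually begins with a byte-order mark (the bytes FF FE or FE FF), which od will show. The sample file below is fabricated purely to illustrate what the BOM looks like; against a real standard.dic you would inspect the file itself.

```shell
# write a tiny UTF-16LE sample ("hi" preceded by a byte-order mark)
printf '\377\376h\000i\000' > sample.bin    # bytes: ff fe 68 00 69 00
od -An -tx1 sample.bin                      # leading "ff fe" marks little-endian UTF-16

# against a real exported dictionary you would run, e.g.:
# od -An -tx1 standard.dic | head -4
```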
Markk Operator
Joined: Mar 15, 2007 Posts: 43 Location: Wisconsin US
Posted: Mon Mar 23, 2009 9:45 am Post subject: Code for standard.dic and others |
OK, here is code that takes words on individual lines, words separated by whitespace, or words in standard.dic format, and outputs a list separated by nulls that can be appended to ~/Library/Spelling wordlists.
Code: |
#!/usr/bin/perl -w
use strict;
# This script 'split_to_dict'
# 1. Reads standard input or a list of files specified on the command
#    line, line by line in text mode, so it will automatically account
#    for double-byte Unicode where it (and Perl) can.
# 2. Splits the lines into strings based on whitespace or
#    null (zero) characters.
# 3. Removes all control characters from the strings.
# 4. Outputs the strings to STDOUT separated by zero (null) characters.
#
# Usage: perl split_to_dict inputfile1 inputfile2 > targetfile
#
# The input files can be standard.dic, an OS X or OpenOffice.org dictionary,
# or a list of words one per line or whitespace-separated.
#
# The targetfile is suitable for appending to ~/Library/Spelling/ dictionaries:
#     cat targetfile >> ~/Library/Spelling/targetDictionary
# should do it, where targetDictionary is "en" or "en_GB" or whatever.
#
# Based on ideas from Cameron Hayne (macdev@hayne.net) June 2005
# Version 1, Mark Kaehny, March 2009
#
# Released under the same license as the standard Perl distribution:
# GPL version 2 or later (see the Free Software Foundation website) or
# Artistic License version 2.
#
my $line;
my $word;
while ($line = <>) {
    # split on whitespace or NULL (zero-valued) characters
    foreach $word (split(/[\s\x00]/, $line)) {
        next if $word =~ /WBSWG6/;       # skip the standard.dic header;
                                         # add it manually if needed
        $word =~ s/[\cA-\cZ]//g;         # strip all control chars (ASCII 1-26)
        print $word, "\x00" if ($word);  # add null & skip blank words
    }
}
|
Copy this and save it somewhere as split_to_dict, and use it as directed. I tested it with the given standard.dic and it worked for me - I am using Early Access 3, though. I did need to actually restart to get the words recognised: Clerestory and Colourant. Hmm... I'll have to use those words somewhere. |
ovvldc Captain Naiobi
Joined: Sep 13, 2004 Posts: 2352 Location: Zürich, CH
Posted: Mon Mar 23, 2009 10:52 am Post subject: |
I was thinking about making a little droplet app with Platypus, but then I cannot enter a destination filename...
Still, it will be very useful until Patrick gets the spellchecker learning API running at some point.
best wishes,
Oscar _________________ "What do you think of Western Civilization?"
"I think it would be a good idea!"
- Mohandas Karamchand Gandhi |
yoxi Cipher
Joined: Sep 07, 2004 Posts: 1799 Location: Dawlish, Devon
Posted: Mon Mar 23, 2009 10:52 am Post subject: |
Smart! Thanks a lot for doing this...
- padmavyuha |
pluby The Architect
Joined: Jun 16, 2003 Posts: 11949
sardisson Town Crier
Joined: Feb 01, 2004 Posts: 4588
Posted: Mon Mar 23, 2009 12:07 pm Post subject: |
Huh, I had thought that learnWord: was made public on 10.5, but I guess I misremembered.
The spelling API is leaps and bounds better in 10.5, but unfortunately that's not saying much, since the API was essentially useless to non-Spelling-panel applications before.
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
sardisson Town Crier
Joined: Feb 01, 2004 Posts: 4588
Posted: Mon Mar 23, 2009 12:09 pm Post subject: Re: Code for standard.dic and others |
Markk wrote: | OK, here is code that takes words on individual lines, words separated by whitespace, or words in standard.dic format, and outputs a list separated by nulls that can be appended to ~/Library/Spelling wordlists. |
Thanks so much for doing that!
I'll get it into the wiki later today if no one else has already done so.
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
sardisson Town Crier
Joined: Feb 01, 2004 Posts: 4588
Posted: Mon Mar 23, 2009 3:38 pm Post subject: |
I've updated the article with Markk's new script.
Questions:
1) Should we put the "Importing Words into the Mac OS X User Dictionary" section first on the page and move "“Exporting” the User Dictionary" to the end?
2) Should we remove the original Hayne script yoxi found? What about the "Completely Manual Method" or the links to other dictionary-writing apps?
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
yoxi Cipher
Joined: Sep 07, 2004 Posts: 1799 Location: Dawlish, Devon
Posted: Mon Mar 23, 2009 3:49 pm Post subject: |
2) Might as well - it's been well superseded (the Perl script, that is). |