Author |
Message |
bezvardis Keymaker

Joined: Dec 10, 2004 Posts: 89 Location: Latvia
|
Posted: Wed Mar 09, 2005 11:43 am Post subject: xml file formats and search by content |
|
Since yesterday I have been trying to get search by content to find files containing particular words. I noticed that it just would not find some files, although others were found. I got desperate and searched every topic I could find on the net. Then I copied the text that Finder could not find into an MS Word document and saved it, and Finder found it immediately. The same text saved by NeoOffice could not be found when searched by content. Now I have stumbled across the help in NeoOffice, which talks about XML file formats and says that they use some kind of compression like that of zip files. So I thought: maybe the content in this format is so deeply concealed that Finder just cannot get to it and therefore does not find the file? If that is so, is it a bug or a feature of OpenOffice? |
|
 |
bezvardis Keymaker

Joined: Dec 10, 2004 Posts: 89 Location: Latvia
|
Posted: Wed Mar 09, 2005 12:01 pm Post subject: |
|
Now I have experimented a bit more with this and found out that the file can be found if I enter the search words in the properties of the file. When I remove them and reset the properties, Finder again cannot find the file.
It also seems that a file saved by NeoOffice in .doc format becomes searchable by Finder. |
|
 |
pluby The Architect


Joined: Jun 16, 2003 Posts: 11949
|
Posted: Wed Mar 09, 2005 12:29 pm Post subject: |
|
The reason that Finder won't search the contents of Neo/J files is because Finder does not support searching of the OOo file format. Apple did put in searching of MS Office formats, but Apple has not bothered to make any of their tools handle the OOo file format.
Patrick |
|
 |
ovvldc Captain Naiobi

Joined: Sep 13, 2004 Posts: 2352 Location: Zürich, CH
|
Posted: Wed Mar 09, 2005 2:47 pm Post subject: |
|
pluby wrote: | The reason that Finder won't search the contents of Neo/J files is because Finder does not support searching of the OOo file format. Apple did put in searching of MS Office formats, but Apple has not bothered to make any of their tools handle the OOo file format. |
Is Spotlight going to work on compressed files? If it is, chances are good they can be searched without problem. The good thing about OOo is that it is (somewhat) human-readable text inside a zip, right? The ingredients should be about there..
Oscar _________________ "What do you think of Western Civilization?"
"I think it would be a good idea!"
- Mohandas Karamchand Gandhi |
|
 |
pluby The Architect


Joined: Jun 16, 2003 Posts: 11949
|
Posted: Wed Mar 09, 2005 2:53 pm Post subject: |
|
Yes, the OOo files are merely plain text XML files zipped up. In fact, you can manually unzip them using the following terminal command:
jar xvf <OOo file>
Patrick |
|
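As a rough illustration of the point above (an OOo file is a zip archive of plain XML), here is a minimal Python sketch that reads the text out of one document without unpacking it to disk. It assumes the document body is stored in an archive member named content.xml; the function name extract_text is made up for this example.

```python
import xml.etree.ElementTree as ET
import zipfile


def extract_text(path):
    """Return the text of an OOo document as a single string.

    Assumption: the body of the document is the zip member content.xml.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() yields the text between tags; markup is dropped and
    # XML entities such as &amp; are already decoded by the parser.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)
```

Joining the pieces with spaces keeps paragraphs apart, but it also means a word that is split across formatting spans would come out in two parts, so treat the result as good enough for searching rather than as an exact copy of the text.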
 |
bezvardis Keymaker

Joined: Dec 10, 2004 Posts: 89 Location: Latvia
|
Posted: Wed Mar 09, 2005 4:05 pm Post subject: |
|
pluby wrote: | Yes, the OOo files are merely plain text XML files zipped up. In fact, you can manually unzip them using the following terminal command:
jar xvf <OOo file> |
Can doing that somehow help me solve the search by content problem? I tend to have thousands of documents with texts, and sometimes I have to find where a particular word is mentioned. If search by content is unavailable, it is a big drawback for me. |
|
 |
sardisson Town Crier


Joined: Feb 01, 2004 Posts: 4588
|
Posted: Wed Mar 09, 2005 4:53 pm Post subject: |
|
Some time ago someone mentioned trying to get support for searching OOo files in certain search applications (other than the Finder). The end result was that the author of at least one of the programs agreed to look into supporting OOo files. If you do a search here, you'll probably find the thread and the names of the apps.
Ed has at least given some thought to writing a plugin for the forthcoming 10.4 Spotlight engine and the OpenDocument (OOo 2.0) format...whether he'll have the time to do so only he can answer.
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
|
 |
fabrizio venerandi Guest

|
Posted: Thu Mar 10, 2005 12:25 am Post subject: |
|
If you have no problem with HD space or formatting, you can use the old StarOffice file format. I use the StarOffice format for big documents, for example, because opening a Calc file in NeoOffice format is too slow.
And OS X search by content does look into StarOffice files.
f. |
|
 |
bezvardis Keymaker

Joined: Dec 10, 2004 Posts: 89 Location: Latvia
|
Posted: Thu Mar 10, 2005 9:13 am Post subject: |
|
Thank you all for the useful advice. Eventually I managed to find an open-source program called docsearch that can search NeoOffice files and other kinds. The interface is not very friendly to the eye, but at least it works. |
|
 |
sardisson Town Crier


Joined: Feb 01, 2004 Posts: 4588
|
Posted: Fri Mar 11, 2005 11:31 pm Post subject: |
|
I can't find the old posts I was looking for; I think they probably got lost when Ed restored trinity after he came back from vacation one month. There's one thread where SOB noted he had requested the authors of EasyFind and SpeedSearch to add OOo/Neo doc support. But I remember a post where someone reported that some search author had agreed to add support to the next version of his product, and I can't find it now.
Here are some other relevant OOo links, just to have them in one place:
http://www.danielnaber.de/loook/
http://oootools.free.fr/fooox/
http://www.openoffice.org/issues/show_bug.cgi?id=14468
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
|
 |
bezvardis Keymaker

Joined: Dec 10, 2004 Posts: 89 Location: Latvia
|
Posted: Sat Mar 12, 2005 10:50 am Post subject: |
|
I had some interesting discussion on this topic at the Apple Panther support page and here are some work-arounds that I found:
1) docSearcher, which I mentioned (it requires indexing and gives back search results that were not very comprehensible to me, though others might think differently). docSearch finds documents in all formats that contain a particular string. The download page is here: http://www.brownsite.net/docsearch.htm
2) Someone posted a terminal search code (if that's how it is called) ( http://discussions.info.apple.com/webx?13@116.yy3YaI37RO4.927450@.68a8d1b9/6 ) which works quite fast, uses no indexing and gives a simple list of files containing the string. But it looks only at the XML file formats, so all the other files have to be searched separately. That someone also said that it might be quite easy to write similar code in AppleScript and make it run from NeoOffice or Finder or whatever. Maybe there are people who want to take this challenge and write a script? I have no knowledge of AppleScript and thus cannot do that myself.
3) At this link, http://www.darwinwars.com/lunatic/bugs/oo_macros.html, someone posted a set of macros, among which there is supposed to be one that does the search inside OpenOffice. The other macros worked fine for me, but the search one made NeoOffice quit. I don't know whether that was my mistake or the fault of the script. If someone else tries it and makes it work, let me know.
4) EasyFind did not work for me when I wanted to find the NeoOffice documents, which might mean that their promise to look at this problem remained just a promise  |
|
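The no-index approach from item 2 can be sketched roughly as follows. This is not the script from the Apple discussion board, only a Python illustration of the same idea: walk a folder, open each OOo file as a zip, and list the files whose XML contains the search string. The extension list, the function name find_documents, and the assumption that the text lives in content.xml are all assumptions of this sketch.

```python
import os
import zipfile

# Assumption: the OOo 1.x extensions worth checking; extend as needed.
OOO_EXTENSIONS = (".sxw", ".sxc", ".sxi", ".sxd")


def find_documents(root_dir, phrase):
    """Return a sorted list of OOo files under root_dir containing phrase."""
    needle = phrase.lower()
    hits = []
    for folder, _subdirs, names in os.walk(root_dir):
        for name in names:
            if not name.lower().endswith(OOO_EXTENSIONS):
                continue
            path = os.path.join(folder, name)
            try:
                with zipfile.ZipFile(path) as archive:
                    raw = archive.read("content.xml")
            except (zipfile.BadZipFile, KeyError, OSError):
                # Not a readable zip, or no content.xml inside: skip it.
                continue
            # Case-insensitive match against the raw XML, markup included.
            if needle in raw.decode("utf-8", errors="replace").lower():
                hits.append(path)
    return sorted(hits)
```

Because the match runs against the raw XML, a phrase that is interrupted by formatting markup would be missed, and a search word that happens to equal a tag name would match every file. For a one-off hunt through a few thousand documents that trade-off may be acceptable.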
 |
sardisson Town Crier


Joined: Feb 01, 2004 Posts: 4588
|
Posted: Sun Mar 13, 2005 10:52 am Post subject: |
|
bezvardis wrote: | 2) Someone posted a terminal search code (if that's how it is called) ( http://discussions.info.apple.com/webx?13@116.yy3YaI37RO4.927450@.68a8d1b9/6 ) which works quite fast, uses no indexing and gives a simple list of files containing the string. But it looks only at the XML file formats, so all the other files have to be searched separately. That someone also said that it might be quite easy to write similar code in AppleScript and make it run from NeoOffice or Finder or whatever. Maybe there are people who want to take this challenge and write a script? I have no knowledge of AppleScript and thus cannot do that myself. |
There are a couple of Neo/J folks that could probably convert that into a nice AppleScript (Max, yoxi, me, that I know of). I unfortunately don't have time to look into it right now but if no one else has taken it on after next week, I should have some time then.
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki |
|
 |
Max_Barel Oracle

Joined: May 31, 2003 Posts: 219 Location: French Alps
|
Posted: Sun Mar 13, 2005 5:32 pm Post subject: |
|
Smokey wrote: | I unfortunately don't have time to look into it right now but if no one else has taken it on after next week, I should have some time then. |
Same here. To avoid wasting concurrent effort we should synchronize by posting in this thread before starting up. |
|
 |
OPENSTEP The One


Joined: May 25, 2003 Posts: 4752 Location: Santa Barbara, CA
|
Posted: Mon Mar 14, 2005 9:53 pm Post subject: |
|
FWIW I've had the idea to do a Spotlight filter for OOo docs for some time but have not had the time to do it. Most of my relevant research is in the NSFilter proposal on dashboardbuddha, and the bulk of the concept would apply to an OSS Spotlight plugin framework. For better or for worse, I tend to put bug fixing (or narrowing down bugs) at a higher priority level than Spotlight plugins as, well, the OS isn't even fully released to the public yet
ed |
|
 |
bezvardis Keymaker

Joined: Dec 10, 2004 Posts: 89 Location: Latvia
|
Posted: Tue Mar 15, 2005 9:49 am Post subject: |
|
Gib Henry has posted a script (written by biovizer) that resulted from the discussion of this topic on one of the Apple discussion boards: http://trinity.neooffice.org/modules.php?name=Forums&file=viewtopic&t=1186
The script has some limitations, though (e.g., one cannot quit it easily while it runs, it searches for only one word or phrase rather than several keywords, and it searches only OpenOffice files and no others). Otherwise it works pretty well and quite fast, and it looks nice. If anyone would like to improve it, that would be even more fantastic than the fact that biovizer wrote this piece  |
|
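The single-phrase limitation mentioned above is mostly a matter of how the match is tested. As a small Python sketch (not part of biovizer's script; the function name matches_all_keywords is hypothetical), a keyword search can require that every keyword appears somewhere in the document text, in any order:

```python
def matches_all_keywords(text, keywords):
    """Return True if every keyword occurs somewhere in text, ignoring case."""
    haystack = text.lower()
    # all() on an empty keyword list is True, so "no keywords" matches everything.
    return all(keyword.lower() in haystack for keyword in keywords)
```

The text passed in would be whatever has already been pulled out of a document; swapping all() for any() would turn this into a "match at least one keyword" search instead.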
 |
|