Welcome to NeoOffice developer notes and announcements
This website is an archive and is no longer active
NeoOffice announcements have moved to the NeoOffice News website


Opening PDF Files
 
aussie149
The Merovingian


Joined: Feb 12, 2005
Posts: 607
Location: Australia

Posted: Wed May 03, 2006 8:16 am    Post subject: Opening PDF Files

I'd love to have an office suite that opened PDF files, or imported them. I've tried saving as PostScript, and all I get when I re-open the file in NeoOffice 2.0 is many pages of code. I tried it tonight with an article from the London Independent titled "Is Pope poised to sanction condoms?" [catchy headline]. Neo2 doesn't open the PDF file at all; it does open the PostScript file, but shows just pages of garbage. I did a search and couldn't find "pope" or "condom" anywhere, although my own name was there. Is that a sign, do you think?

Is there any way that NeoOffice or Abiword or Scribus could open pdf files?

I can do it in X11 with KWord, which is reasonably effective at it.

P
pluby
The Architect


Joined: Jun 16, 2003
Posts: 11949

Posted: Wed May 03, 2006 8:27 am    Post subject: Re: Opening PDF Files

aussie149 wrote:
Is there any way that NeoOffice or Abiword or Scribus could open pdf files?


Nope.

Patrick
aussie149
The Merovingian


Joined: Feb 12, 2005
Posts: 607
Location: Australia

Posted: Wed May 03, 2006 8:41 am    Post subject: Re: Opening PDF Files

pluby wrote:
aussie149 wrote:
Is there any way that NeoOffice or Abiword or Scribus could open pdf files?


Nope.

Patrick


OoooK, that's the end of that wish. Thanks Patrick. I won't waste any wishes on that one then.

My latest workaround is to save as PDF, use Preview to save as TIFF, and then get my OCR program to scan the TIFF file. Then I save as text, or select all, copy and paste into NeoOffice. Hard work and unreliable, but it seems that's what we have to do.
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

Posted: Wed May 03, 2006 10:06 am    Post subject: Re: Opening PDF Files

aussie149 wrote:
My latest workaround is to save as pdf, use Preview to save as tiff and then get my ocr program to scan the tiff file and ocr it. Then save as text, or select all copy and paste to NeoOffice. Hard work and unreliable, but what we have to do it seems.

Seems a bit roundabout if you just want to extract the text from a pdf file: open it in Adobe reader v7, choose Select All from the Edit menu, and cmd-C to copy the text (does it 1 page at a time, mind you, so a bit tiresome for multipage docs, but at least you don't have to involve ocr etc.)

- padmavyuha

*edit* Actually, Adobe Reader will select *all* text in the doc if you're in continuous view mode rather than page view mode (view-Page Layout-Continuous) - cool huh?
OPENSTEP
The One


Joined: May 25, 2003
Posts: 4752
Location: Santa Barbara, CA

Posted: Wed May 03, 2006 10:18 am    Post subject:

I know I've also directly selected text in Preview (on 10.4) to export it via the clipboard as well. It doesn't work with all PDFs, particularly those that are locked...I still don't understand why folks lock PDFs. Seems a bit silly when I can just take a screenshot of the PDF and print it out or, as mentioned above, OCR it.

ed
aussie149
The Merovingian


Joined: Feb 12, 2005
Posts: 607
Location: Australia

Posted: Wed May 03, 2006 11:02 am    Post subject:

OPENSTEP wrote:
I know I've also directly selected text in Preview (on 10.4) to export it via the clipboard as well. It doesn't work with all PDFs, particularly those that are locked...I still don't understand why folks lock PDFs. Seems a bit silly when I can just take a screenshot of the PDF and print it out or, as mentioned above, OCR it.

ed


Actually that does work. In some cases I can do a Print > Save as PDF, then open the result in Preview, select all and copy. There you go! Protected docs don't work this way. For those I've tried saving as PostScript, which is all they seem to let you do. That seems to work, but Preview won't then open its own PostScript creations. Off topic, I realise.
LemonAid
The Anomaly


Joined: Nov 21, 2005
Posts: 1285
Location: Witless Protection Program

Posted: Wed May 03, 2006 2:54 pm    Post subject: Re: Opening PDF Files

yoxi wrote:

Seems a bit roundabout if you just want to extract the text from a pdf file: open it in Adobe reader v7, choose Select All from the Edit menu, and cmd-C to copy the text <snip>
*edit* Actually, Adobe Reader will select *all* text in the doc if you're in continuous view mode rather than page view mode (view-Page Layout-Continuous) - cool huh?

- padmavyuha


This is exactly how we do it at work using the full Adobe Acrobat Professional package. It didn't use to be so easy to copy text out, so you should be happy.
Also remember to do a "Paste Special", as the copy will carry some Rich Text Format (RTF) information: things like font, size, bold and such that apply to text items. You do lose page and document formatting items.

Remember, .PDF is an OUTput format, like sending to a printer. It's meant to fully capture the visual look of a document, NOT for archiving and/or editing.

Philip ( .PDF for output, OpenDocument Format for archival/editing )
steveread
Guest





Posted: Wed May 03, 2006 8:05 pm    Post subject: opening pdf files

Actually, v7 of Adobe Reader has a Save as text option that allows you to save the text contents of the file into a text file.
Also, Acrobat has a save as rtf option, I believe.
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

Posted: Wed May 03, 2006 10:36 pm    Post subject: Re: Opening PDF Files

LemonAid wrote:
Also remember to do a "Paste Special" as the Copy will have some Rich Text Format (RTF) information. Things like Fonts, Size, Bold and such that apply to text items.

Strangely, when I try it, using Paste Special after copying from Adobe Reader v7 to paste into NeoOffice v2 alpha does NOT preserve font data, only point size.

- Padmavyuha
sardisson
Town Crier


Joined: Feb 01, 2004
Posts: 4588

Posted: Wed May 03, 2006 11:09 pm    Post subject:

Someone over at oooforum.org mentioned he's working on a PDF import filter, but it's for Draw, not Writer. The reason for that is that PDF, like PostScript on which it is based, is a page description language composed of lines and curves; it has no real concept of "text".

There's also some company that has a product on the usual suspect update sites that promises to turn PDFs into Word documents; not sure how well it works, but I suppose with a lot of effort (like in Preview or Acrobat) it's possible to make an app that can figure out certain lines and curves are "text" and turn them back into "text" for copying....
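To make the "page description" point concrete, here is a minimal Python sketch, not taken from any real import filter. It assumes a hand-written, uncompressed page content stream (the `CONTENT_STREAM` sample and the `shown_strings` helper are hypothetical names made up for illustration). In such a stream, text appears as strings handed to the `Tj` (show text) operator at positions set by `Td`, with no notion of paragraphs or reading order. Real PDFs usually compress their streams and may use custom font encodings, so this simple regex would not be enough for them.

```python
import re

# Hypothetical, hand-written page content stream (uncompressed), for illustration only.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(First line) Tj
0 -14 Td
(Second line) Tj
ET
"""

# Matches a literal string followed by the Tj (show text) operator.
# Simplification: ignores escaped parentheses, hex strings, and TJ arrays.
SHOW_TEXT = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def shown_strings(stream: bytes) -> list[str]:
    """Return the strings drawn by Tj operators, in stream order."""
    return [raw.decode("latin-1") for raw in SHOW_TEXT.findall(stream)]


if __name__ == "__main__":
    for line in shown_strings(CONTENT_STREAM):
        print(line)
```

The strings come back in the order they were drawn, which is not guaranteed to be reading order; that is a large part of why turning a PDF back into an editable document takes real effort.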

Smokey

_________________
"[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
LemonAid
The Anomaly


Joined: Nov 21, 2005
Posts: 1285
Location: Witless Protection Program

Posted: Thu May 04, 2006 3:07 pm    Post subject:

Steveread wrote:
Actually, v7 of Adobe Reader has a Save as text option that allows you to save the text contents of the file into a text file.
Also, Acrobat has a save as rtf option, I believe.

I'll be darned. I knew that there were some neat new features in Acrobat v7 (Reader and Professional) but missed the "Save As" to RTF. I just tried it on the OOo2 Product Flyer (oo2prodflyer.pdf) and it worked like a champ! Copied just about everything I can see.

sardisson wrote:
Someone over at oooforum.org mentioned he's working on a PDF import filter, but it's for Draw, not Writer. The reason for that is that PDF, like PostScript on which it is based, is a page description language composed of lines and curves; it has no real concept of "text".

There's also some company that has a product on the usual suspect update sites that promises to turn PDFs into Word documents; not sure how well it works, but I suppose with a lot of effort (like in Preview or Acrobat) it's possible to make an app that can figure out certain lines and curves are "text" and turn them back into "text" for copying....

Smokey

Note to all: .PDF files come in several types.
1. Just images, no text included. This is like a FAX, or what you get when you do an HP scan directly to .PDF. This is the lowest quality .PDF and can be the smallest file size - depending on the image resolution.

2. Image AND TEXT. This is what you get when you create a .PDF from a document - like Word-to-PDF using Acrobat Professional, or some Mac print-to-PDF operations. This is the usual format for most docs.

2.1 This allows you to SEARCH a .PDF document (Ctrl-F or the little binoculars). The text is in the document - kinda hidden under the PostScript image.
2.2 This is why "Spotlight" can index and scan many .PDF documents - cool huh?

3. There is another format that has even more stuff. I don't remember all the details, but the FONTs are included too (either the full font set, or just the subset of used characters). You can use Acrobat Professional to actually do limited typo corrections, or change a line of text. I do this for business flyers that were generated directly from document sources like Word.

4. Since v7, you can even create .PDF documents that can be reviewed and commented on using the free Adobe Reader v7+. There are many options.

An "Image plus TEXT" .PDF file can have the text and (most) RTF features extracted to a Text document. Then you can do what you want to the text! This is a BIG change in the last 1-2(1.5+ ? ) versions of Acrobat.

.PDF documents have an amazing amount of versatility that most people don't use, or have a clue that it's available. Did I mention there are many options?
BUT - It's an OUTput format - Always save your Source documents. If you can extract the text, then most of what people want is available in .PDF.
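A rough way to guess which of these types a given file is can be sketched in Python. This is only a heuristic under stated assumptions, not how Acrobat or Preview decide: it scans the raw bytes for the `/Font` and `/FontFile` dictionary keys, and the function name `classify_pdf` and its return labels are made up for illustration. Files that keep their dictionaries in compressed object streams can hide these keys, and a scanned document with an OCR text layer will be reported as containing text.

```python
def classify_pdf(data: bytes) -> str:
    """Guess a PDF's type from raw bytes (heuristic; labels are illustrative)."""
    # Readers generally accept the header anywhere near the start of the file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("not a PDF file")
    # Embedded font programs are referenced via FontFile, FontFile2 or FontFile3.
    if b"/FontFile" in data:
        return "text with embedded fonts"
    # A page that draws text has to reference a font resource.
    if b"/Font" in data:
        return "text, fonts not embedded"
    return "image only (no font resources found)"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(classify_pdf(handle.read()))
```

Run against a file path, it prints one of the three labels; a scan-to-PDF file would typically be expected to land in the last category.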

Padmavyuha wrote:
Strangely, using Paste Special after copying from ARv7 to paste into NeoOffice v2 alpha does NOT preserve font data, only point size when I try it.

- Padmavyuha

It all depends on how the .PDF document was created, and what the author/creator chose to have included. Most of those settings can be changed by the author/creator. It also depends on what application was used to create the file. The more things included (text, fonts, formats, image resolution, ...), the bigger the .PDF file becomes.


Which type does OOo (version 1 and 2) create? I have not checked in detail, but I'll wager that most are "Image and Text".


Philip (I did not know I knew so much about .PDF!?! I amazed myself!)
yoxi
Cipher


Joined: Sep 07, 2004
Posts: 1799
Location: Dawlish, Devon

Posted: Thu May 04, 2006 9:42 pm    Post subject:

The PDF I mentioned, where Paste Special didn't preserve anything except point size, was an 'Export to PDF' from NeoOffice.

- Padmavyuha
gast
Guest





Posted: Fri May 05, 2006 4:03 am    Post subject:

Why not simply use pdftotext or pdftohtml? I use them on a regular basis: pdftotext for simple text extraction and pdftohtml for XML conversion. I compiled both with no extra effort on 10.4, but I think they are available via Fink or DarwinPorts too.
regards, G.
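
For anyone who wants to script this, a minimal Python sketch of calling pdftotext follows. It assumes the `pdftotext` binary is installed and on the PATH; the helper names `pdftotext_command` and `extract_text` are made up for illustration, and `-layout` is the tool's option for keeping the original physical layout as far as possible.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path, keep_layout: bool = False) -> list[str]:
    """Build the pdftotext argument list; output goes next to the PDF as .txt."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the page's physical layout
    cmd += [str(pdf), str(pdf.with_suffix(".txt"))]
    return cmd


def extract_text(pdf: Path, keep_layout: bool = False) -> Path:
    """Run pdftotext on one file and return the path of the text file."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(pdftotext_command(pdf, keep_layout), check=True)
    return pdf.with_suffix(".txt")
```

Calling `extract_text(Path("article.pdf"))` would be expected to write `article.txt` beside the PDF, which can then be opened or pasted into NeoOffice. Image-only PDFs contain no text to extract, so those still need OCR.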
All times are GMT - 7 Hours
All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest © Planamesa Inc.
NeoOffice is a registered trademark of Planamesa Inc. and may not be used without permission.