Joined: Feb 12, 2005 Posts: 607 Location: Australia
Posted: Wed May 03, 2006 8:16 am Post subject: Opening PDF Files
I'd love to have an Office suite which opened pdf files, or imported them. I've tried saving as PostScript, and all I get when I re-open the file in NeoOffice 2.0 is many pages of code. I tried it tonight with an article from the London Independent titled "Is Pope poised to sanction condoms?" [catchy headline]. Neo2 doesn't open the pdf file at all; it does open the PostScript file, but shows only pages and pages of garbage. I did a search and couldn't find "pope" or "condom" anywhere, although my own name was there. Is that a sign, do you think?
Is there any way that NeoOffice or Abiword or Scribus could open pdf files?
I can do it in X11 with KWord, which is reasonably effective at it.
Joined: Feb 12, 2005 Posts: 607 Location: Australia
Posted: Wed May 03, 2006 8:41 am Post subject: Re: Opening PDF Files
pluby wrote:
aussie149 wrote:
Is there any way that NeoOffice or Abiword or Scribus could open pdf files?
Nope.
Patrick
OoooK, that's the end of that wish. Thanks Patrick. I won't waste any wishes on that one then
My latest workaround is to save as pdf, use Preview to save as tiff and then get my ocr program to scan the tiff file and ocr it. Then save as text, or select all copy and paste to NeoOffice. Hard work and unreliable, but what we have to do it seems.
Posted: Wed May 03, 2006 10:06 am Post subject: Re: Opening PDF Files
aussie149 wrote:
My latest workaround is to save as pdf, use Preview to save as tiff and then get my ocr program to scan the tiff file and ocr it. Then save as text, or select all copy and paste to NeoOffice. Hard work and unreliable, but what we have to do it seems.
Seems a bit roundabout if you just want to extract the text from a pdf file: open it in Adobe reader v7, choose Select All from the Edit menu, and cmd-C to copy the text (does it 1 page at a time, mind you, so a bit tiresome for multipage docs, but at least you don't have to involve ocr etc.)
- padmavyuha
*edit* Actually, Adobe Reader will select *all* text in the doc if you're in continuous view mode rather than page view mode (view-Page Layout-Continuous) - cool huh?
Joined: May 25, 2003 Posts: 4752 Location: Santa Barbara, CA
Posted: Wed May 03, 2006 10:18 am Post subject:
I know I've also directly selected text in Preview (on 10.4) to export it via the clipboard as well. It doesn't work with all PDFs, particularly those that are locked...I still don't understand why folks lock PDFs. Seems a bit silly when I can just take a screenshot of the PDF and print it out or, as mentioned above, OCR it.
Joined: Feb 12, 2005 Posts: 607 Location: Australia
Posted: Wed May 03, 2006 11:02 am Post subject:
OPENSTEP wrote:
I know I've also directly selected text in Preview (on 10.4) to export it via the clipboard as well. It doesn't work with all PDFs, particularly those that are locked...I still don't understand why folks lock PDFs. Seems a bit silly when I can just take a screenshot of the PDF and print it out or, as mentioned above, OCR it.
ed
Actually that does work. I can in some cases do a print>save as pdf, then open in Preview, select all and copy. There you go! Protected docs don't work this way. I've tried saving as PostScript, which is all they seem to let you do. That seems to work, but Preview won't then open its own PostScript creations. Off topic, I realise.
Joined: Nov 21, 2005 Posts: 1285 Location: Witless Protection Program
Posted: Wed May 03, 2006 2:54 pm Post subject: Re: Opening PDF Files
yoxi wrote:
Seems a bit roundabout if you just want to extract the text from a pdf file: open it in Adobe reader v7, choose Select All from the Edit menu, and cmd-C to copy the text <snip>
*edit* Actually, Adobe Reader will select *all* text in the doc if you're in continuous view mode rather than page view mode (view-Page Layout-Continuous) - cool huh?
- padmavyuha
This is exactly how we do it at work using the full Adobe Acrobat Professional package. It did not use to be so easy to copy text out, so you should be happy.
Also remember to do a "Paste Special" as the Copy will have some Rich Text Format (RTF) information. Things like Fonts, Size, Bold and such that apply to text items. You do lose Page and Document formatting items.
Remember, .PDF is an OUTput format, like sending to a printer. It's meant to fully capture the visual look of a document, NOT for archiving and/or editing.
Philip ( .PDF for output, OpenDocument Format for Archival/editing )
Posted: Wed May 03, 2006 8:05 pm Post subject: opening pdf files
Actually, v7 of Adobe Reader has a Save as text option that allows you to save the text contents of the file into a text file.
Also, Acrobat has a save as rtf option, I believe.
Posted: Wed May 03, 2006 10:36 pm Post subject: Re: Opening PDF Files
LemonAid wrote:
Also remember to do a "Paste Special" as the Copy will have some Rich Text Format (RTF) information. Things like Fonts, Size, Bold and such that apply to text items.
Strangely, using Paste Special after copying from ARv7 to paste into NeoOffice v2 alpha does NOT preserve font data, only point size when I try it.
Someone over at oooforum.org mentioned he's working on a PDF import filter, but it's for Draw, not Writer. The reason for that is that PDF, like PostScript on which it is based, is a page description language composed of lines and curves; it has no real concept of "text".
There's also some company that has a product on the usual suspect update sites that promises to turn PDFs into Word documents; not sure how well it works, but I suppose with a lot of effort (like in Preview or Acrobat) it's possible to make an app that can figure out certain lines and curves are "text" and turn them back into "text" for copying....
Smokey _________________ "[...] whether the duck drinks hot chocolate or coffee is irrelevant." -- ovvldc and sardisson in the NeoWiki
Joined: Nov 21, 2005 Posts: 1285 Location: Witless Protection Program
Posted: Thu May 04, 2006 3:07 pm Post subject:
Steveread wrote:
Actually, v7 of Adobe Reader has a Save as text option that allows you to save the text contents of the file into a text file.
Also, Acrobat has a save as rtf option, I believe.
I'll be darned. I knew that there were some neat new features in Acrobat v7 (Reader and Professional) but missed the "Save As" to RTF. I just tried it on OOo2 Product Flyer (oo2prodflyer.pdf) and it worked like a champ! Copied just about everything I can see.
sardisson wrote:
Someone over at oooforum.org mentioned he's working on a PDF import filter, but it's for Draw, not Writer. The reason for that is that PDF, like PostScript on which it is based, is a page description language composed of lines and curves; it has no real concept of "text".
There's also some company that has a product on the usual suspect update sites that promises to turn PDFs into Word documents; not sure how well it works, but I suppose with a lot of effort (like in Preview or Acrobat) it's possible to make an app that can figure out certain lines and curves are "text" and turn them back into "text" for copying....
Smokey
Note to all. .PDF files come in several versions.
1. Just Images, no text included. This is like a FAX or when you do a HP scan directly to .PDF. This is the lowest quality .PDF and can be the smallest file size - depending on the image resolution.
2. Image AND TEXT. This is what you get when you create a .PDF from a document - like Word-to-PDF using Acrobat Professional, or some Mac print-to-PDF operations. This is the usual format for most docs.
2.1 This allows you to SEARCH a .PDF document (Ctrl-F or the little binoculars). The text is in the document - kinda hidden under the PostScript Image.
2.2 This is why "Spotlight" can index and Scan many .PDF documents - cool huh?
3. There is another format that has even more stuff. I don't remember all the details but the FONTs are included too (either the full font set, or just the subset of used characters). You can use Acrobat Professional to actually do limited typo corrections, or change a line of text. I do this for Business Flyers that were generated directly from the document sources like Word.
4. Since V 7, you can even create .PDF documents that can be reviewed and commented on using the free Adobe Acrobat Reader v7+. There are many options.
An "Image plus TEXT" .PDF file can have the text and (most) RTF features extracted to a Text document. Then you can do what you want to the text! This is a BIG change in the last 1-2(1.5+ ? ) versions of Acrobat.
.PDF documents have an amazing amount of versatility that most people don't use, or have a clue that it's available. Did I mention there are many options?
BUT - It's an OUTput format - Always save your Source documents. If you can extract the text, then most of what people want is available in .PDF.
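For anyone who wants to tell the "just images" kind from the "image and text" kind without opening each file, here is a rough Python sketch. It is only a heuristic under stated assumptions: it scans the raw bytes for font and image markers, so it can be fooled by newer PDFs that compress their internal dictionaries, and the function name `guess_pdf_kind` and its return labels are made up for illustration.

```python
def guess_pdf_kind(data):
    """Crude guess at what a PDF contains, based on raw byte markers.

    Assumption: fonts show up as an uncompressed "/Font" name and
    scanned pages as "/Image". Files that store these dictionaries in
    compressed object streams will be reported as "unknown".
    """
    # Readers tolerate a little junk before the header, so look near the start.
    if b"%PDF-" not in data[:1024]:
        return "not-pdf"
    if b"/Font" in data:
        return "text"        # text is drawn with fonts, so it can likely be copied
    if b"/Image" in data:
        return "image-only"  # looks like a scan or fax; needs OCR
    return "unknown"


def guess_pdf_kind_from_file(path):
    """Read a file from disk and classify it with guess_pdf_kind."""
    with open(path, "rb") as handle:
        return guess_pdf_kind(handle.read())
```

If it reports "image-only", selecting and copying text is unlikely to work and OCR is probably the only route; "text" means Select All / Copy or Save as Text is worth trying first.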
Padmavyuha wrote:
Strangely, using Paste Special after copying from ARv7 to paste into NeoOffice v2 alpha does NOT preserve font data, only point size when I try it.
- Padmavyuha
It all depends on how the .PDF document was created, and what the Author/Creator chose to have included. Most of those settings can be changed by the Author/Creator. It also depends on what application is used to create the file. The more things included (text, fonts, formats, image resolution, ....) - the bigger the .PDF file becomes.
Which type does OOo (version 1 and 2) create? I have not checked in detail but I'll wager that most have "Images & Text".
Philip (I did not know I knew so much about .PDF!?! I amazed myself! )
Why not simply use pdftotext or pdftohtml? I use them on a regular basis: pdftotext for simple text extraction and pdftohtml for XML conversion. I compiled both with no extra effort on 10.4, but I think they are available via fink or darwinports too.
regards, G.
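For anyone who wants to script the two tools mentioned above, here is a minimal Python sketch. It assumes the `pdftotext` and `pdftohtml` binaries are installed and on the PATH; the helper names (`pdftotext_cmd`, `pdftohtml_xml_cmd`, `extract_text`) are made up for illustration and are not part of either tool.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, txt_path="-", keep_layout=False):
    """Build the argument list for pdftotext ("-" sends the text to stdout)."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask the tool to keep the physical page layout
    cmd += [pdf_path, txt_path]
    return cmd


def pdftohtml_xml_cmd(pdf_path, out_base):
    """Build the argument list for pdftohtml in XML output mode."""
    return ["pdftohtml", "-xml", pdf_path, out_base]


def extract_text(pdf_path):
    """Return the text of pdf_path, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        pdftotext_cmd(pdf_path), capture_output=True, check=True
    )
    # The output encoding depends on the build, so decode leniently.
    return result.stdout.decode("utf-8", errors="replace")
```

The extracted text can then be pasted or opened in NeoOffice as a plain text file. Protected or image-only PDFs may still come back empty or fail.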