Thread: pdf to word converter (not online)

  1. #1
    SAMARAS std10093's Avatar
    Join Date
    Jan 2011
    Location
    Nice, France
    Posts
    2,694

    pdf to word converter (not online)

    I am interested in a PDF-to-Word converter. Well, actually my dad needs one. He wants to have it downloaded to his PC, and he wants something free of charge. Any suggestions?
    Code - functions and small libraries I use


    It’s 2014 and I still use printf() for debugging.


    "Programs must be written for people to read, and only incidentally for machines to execute. " —Harold Abelson

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    Do you have "Word"? (This is not as weird as it seems; a lot of applications can read "Word" documents.)

    If so, use "Calibre" to go from PDF to RTF, then use "Word" to go from RTF to your chosen "Word" format.

    If not, download "LibreOffice" and a PDF import module.

    Those are the only processes I'd recommend, but even then, upwards of over half the documents you try and convert will lose formatting to the point of illegibility. (I'm saying half based on my experience.)

    I'd love to know why he thinks he needs such a tool!?

    If this is about viewing and he doesn't like his PDF viewer, get a different one; there are a lot of open source and free PDF viewers.

    If this is about editing one or two, tell him you'll do it and just use an online tool.

    If this is about editing something that will be repeated, the multistage loss of formatting will be so costly with many PDF files that it would be faster to recreate the layout from scratch. (You can still use a tool like "Calibre" to rip the text and images.)

    By the by, conversions between two formats with layout markup almost always suck.

    Soma

  3. #3
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by phantomotap
    upwards of over half the documents you try and convert will lose formatting to the point of illegibility.
    The quality of most text processors is unbelievably poor. Before I went down the dark path of signal processing, I was a document guru. So many PDF files fail to convert properly because of a combination of poor PDF generation and stupid coding in the conversion tools. Within a PDF document, the contents of each page are described using a PostScript-like (NOT PostScript) language, whose many commands include ones for positioning and rendering glyph sequences. For reasons that I was never able to discern over the period of a decade, some PDF generators will place text on the page in an almost completely random order. As in, imagine the words of this paragraph being laid down character by character, but not left-to-right, more like insano-mode.

    Other things the poor, downtrodden PDF converter may run into: a mixture of fonts and images used together to form the text -- seemingly at random. Guess what, that means you need OCR -- artificial intelligence, basically -- to convert the document. Or the use of multiple fonts, each of which contains a different subset of characters, even when the source document used only a single font. Or the use of custom encodings for absolutely no reason other than it made some schmuck programmer's life a miserable fraction easier.

    On the conversion side, many converters won't even take the simplest of steps to try to undo this mess, such as organizing and sorting glyphs in topological order to recover the line and paragraph structures. It's a freakin' mess, and it's why I maintain that PDF is a print format, not a document format. None of this stuff matters for print, but for document processing it is a nightmare.
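
    To make the "organizing and sorting glyphs" step concrete, here is a minimal sketch in C. It is not how any particular converter works and it is not a true topological sort: it is the simplest version of the idea, grouping glyphs into lines by baseline and then sorting each line left to right. The Glyph struct, the function names, and the tolerance parameter are all made up for illustration. It also assumes horizontal, single-column, unrotated text whose characters have already been decoded, and a baseline coordinate that increases upward as in PDF user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical glyph record, as a converter might extract it from a page. */
typedef struct {
    double x;     /* horizontal position, increasing to the right */
    double y;     /* baseline position, increasing upward (PDF user space) */
    char   ch;    /* decoded character (assumes decoding already succeeded) */
    size_t line;  /* line index, filled in by sort_reading_order() */
} Glyph;

/* Order by baseline, highest on the page first. */
static int by_baseline_desc(const void *a, const void *b)
{
    const Glyph *ga = a, *gb = b;
    return (ga->y < gb->y) - (ga->y > gb->y);
}

/* Order by assigned line, then left to right within the line. */
static int by_line_then_x(const void *a, const void *b)
{
    const Glyph *ga = a, *gb = b;
    if (ga->line != gb->line)
        return (ga->line > gb->line) - (ga->line < gb->line);
    return (ga->x > gb->x) - (ga->x < gb->x);
}

/*
 * Rearranges glyphs into reading order and returns the number of lines found.
 * A glyph starts a new line when its baseline lies more than `tolerance`
 * below the first (highest) baseline of the current line.
 */
size_t sort_reading_order(Glyph *glyphs, size_t count, double tolerance)
{
    assert(tolerance >= 0.0);
    if (count == 0)
        return 0;

    qsort(glyphs, count, sizeof *glyphs, by_baseline_desc);

    size_t line = 0;
    double line_y = glyphs[0].y;
    for (size_t i = 0; i < count; i++) {
        if (line_y - glyphs[i].y > tolerance) {
            line++;
            line_y = glyphs[i].y;
        }
        glyphs[i].line = line;
    }

    qsort(glyphs, count, sizeof *glyphs, by_line_then_x);
    return line + 1;
}

/*
 * Copies the sorted glyphs into `out` as text, with '\n' between lines.
 * Output is truncated if the buffer is too small. Returns the length written.
 */
size_t glyphs_to_text(const Glyph *glyphs, size_t count,
                      char *out, size_t out_size)
{
    size_t n = 0;
    for (size_t i = 0; i < count && n + 2 < out_size; i++) {
        if (i > 0 && glyphs[i].line != glyphs[i - 1].line)
            out[n++] = '\n';
        out[n++] = glyphs[i].ch;
    }
    if (out_size > 0)
        out[n] = '\0';
    return n;
}
```

    Call sort_reading_order() on the glyphs of one page, then glyphs_to_text() to get the lines back. Assigning line numbers in a separate pass, rather than comparing baselines "within tolerance" inside the qsort comparator, keeps both comparators consistent orderings. Real documents need a lot more than this: columns, rotated text, superscripts, and word-gap detection are all ignored here.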

    Note that none of these problems are intrinsically the fault of PDF. PDF has some weird crap in it, like JavaScript, but the fundamentals are solid, and in fact, PDF makes a not-too-bad object persistence format. That's right, I just said that PDF is an okay choice for an object-relational database.
    Last edited by brewbuck; 03-16-2013 at 10:04 PM.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  4. #4
    SAMARAS std10093's Avatar
    Join Date
    Jan 2011
    Location
    Nice, France
    Posts
    2,694
    He wants to edit them. We have "Word" (the one that uses .doc, plus the pack that converts .docx to .doc and back), but I do not see how this can be helpful...?

    I see your other points.

  5. #5
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by std10093
    He wants to edit them.
    Adobe Acrobat (not free) or something similar lets you edit a PDF document directly. That's probably better than trying to convert it to Word, imho.
