Copy Text from a PDF while Preserving the Formatting
PDF, the ubiquitous document format, is great for sharing documents while preserving fonts, images, and the general layout across platforms. Is there an easy way, however, to preserve that very formatting when copying and pasting text out of the document?

Today’s Question & Answer session comes to us courtesy of SuperUser, a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

The Question

SuperUser reader Colen is searching for a way to extract text from PDFs while preserving the formatting:

When I copy text out of a PDF file and into a text editor, it ends up mangled in a variety of ways. Formatting like bold and italics is lost; soft line breaks within a paragraph of text are converted to hard line breaks; dashes used to break a word over two lines are preserved even when they shouldn’t be; and single and double quotes are replaced with ? signs.

Is there a quick and easy way for Colen (and the rest of us) to grab text without sacrificing the formatting?

The Answer

SuperUser contributor Frabjous offers a solution combined with a heavy dose of caution:

Firstly, you have to understand what a PDF is. PDFs are designed to mimic a printed page, and they are designed only as an output format, not an input format. A PDF is basically a map containing the exact location of characters (individual letters or punctuation, etc.) or images. In most cases, a PDF does not even store information about where one word ends and another begins, much less things like soft breaks vs. hard breaks for paragraph endings.

If you are having trouble deciding which tool to start with, Calibre is a veritable document Swiss Army knife. You can also use it to convert PDF files for use on your ebook reader and organize your ebook/document library.
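Because a PDF does not record soft versus hard line breaks, any repair of copied text has to be guesswork. The sketch below is a hypothetical Python helper (`unwrap_pdf_text` is our own illustrative name, not part of Calibre or any PDF library) that applies two common heuristics to text that has already been pasted into an editor: a blank line is assumed to be a real paragraph break, and a line ending in a hyphen followed by a lowercase letter is assumed to be a word split across lines. It cannot restore bold or italics, or quotes that were already turned into ? signs during the copy.

```python
import re


def unwrap_pdf_text(raw: str) -> str:
    """Heuristically undo line wrapping in text copied out of a PDF.

    Hypothetical helper; the rules below are assumptions, not a standard:
    - a blank line marks a real (hard) paragraph break;
    - "word-" at the end of a line followed by a lowercase letter on the
      next line is a word split across lines, so the hyphen is dropped;
    - every other single newline is a soft break and becomes a space.
    """
    # Split on blank lines (newline, optional whitespace, newline).
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    result = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break: "char-\nacter" -> "character".
        para = re.sub(r"(\w)-\n\s*(?=[a-z])", r"\1", para)
        # Treat the remaining single newlines as soft breaks.
        para = re.sub(r"\s*\n\s*", " ", para)
        result.append(para.strip())
    # Re-emit paragraphs separated by one blank line.
    return "\n\n".join(result)


if __name__ == "__main__":
    sample = (
        "PDFs mimic a printed\npage, storing char-\nacter positions.\n\n"
        "A new para-\ngraph."
    )
    print(unwrap_pdf_text(sample))
    # PDFs mimic a printed page, storing character positions.
    #
    # A new paragraph.
```

These rules will sometimes guess wrong, which is exactly the caution Frabjous raises: a genuinely hyphenated compound such as "well-known" that happens to wrap at its hyphen comes out as "wellknown".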