Capturing a Web Site using Acrobat

The full version of Adobe Acrobat can capture an entire web site straight off the web and create a single PDF file that contains most of the content of that site. The PDF contains copies of all of the HTML pages from the site with the internal links intact: each link takes you to the page of the PDF that corresponds to the web page the original link pointed to. What you don't get in the PDF are any downloadable files in formats that Acrobat doesn't understand (such as zip files). Also, Acrobat 4 (I don't have version 5 yet) does not understand stylesheets or scripting languages such as JavaScript, so these elements of the site are ignored as well.

Capturing a web site this way gives you a complete copy of the site that is completely independent of the internet. Obtaining such a copy of your own site has other advantages as well, which makes it useful to capture your own site even if you are going to delete the PDF again after you finish checking all of the pages.

To download an entire web site into a PDF, connect to the internet, run Acrobat, and select Open Web Page from the File menu. Enter the URL (web address) of the site that you want to download, select the Get Entire Site radio button, and press the Download button. You may then be asked whether you want to proceed; answering Yes starts the download.

As you can see, creating a PDF copy of a web site is a fairly simple process, but it can be rather time consuming, particularly if you connect to the internet via a dial-up link. If you are converting your own site, you may prefer to speed up the process by installing your own web server so that you can "download" the PDF from the master copy of the site on your own computer. The web page download process requires a URL and won't work if you just specify a directory on your drive, so installing your own web server is the only way to avoid having to be connected. Note that if you run it this way without being connected, any images or other files that your site embeds from external servers will not be incorporated into the final document, and those servers will be reported as not found.
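The article does not name a particular web server, so the following is only a minimal sketch of one way to do it, assuming you have Python 3.7 or later installed. It uses Python's built-in http.server module to serve a folder on your own computer. The function names, the port number 8000 and the address 127.0.0.1 are illustrative choices, not anything Acrobat requires.

```python
import functools
import http.server


def make_local_site_server(site_dir, port=8000):
    """Build an HTTP server that serves the files in site_dir.

    The server is bound to 127.0.0.1, so only this computer can reach it.
    Passing port=0 lets the operating system pick any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=site_dir
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_local_site(site_dir, port=8000):
    """Serve site_dir until interrupted with Ctrl+C."""
    server = make_local_site_server(site_dir, port)
    actual_port = server.server_address[1]
    print(f"Serving {site_dir} at http://127.0.0.1:{actual_port}/")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve_local_site("my_site"), where my_site is a placeholder for the folder holding your master copy, should make the site available at http://127.0.0.1:8000/, and that is the URL you would then enter in Acrobat's Open Web Page dialog. Any other web server that can serve a local folder would do the same job.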

Note that before downloading someone else's web site into a PDF you should first obtain the site owner's permission, as otherwise you may find yourself in breach of their copyright. Also, installing a web server on your own computer won't reduce the download time for sites that you don't already have a copy of on your own computer.


This article written by Stephen Chapman, Felgall Pty Ltd.
