
WebArch
Download and save a local copy of a website.
- Crawl a website, save all files locally.
- Very simple interface.
- Runs locally, not a cloud service. Own your own data.
- Options to preserve all files or to process them to allow browsing of the local copy.
- For more functionality, including building an archive over time and being alerted to changes, see Website Watchman.
Category: Developer Tools / Web Scrapers
Licence
This application is currently completely free.
System Requirements
The current version requires macOS 10.14 or higher.
Support, questions, requests
If, after using the 'process' option, parts of the page aren't visible, this may be the problem.
Please use the Website Watchman support form here.
What should I do with the downloaded file?
Open the .dmg file and find the application inside. If you want to keep using WebArch, drag and drop it into your Applications folder. To keep it in your Dock, right-click or click-and-hold on its Dock icon and choose 'Keep in Dock'.
Version History
Version 0.6.0 released October 2022
- Some improvements which help to avoid missing images (where the original image file has spaces or other characters that need encoding).
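For illustration (a minimal Swift sketch under assumed behaviour, not WebArch's actual source), a fix of this kind amounts to percent-encoding raw image paths before fetching them; the function name and shape below are hypothetical:

```swift
import Foundation

// Hypothetical helper, not WebArch's code: make a raw image path fetchable
// by percent-encoding characters (such as spaces) that URLs disallow.
func normalizedImageURL(_ raw: String, relativeTo base: URL) -> URL? {
    guard let encoded = raw.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
    else { return nil }
    return URL(string: encoded, relativeTo: base)
}

// "my photo.jpg" resolves to https://example.com/images/my%20photo.jpg
let base = URL(string: "https://example.com/images/")!
print(normalizedImageURL("my photo.jpg", relativeTo: base)?.absoluteString ?? "failed")
```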
Version 0.5.0 released December 2021
- Some improvements which help to avoid missing resources (see the sketch after this list):
  - .js files are now searched for references to images
  - the page is more aggressively searched for references to images and linked files, particularly within blocks of JavaScript
- Now built to run natively on Intel and Apple Silicon Macs
- Inherits more improvements to the Integrity crawling engine made in recent months.
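As an illustration of the '.js files are now searched' items above, one simple approach is to scan script text for quoted strings ending in a known image extension. This is a hedged sketch only; the Integrity engine's actual heuristics are not published, and the function name and extension list are assumptions.

```swift
import Foundation

// Illustrative only: extract quoted strings that look like image references
// from a block of JavaScript.
func imageReferences(inScript script: String) -> [String] {
    let pattern = "[\"']([^\"']+\\.(?:png|jpe?g|gif|svg|webp))[\"']"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    else { return [] }
    let wholeRange = NSRange(script.startIndex..., in: script)
    return regex.matches(in: script, range: wholeRange).compactMap { match in
        Range(match.range(at: 1), in: script).map { String(script[$0]) }
    }
}

let js = "const hero = 'img/hero.png'; preload(\"/assets/logo.svg\");"
print(imageReferences(inScript: js))  // ["img/hero.png", "/assets/logo.svg"]
```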
Version 0.4.0 released December 2020
- Improvements to the engine mean that certain sites now display properly locally after being saved with the 'process' option.
- Updates the selectable user-agent strings and adds more (in particular, Edge and some more mobile browsers).
- Updates Paddle's licensing framework to the latest Big Sur/M1 compatible version.
- Changes the default setting for treating http:// links on the same domain when starting from an https:// url. They are now treated as internal, which is probably what's expected.
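A sketch of that new default (the predicate below is an assumption about the behaviour, not WebArch's source): a link counts as internal when it points at the starting host, whether its scheme is http or https.

```swift
import Foundation

// Assumed logic, for illustration: when a crawl starts at an https:// url,
// an http:// link on the same host is now treated as internal.
func isInternal(_ link: URL, startingFrom start: URL) -> Bool {
    guard let linkHost = link.host, let startHost = start.host else { return false }
    let webSchemes: Set<String> = ["http", "https"]
    return webSchemes.contains(link.scheme?.lowercased() ?? "")
        && linkHost.lowercased() == startHost.lowercased()
}

let start = URL(string: "https://example.com/")!
print(isInternal(URL(string: "http://example.com/page.html")!, startingFrom: start)) // true
print(isInternal(URL(string: "https://elsewhere.com/")!, startingFrom: start))       // false
```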
Version 0.3.0 released August 2020
- Improvements to the crawling engine, particularly with regard to image discovery; it now finds and processes image urls within inline styles.
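For example (an illustrative sketch, not the engine's code), an inline style such as background-image: url('img/bg.png') can be mined for url(...) tokens:

```swift
import Foundation

// Illustrative only: pull image urls out of a css inline style string
// by matching url(...) tokens, quoted or unquoted.
func imageURLs(inInlineStyle style: String) -> [String] {
    let pattern = "url\\(\\s*['\"]?([^'\")\\s]+)['\"]?\\s*\\)"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    else { return [] }
    let wholeRange = NSRange(style.startIndex..., in: style)
    return regex.matches(in: style, range: wholeRange).compactMap { match in
        Range(match.range(at: 1), in: style).map { String(style[$0]) }
    }
}

print(imageURLs(inInlineStyle: "background: url('img/bg.png') no-repeat"))  // ["img/bg.png"]
```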
Version 0.2.1 released May 2020
- Adds an option to archive all files from a website (wherever the url is discovered). The new switch is in Preferences. The default behaviour is as before: only certain files are archived, namely pdf files and those that are part of the page (css, js, images). Beware: turning the filter off can result in downloading potentially large files such as zip and audio / video.
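A minimal sketch of such a filter (the extension list and function shape are assumptions, not the app's preference code):

```swift
import Foundation

// Assumed shape of the default filter: archive page resources and pdfs,
// skip everything else unless the user has turned the filter off.
func shouldArchive(_ url: URL, filterEnabled: Bool) -> Bool {
    guard filterEnabled else { return true }  // filter off: archive every discovered file
    let allowed: Set<String> = ["html", "htm", "css", "js",
                                "png", "jpg", "jpeg", "gif", "svg", "webp", "pdf"]
    return allowed.contains(url.pathExtension.lowercased())
}

print(shouldArchive(URL(string: "https://example.com/style.css")!, filterEnabled: true)) // true
print(shouldArchive(URL(string: "https://example.com/movie.mp4")!, filterEnabled: true)) // false
```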
Version 0.2.0 released December 2019
- Adds 'single page' option
- Updates the Integrity crawling engine to the latest version, thus inheriting a number of recent improvements
- Removes the 'unexpectedly few results' warning and the preference checkbox to suppress it; single page mode produces only a single page, and the user may deliberately be scanning a deep page, a single-page website, or a page that has no links
- Adds a mobile browser to the preset UA strings
Version 0.1.4 released December 2019
- An important fix for all users: fixes a problem related to string encoding which could stop pages from appearing in the archive browser (the page would appear blank) and stop Website Watchman from recognising changes in pages, where those pages had an unusual text encoding or contained non-compliant characters.
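To illustrate the failure mode (a hedged sketch, not the actual fix): in Foundation, decoding bytes with an encoding that rejects them returns nil, which would leave a page blank; a lossless fallback decode avoids that.

```swift
import Foundation

// Illustrative only: if the declared encoding rejects the bytes, fall back
// to isoLatin1, which maps every byte to a character and so never fails.
func decodePage(_ data: Data, declaredEncoding: String.Encoding) -> String {
    if let text = String(data: data, encoding: declaredEncoding) {
        return text
    }
    return String(data: data, encoding: .isoLatin1) ?? ""
}

let bytes = Data([0x48, 0x69, 0xFF])  // "Hi" plus a byte that is invalid UTF-8
print(decodePage(bytes, declaredEncoding: .utf8))  // "Hiÿ"
```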
Version 0.1.3 released December 2019
- Fixes a problem causing certain js files to be skipped
- Fixes a problem which could cause spurious information to appear in the page content
Version 0.1.2 released October 2019
- Adds a warning, advice, and an offer of support if results don't appear as expected
- Adds a support button to the About / Help box
Version 0.1.1 released September 2019
- First public release