
WebArch
Download and save a local copy of a website.
- Crawl a website, save all files locally.
- Very simple interface.
- Runs locally, not a cloud service. Own your own data.
- Options to preserve all files or to process them to allow browsing of the local copy.
- For more functionality, including building an archive over time and being alerted to changes, see Website Watchman.
Category: Developer Tools / Web Scrapers
Licence
This application is currently completely free.
System Requirements
The current version requires macOS 10.14 or higher.
Support, questions, requests
If, after using the 'process' option, parts of the page aren't visible, this may be the problem.
Please use the Website Watchman support form here.
What should I do with the downloaded file?
Open the .dmg file and find the application inside. If you want to keep using WebArch, drag and drop it into your Applications folder. To keep it in your Dock, right-click or click-and-hold on its Dock icon and choose 'Keep in Dock'.
Version History
Version 0.6.0 released October 2022
- Some improvements which help to avoid missing images (where the original image file has spaces or other characters that need encoding).
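For illustration (a minimal Swift sketch under assumed behaviour, not WebArch's actual source), a fix of this kind amounts to percent-encoding raw image paths before fetching them; the function name and shape below are hypothetical:

```swift
import Foundation

// Hypothetical helper, not WebArch's code: make a raw image path fetchable
// by percent-encoding characters (such as spaces) that URLs disallow.
func normalizedImageURL(_ raw: String, relativeTo base: URL) -> URL? {
    guard let encoded = raw.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
    else { return nil }
    return URL(string: encoded, relativeTo: base)
}

// "my photo.jpg" resolves to https://example.com/images/my%20photo.jpg
let base = URL(string: "https://example.com/images/")!
print(normalizedImageURL("my photo.jpg", relativeTo: base)?.absoluteString ?? "failed")
```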
Version 0.5.0 released December 2021
- Some improvements which help to avoid missing resources (see the sketch after this list):
  - .js files are now searched for references to images
  - the page is more aggressively searched for references to images and linked files, particularly within blocks of JavaScript
- Now built to run natively on Intel and Apple Silicon Macs
- Inherits more improvements to the Integrity crawling engine made in recent months.
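As an illustration of the '.js files are now searched' items above, one simple approach is to scan script text for quoted strings ending in a known image extension. This is a hedged sketch only; the Integrity engine's actual heuristics are not published, and the function name and extension list are assumptions.

```swift
import Foundation

// Illustrative only: extract quoted strings that look like image references
// from a block of JavaScript.
func imageReferences(inScript script: String) -> [String] {
    let pattern = "[\"']([^\"']+\\.(?:png|jpe?g|gif|svg|webp))[\"']"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    else { return [] }
    let wholeRange = NSRange(script.startIndex..., in: script)
    return regex.matches(in: script, range: wholeRange).compactMap { match in
        Range(match.range(at: 1), in: script).map { String(script[$0]) }
    }
}

let js = "const hero = 'img/hero.png'; preload(\"/assets/logo.svg\");"
print(imageReferences(inScript: js))  // ["img/hero.png", "/assets/logo.svg"]
```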
Version 0.4.0 released December 2020
- Improvements to the engine mean that certain sites now display properly locally after being saved with the 'process' option.
- Updates the selectable user-agent strings and adds more (in particular, Edge and some more mobile browsers).
- Updates Paddle's licensing framework to the latest Big Sur/M1 compatible version.
- Changes the default setting for treating http:// links on the same domain when starting from an https:// url. They are now treated as internal, which is probably what's expected.
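A sketch of that new default (the predicate below is an assumption about the behaviour, not WebArch's source): a link counts as internal when it points at the starting host, whether its scheme is http or https.

```swift
import Foundation

// Assumed logic, for illustration: when a crawl starts at an https:// url,
// an http:// link on the same host is now treated as internal.
func isInternal(_ link: URL, startingFrom start: URL) -> Bool {
    guard let linkHost = link.host, let startHost = start.host else { return false }
    let webSchemes: Set<String> = ["http", "https"]
    return webSchemes.contains(link.scheme?.lowercased() ?? "")
        && linkHost.lowercased() == startHost.lowercased()
}

let start = URL(string: "https://example.com/")!
print(isInternal(URL(string: "http://example.com/page.html")!, startingFrom: start)) // true
print(isInternal(URL(string: "https://elsewhere.com/")!, startingFrom: start))       // false
```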
Version 0.3.0 released August 2020
- Improvements to the crawling engine, particularly with regard to image discovery; it now finds and processes image urls within inline styles.
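For example (an illustrative sketch, not the engine's code), an inline style such as background-image: url('img/bg.png') can be mined for url(...) tokens:

```swift
import Foundation

// Illustrative only: pull image urls out of a css inline style string
// by matching url(...) tokens, quoted or unquoted.
func imageURLs(inInlineStyle style: String) -> [String] {
    let pattern = "url\\(\\s*['\"]?([^'\")\\s]+)['\"]?\\s*\\)"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    else { return [] }
    let wholeRange = NSRange(style.startIndex..., in: style)
    return regex.matches(in: style, range: wholeRange).compactMap { match in
        Range(match.range(at: 1), in: style).map { String(style[$0]) }
    }
}

print(imageURLs(inInlineStyle: "background: url('img/bg.png') no-repeat"))  // ["img/bg.png"]
```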
Version 0.2.1 released May 2020
- Adds an option to archive all files from a website (wherever the url is discovered). The new switch is in Preferences. The default behaviour is as before: only certain files are archived, namely pdf files and those that are part of the page (css, js, images). Beware: turning the filter off can result in downloading potentially large files such as zip and audio / video.
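A minimal sketch of such a filter (the extension list and function shape are assumptions, not the app's preference code):

```swift
import Foundation

// Assumed shape of the default filter: archive page resources and pdfs,
// skip everything else unless the user has turned the filter off.
func shouldArchive(_ url: URL, filterEnabled: Bool) -> Bool {
    guard filterEnabled else { return true }  // filter off: archive every discovered file
    let allowed: Set<String> = ["html", "htm", "css", "js",
                                "png", "jpg", "jpeg", "gif", "svg", "webp", "pdf"]
    return allowed.contains(url.pathExtension.lowercased())
}

print(shouldArchive(URL(string: "https://example.com/style.css")!, filterEnabled: true)) // true
print(shouldArchive(URL(string: "https://example.com/movie.mp4")!, filterEnabled: true)) // false
```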
Version 0.2.0 released December 2019
- Adds 'single page' option
- Updates the Integrity crawling engine to the latest version, thus inheriting a number of recent improvements
- Removes the 'unexpectedly few results' warning and the preference checkbox to suppress it; single page mode produces only a single page, and the user may deliberately be scanning a deep page, a single-page website, or a page that has no links
- Adds a mobile browser to the preset UA strings
Version 0.1.4 released December 2019
- An important fix for all users: fixes a problem related to string encoding which could stop pages from appearing in the archive browser (the page would appear blank) and stop Website Watchman from recognising changes in pages, where those pages had an unusual text encoding or contained non-compliant characters.
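To illustrate the failure mode (a hedged sketch, not the actual fix): in Foundation, decoding bytes with an encoding that rejects them returns nil, which would leave a page blank; a lossless fallback decode avoids that.

```swift
import Foundation

// Illustrative only: if the declared encoding rejects the bytes, fall back
// to isoLatin1, which maps every byte to a character and so never fails.
func decodePage(_ data: Data, declaredEncoding: String.Encoding) -> String {
    if let text = String(data: data, encoding: declaredEncoding) {
        return text
    }
    return String(data: data, encoding: .isoLatin1) ?? ""
}

let bytes = Data([0x48, 0x69, 0xFF])  // "Hi" plus a byte that is invalid UTF-8
print(decodePage(bytes, declaredEncoding: .utf8))  // "Hiÿ"
```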
Version 0.1.3 released December 2019
- Fixes a problem causing certain js files to be skipped
- Fixes a problem which could cause spurious information to appear in the page content
Version 0.1.2 released October 2019
- Adds a warning, advice, and an offer of support if results don't appear as expected
- Adds a support button to the About / Help box
Version 0.1.1 released September 2019
- First public release