Monitor, archive, go back in time.
- Monitor a whole website, part of a website or a single page
- Set up configurations for multiple sites / pages
- Schedule hourly, daily, weekly or monthly scans
- Be alerted to any changes: visible text, source code or the page's resources
- Be able to demonstrate what a page looked like on a particular date
- Be aware of every change to a competitor's page / site
- Runs locally, not a cloud service. Own your own data.
- An archive is kept*, including all changes to pages, images, style sheets and js
- View a 'living' version of a historical page, not a screenshot
- Switch between versions of the page to compare them
- Export a historical page as image or collection of all of its files
- Export the entire site, preserving all files as they were on a given date, or processed to make a browsable local copy of the site.
* The archive is internal and in a proprietary format. It has to be that way in order to save changes over time. You can however export all files for a single page, or export all files for the entire site for a given date.
If you are interested in a 'one shot' crawl-and-save, see WebArch, a free app that does this one job with a very simple interface.
Category: Developer Tools / Web Scrapers
Version 2.x offers a free 30-day trial. At the end of this period or before, a licence can be purchased which is a one-off lifetime licence for Website Watchman. As with our other titles, upgrades (even major version numbers) are usually free but we reserve the right to charge an upgrade fee for very major new versions.
Current version requires Mac OS 10.12 or higher. Users of earlier systems should continue using v2.5.5
Support, questions, requests
What should I do with the downloaded file?
Open the .dmg file and find the application inside. If you want to keep using Website Watchman, drag and drop it into your Applications folder. To keep it in your dock, right-click or click-and-hold on its dock icon and choose 'Keep in dock'.
Once you've set up scheduled scans, there's no need to keep the app running; it will start when needed.
Version 2.9.1 released August 2020
- Improvements to crawling engine, particularly with regard to image discovery; now finds and processes image urls within inline styles.
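As an illustration of what finding image urls within inline styles can involve, here is a minimal Python sketch. It is not Website Watchman's crawling engine; the names `CSS_URL`, `InlineStyleImageFinder` and `inline_style_urls` are invented for this example, and it only looks at `style` attributes.

```python
import re
from html.parser import HTMLParser

# Matches url(...) with optional single or double quotes around the address.
CSS_URL = re.compile(r"url\(\s*(['\"]?)(.*?)\1\s*\)", re.IGNORECASE)


class InlineStyleImageFinder(HTMLParser):
    """Collects url(...) references found in style attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                # group(2) is the address without the surrounding quotes
                self.urls.extend(m.group(2) for m in CSS_URL.finditer(value))


def inline_style_urls(html):
    """Return every url(...) address found in inline styles, in document order."""
    finder = InlineStyleImageFinder()
    finder.feed(html)
    return finder.urls
```

A real crawler would also need to resolve each address against the page url before fetching it.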
Version 2.9.1 released May 2020
- Now able to archive all files from a website (where the url is discovered). The new switch is in the 'Filtering' dialogue. The default behaviour is as before: only certain files are archived, namely pdf and those that are part of the page (css, js, images). Beware: turning the filter off can result in downloading large files such as zip, audio and video.
- Various small fixes, including: some pages wouldn't display properly in the archive browser, or after export with processing, if they had a base href giving an absolute base address. This is now correctly removed in those situations.
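To show why an absolute base href matters for a local copy, the following Python sketch strips a `<base>` tag whose href is absolute, so that relative links resolve against the archived files rather than the live site. It is an assumption about the general technique, not the app's actual code; `ABSOLUTE_BASE_TAG` and `strip_absolute_base` are hypothetical names.

```python
import re

# A <base> tag whose href starts with http://, https:// or // (an absolute address).
ABSOLUTE_BASE_TAG = re.compile(
    r"<base\b[^>]*\bhref\s*=\s*['\"]?(?:https?:)?//[^>]*>",
    re.IGNORECASE,
)


def strip_absolute_base(html):
    """Remove <base> tags with an absolute href; relative ones are left alone."""
    return ABSOLUTE_BASE_TAG.sub("", html)
```

A regular expression is enough for a sketch; a production tool would more likely edit the parsed document.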
Version 2.9.0 released May 2020
- Important fix for everyone: Fixes an issue where changes to a configuration may not be saved if the user then switches to another configuration.
- Adds queuing to the scheduling functionality. Previously, if a scheduled scan tried to start while a scan was already running, it would have bailed. Now it's queued and starts when the existing scan finishes. If you have modal alerts switched on, the alert is suppressed at the end of a scan if there is a queued scan.
- If you use the render page functionality, note also the new 'additional render time' setting in Preferences. After the page itself has loaded, it may need more time to load in dynamic content. If your archived pages don't contain the dynamic content, try increasing this setting.
- Adds a mobile browser to the preset UA strings
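The queuing behaviour described for 2.9.0 can be pictured with a small Python sketch. This is only a model of the described behaviour under simple assumptions (one scan at a time, first come first served); the `ScanScheduler` class and its method names are hypothetical.

```python
from collections import deque


class ScanScheduler:
    """Toy model: one scan runs at a time, later requests wait in a queue."""

    def __init__(self):
        self.running = None      # config currently being scanned, if any
        self.queue = deque()     # configs waiting for their turn
        self.alerts = []         # configs for which an end-of-scan alert was shown

    def request_scan(self, config):
        if self.running is None:
            self.running = config
        else:
            # Previously the request would have been dropped; now it waits.
            self.queue.append(config)

    def scan_finished(self, modal_alerts=True):
        finished = self.running
        # The alert is suppressed when another scan is waiting to start.
        if modal_alerts and not self.queue:
            self.alerts.append(finished)
        self.running = self.queue.popleft() if self.queue else None
        return finished
```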
Version 2.7.0 released April 2020
Some interface fixes:
- In light mode, the sidebar background remained dark and the dark text wasn't easy to read
- In text fields, editing operations such as paste / undo etc weren't working
- Other minor fixes
Version 2.6.1 released April 2020
- Fixes a problem where the archive browser didn't display the page properly, and css / js files weren't archived properly, if the 'this page only' option was used.
Version 2.6.0 (beta) released January 2020
- Adds ability to sort website configs into folders
- Change to website deletion. It was giving multiple alerts because the button deletes the webarchive for the site as well as the configuration. The user used to be able to cancel either of these operations and continue with the other, but this was confusing. Now a single warning is given and both things are deleted (in fact three, because deleting the config also deletes the schedule from the system, if set up).
Version 2.5.6 released January 2020
- Updates the browser used for login / captcha when the authentication option is used. Images are now loaded
- Min requirement increased to 10.12. Users of earlier systems should continue using v2.5.5
Version 2.5.5 released December 2019
- Important fix for all users. Fixes a problem related to string encoding which could have stopped pages from appearing in the archive browser (they would have appeared blank) and stopped WW from recognising changes in pages if those pages had an unusual text encoding or contained non-compliant characters.
Version 2.5.4 released December 2019
- Correctly pads single-figure hours / minutes in schedules overview table
- webarchive overview sortable by name or size
- Adds export to CSV from Changes table
- Adds sorting by URL to Changes table
Version 2.5.3 released December 2019
- Fixes a problem causing certain js files to be skipped
- Fixes a problem which could cause spurious information to appear in the page content
- Using the [-] button below the website config list now also deletes the archive, which users expected to happen. There is an 'are you sure' on both parts of the operation (deleting the config and deleting the archive) so either can be cancelled
- Fixes a problem with the archive deletion which could sometimes leave behind the empty container which would remain in the archive overview list as 'unable to find config'
Version 2.5.2 released October 2019
- Suppresses 'few results' warning if 'single page' is checked
Version 2.5.1 released October 2019
- Adds warning / advice and offer of support if results don't appear as expected
- Adds support button to About / Help box
Version 2.5.0 released September 2019
- By popular demand, now has the option to export a local copy of the entire site, either from the most recent scan or as the site stood on any date that a scan was made.
- Has option to preserve files as they were retrieved under their original filenames, or process the files so that they can be browsed locally
Version 2.4.4 released August 2019
- Removes some unnecessary items from the main menu bar
- Adds information alert when user chooses to export from the browser but has nothing selected.
Version 2.4.3 released July 2019
- Fixes an issue with certain WordPress themes causing the archive browser to display the pages without styles or not at all. If you have been affected by this bug, you may need to either delete the existing archive for the site and start afresh, or (if you want to keep the files already archived) create a new config for the same website, giving it a different name. Contact support if you need help with this.
- Small change to parsing engine, fixing an issue causing some spurious urls to appear for some sites
Version 2.4.2 released July 2019
- Small modification to workflow. If a previously-known url is unavailable (eg 404) for a second time, then it's now ignored.
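One way to picture the 'unavailable for a second time' rule is the Python sketch below. It is a guess at the general idea, not the app's implementation: the function name `should_keep_checking` is hypothetical, and resetting the count after a successful response is an assumption that the release note does not state.

```python
def should_keep_checking(failure_counts, url, status_code):
    """Return False once a known url has been unavailable twice.

    failure_counts is a dict of url -> number of failed checks so far.
    """
    if status_code < 400:
        # Assumption: a successful response clears earlier failures.
        failure_counts.pop(url, None)
        return True
    failure_counts[url] = failure_counts.get(url, 0) + 1
    return failure_counts[url] < 2
```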
Version 2.4.1 released June 2019
- Adds a [Cancel] button to the dialog which appears when you try to edit the address bar. A small but useful addition. If a scheduled scan has started while you were typing and you were in UI mode, then neither of the previous options is useful when you just need to cancel, leaving the address bar as it was.
Version 2.4 released June 2019
- Adds a webarchive overview tool (View > Webarchive Overview). Shows a list of the webarchive files and file sizes. Allows user to monitor disk space being used by Watchman for these archives. This tool includes a delete button.
- Fixes some issues experienced with the schedule launch agents
- Fixes problem with the 'delete' button in the Schedules Overview window
- When a set of settings is deleted, the schedule launchagent is now unloaded and deleted before the settings are deleted
- Fixes status bar icon sometimes not being visible
- Correctly shows 'Filters are in place' message if filters are in place for selected config when app is re-started
Version 2.3 released June 2019
- Important fix - css and js files which have a querystring (unusual but valid) were not being archived. (They would still have been referenced by the archived page as long as their urls were absolute.)
- Inherits some improvements from version 9 of the crawling engine
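The querystring issue fixed in 2.3 is a common pitfall when a file's type is judged from its url. The Python sketch below shows one way to classify a url by its path only, so that `style.css?ver=5.2` still counts as css. It is an illustration rather than the app's logic; `resource_extension` and `is_page_resource` are hypothetical names.

```python
import posixpath
from urllib.parse import urlsplit


def resource_extension(url):
    """Return the lower-cased file extension of the url's path, ignoring any querystring."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()


def is_page_resource(url):
    """True for css and js files, with or without a querystring."""
    return resource_extension(url) in {".css", ".js"}
```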
Version 2.2.3 released May 2019
- Bug fix - fixes a bug that could cause a crash during a scan under certain circumstances
Version 2.2.2 released April 2019
- Important bug fix - bug could have prevented pages from displaying properly in the archive browser.
Version 2.2.1 released April 2019
- Adds option to log into the site and scan the site using the authentication. This isn't guaranteed to work for every site that requires authentication because different sites authenticate the user in different ways, but is a method that is likely to work for many sites. If it does work, it may last for a limited time, eg one session.
Version 2.1.1 released April 2019
- Makes 'view source' in the archive browser not editable, and adds a find bar
- Enables 'hardened runtime' and has been notarized by Apple
Version 2.1.0 released March 2019
- Name altered slightly from Watchman to Website Watchman because of a conflict with an existing software package.
- If main window is showing when a scheduled scan kicks in, window switches to sites and settings view (which contains the scan progress bar) so that it's obvious why the window has become active.
- Adds 'No alert' to the alert option, so that Website Watchman can archive a site regularly without alerting changes. (Non-UI mode needs to be on too if you want to be completely uninterrupted).
Version 2.0.2 released March 2019
- Adds custom About box which shows licence key / holder, version / build and active link to Website Watchman homepage
- Correctly removes purchase menu item from Help menu when product is licensed
Version 2.0.1 released March 2019
- A few minor fixes.
- dmg contains free app WebArchive Viewer which will open and allow browsing / exporting from the webarchive files that Website Watchman creates.
- Licensing switched on - trial period 30 days.
Version 2.0.0 released March 2019
- Adds 'Non-UI mode' which has a status bar menu, but no dock icon or menu bar. If Website Watchman is already running (tip: add it to your startup items) then when a scheduled scan starts, it will do so without interrupting the user, unless it has something to report at the end of the scan and the alert option is switched on.
- Makes some improvements to the archive browser. It's now possible to click internal links within the browser window, and as long as the target page exists in the same archive for the same time and date, then the link will work.
- Also adds url and date info fields above the archive browser window.
Version 1.1.5 released Feb 2019
- Adds informative message if you click to make a comparison for a resource change, and that resource has no previous version (ie it is a new resource at that point rather than one that's changed.)
- Adds informative text to the settings tab indicating whether any filters are in place
Version 1.1.4 released Jan 2019
- Improvements to highlighting functionality
- Adds option to display source code in the archive browser
- Other small fixes
Version 1.1.3 released Jan 2019
- Adds some options for 'filtering' pages before comparing. Useful if all pages change daily because of something automatic like a feed of recent blog posts. In that case you don't want to archive a new copy of every page and be warned that every page has changed.
- you can exclude a div or span with a certain class or id, or ignore everything except the contents of a certain class or id. If things change within the source code of your page (such as an ever-changing form id or email address obscurer) then you can now totally ignore the source and only check visible text.
- you can use regex to filter out a session hash / id or similar - a number or id in the source which changes every time the page is served
- you can exclude the source entirely and only check the visible text for changes (this is in addition to the option to only be *alerted* to visible text changes)
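The regex filtering idea above can be sketched in Python: remove the parts that change on every request, then compare fingerprints of what is left. This is a minimal sketch of the general technique, not how Website Watchman stores or compares pages; `filtered_fingerprint` and the example `sid=` pattern are assumptions made for illustration.

```python
import hashlib
import re


def filtered_fingerprint(source, ignore_patterns=()):
    """Hash the page source after removing everything the patterns match."""
    for pattern in ignore_patterns:
        source = re.sub(pattern, "", source)
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# Two fetches of the same page that differ only in a session id.
first = '<form action="/post?sid=abc123">Hello</form>'
second = '<form action="/post?sid=zzz999">Hello</form>'
patterns = [r"sid=[0-9a-z]+"]

# With the filter the pages count as unchanged; without it they differ.
unchanged = filtered_fingerprint(first, patterns) == filtered_fingerprint(second, patterns)
changed_unfiltered = filtered_fingerprint(first) != filtered_fingerprint(second)
```

A pattern that is too broad can hide real changes, so it is worth keeping each expression as narrow as the changing value allows.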
Version 1.0.2 released Jan 2019
- Adds some options for alerting when visible text changes. You can choose whether visible text includes html5 header, footer, nav and image alt text.
Version 1.0.1 released Jan 2019
- Corrections to Help menu, adds link to support form, removes 'purchase' option as app is currently free.
Version 1.0.0 released Jan 2019
- Out of beta, free for a limited time
Version 0.4.0 released Jan 2019
- Some user interface work
Version 0.3.0 released Dec 2018
- Adds server response check, optionally alerting when response code changes (in order for an alert to be given for change of status, the page must have been first discovered or a change detected on the page using a version later than 0.3)
- Adds a bunch of options for when alert is to be shown
- Other small changes
Version 0.2.0 released Dec 2018
- adds user-agent string picker to Preferences, with a variety of standard UA strings, or the option to paste anything else.
- some changes to the context help (the 'i' buttons)
- inherits some minor improvements to the Integrity crawling engine
Version 0.1 released Dec 2018
- First public release