Integrity can generate and ftp an XML sitemap that conforms to the standard sitemap protocol, ready for submission to search engines. Submitting it helps make sure that search engines know about all of your pages.
To export, press the 'Export XML Sitemap' button or choose the option from the File menu.
A page is included in the sitemap if it meets the criteria described under 'Which pages are included?' below.
For each page, the file contains a change frequency and a priority. These tell search engines how important you consider the page to be and how often you are likely to update it. Search engines may or may not act on this information.
If present, priority must be a value between 0.0 and 1.0. I suggest marking your home page as 1.0, important pages as 0.9, slightly less important pages as 0.5, and decreasing from there.
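As an illustration of the file format, the sketch below writes `<url>` entries with the `<loc>`, `<changefreq>` and `<priority>` elements defined by the sitemaps.org protocol, and rejects priorities outside 0.0 to 1.0. It is a minimal sketch using Python's standard `xml.etree.ElementTree`, not Integrity's own exporter; the function name `build_sitemap` and its list-of-tuples input are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Values the sitemaps.org protocol allows for <changefreq>
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(pages):
    """Hypothetical helper: return sitemap XML for (url, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in pages:
        if changefreq not in CHANGEFREQS:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        # The protocol only allows priorities from 0.0 to 1.0
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority must be between 0.0 and 1.0, got {priority}")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = str(float(priority))
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://example.com/", "weekly", 1.0),
    ("https://example.com/about/", "monthly", 0.5),
]))
```

A complete sitemap file would normally also begin with an XML declaration.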
If you choose the 'Automatic' setting for priority, Integrity marks your starting url as 1.0 and calculates the others from the number of clicks from the home page, using a logarithmic scale: one click from the first page = 0.5, two = 0.3, and three or more = 0.2.
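The 'Automatic' setting can be pictured in two steps: find each page's click distance from the starting url with a breadth-first search over the links, then map that distance to a priority. Integrity's exact logarithmic formula isn't given here, so this hypothetical sketch simply looks up the values quoted above (1.0, 0.5, 0.3, then 0.2). The `links` dictionary, mapping each page to the pages it links to, is an assumed input.

```python
from collections import deque

# Priorities quoted in the text, indexed by clicks from the starting url.
# Integrity's real logarithmic formula is not given, so this is a lookup table.
DOCUMENTED_PRIORITIES = [1.0, 0.5, 0.3, 0.2]


def click_distances(links, start):
    """Breadth-first search: fewest clicks from start to each reachable page."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


def automatic_priority(clicks):
    """Map a click distance to a priority; three or more clicks all get 0.2."""
    return DOCUMENTED_PRIORITIES[min(clicks, len(DOCUMENTED_PRIORITIES) - 1)]


def automatic_priorities(links, start):
    """Priority for every page reachable from the starting url."""
    return {page: automatic_priority(d)
            for page, d in click_distances(links, start).items()}


links = {"/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"], "/d": ["/e"]}
print(automatic_priorities(links, "/"))
```

Breadth-first search is used because it visits pages in order of increasing click distance, so the first time a page is reached is via its shortest path from the starting url.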
From v5.9.13, the Sitemap table shows the priority allocated to each page (previously it showed the distance from the home page, from which the priority is calculated). You can edit the priority and change frequency of individual pages within the table; doing so sets up a rule for that page, and your edits are remembered for future scans.
You can also set up 'rules' to specify the priority and change frequency for particular pages or sections of your site. You don't need to enter a full url: as with the blacklist and whitelist on the Settings screen, a partial url is enough. This way you can specify a particular url or a whole section of the site, e.g. "/engineering/". If more than one rule could apply to the same page, only one of them will be used, and which one is unpredictable.
Depending on your version of Integrity or Integrity Plus, you may see a 'match' column. Checking this box means 'make an exact match': if you've typed a partial url, leave it unchecked; if you've typed an entire url, check it.
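A rough sketch of how partial and exact rules might be matched, assuming that a partial url matches any url containing it (as with the blacklist and whitelist) and that a checked 'match' box requires the whole url to be equal. `SitemapRule` and `find_rule` are hypothetical names. Because Integrity chooses unpredictably among overlapping rules, the first-match behaviour here shouldn't be read as what Integrity does; it's safest to keep your rules from overlapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SitemapRule:
    pattern: str      # partial url such as "/engineering/", or an entire url
    exact: bool       # the 'match' checkbox: True means the whole url must equal pattern
    priority: float
    changefreq: str

    def applies_to(self, url: str) -> bool:
        # Assumption: a partial pattern matches any url that contains it.
        return url == self.pattern if self.exact else self.pattern in url


def find_rule(rules: List[SitemapRule], url: str) -> Optional[SitemapRule]:
    """Return the first rule that applies to url, or None.

    Integrity itself chooses unpredictably between overlapping rules;
    first-match is only this sketch's simplification.
    """
    for rule in rules:
        if rule.applies_to(url):
            return rule
    return None


rules = [
    SitemapRule("/engineering/", exact=False, priority=0.8, changefreq="weekly"),
    SitemapRule("https://example.com/contact", exact=True, priority=0.4, changefreq="yearly"),
]
print(find_rule(rules, "https://example.com/engineering/pumps.html"))
```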
Which pages are included?
The SEO table will include pages that are:
- internal (urls with a subdomain may or may not be treated as internal depending on whether the preference is checked in Preferences > General)
- returning a good status (i.e. urls with status 0, 4xx or 5xx are excluded)
- not excluded by your blacklist / whitelist rules on the Settings screen
- allowed by your robots.txt file, if that preference is checked on the Settings screen (below the Blacklist and Whitelist rules)
The sitemap table will include a subset of those pages. In addition to the criteria above, the following rules apply:
- pdfs are included (if that preference is checked in Preferences > Sitemap)
- pages excluded by your robots.txt file are left out (if that box is checked in Preferences > Sitemap)
- pages excluded by a 'robots noindex' meta tag are left out (if that box is checked in Preferences > Sitemap)
- pages with a canonical tag that points to a different url are left out
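The two lists above can be read as a pair of filters. The sketch below encodes them as hypothetical Python functions over a hypothetical `Page` record whose fields stand in for what a scan finds; the preference names in `Prefs` are made up for illustration, and treating pdfs as subject to the same basic checks as pages is an assumption. The canonical check is a plain string comparison, whereas a real implementation would probably normalise urls first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:                      # hypothetical record of what a scan found
    url: str
    internal: bool
    status: int                  # HTTP status; 0 means no response
    passes_lists: bool           # not excluded by blacklist / whitelist rules
    robots_allowed: bool         # not disallowed by robots.txt
    noindex: bool = False        # has a 'robots noindex' meta tag
    is_pdf: bool = False
    canonical: Optional[str] = None


@dataclass
class Prefs:                     # hypothetical names for the checkboxes
    settings_obey_robots: bool = True   # Settings screen
    sitemap_include_pdfs: bool = False  # Preferences > Sitemap
    sitemap_obey_robots: bool = True    # Preferences > Sitemap
    sitemap_obey_noindex: bool = True   # Preferences > Sitemap


def good_status(status: int) -> bool:
    # Status 0, 4xx and 5xx count as bad.
    return status != 0 and not 400 <= status <= 599


def in_seo_table(page: Page, prefs: Prefs) -> bool:
    return (page.internal
            and good_status(page.status)
            and page.passes_lists
            and (page.robots_allowed or not prefs.settings_obey_robots))


def in_sitemap(page: Page, prefs: Prefs) -> bool:
    if page.is_pdf and not prefs.sitemap_include_pdfs:
        return False
    if not in_seo_table(page, prefs):
        return False
    if prefs.sitemap_obey_robots and not page.robots_allowed:
        return False
    if prefs.sitemap_obey_noindex and page.noindex:
        return False
    # Leave out pages whose canonical tag points somewhere else.
    return page.canonical is None or page.canonical == page.url
```

For example, a page with a 'robots noindex' tag still passes `in_seo_table` but fails `in_sitemap` under the default preferences, matching the idea that the sitemap is a subset of the SEO table.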
Updating by ftp
In Preferences > Sitemap you can choose whether the exported file is saved locally, saved and then ftp'd or just ftp'd.
If you've chosen to ftp, you'll see the ftp dialog. If you've already saved a set of settings for this site, Integrity will remember these details with them for future use.
Visualisation
Visualisation images are fun, and they can also be useful for spotting pages that are 'out on a limb' or poorly connected.
Integrity can export your sitemap as a .dot file, which can be opened by graphing applications such as OmniGraffle or my own SiteViz.
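DOT is the plain-text graph language used by Graphviz. As a hypothetical illustration of the kind of file involved (not Integrity's actual output), this sketch turns a dictionary mapping each page to the pages it links to into a directed graph with one edge per link. The names `dot_id` and `to_dot` are made up for the example.

```python
def dot_id(name: str) -> str:
    """Quote a url as a DOT identifier, escaping any double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(links: dict, graph_name: str = "site") -> str:
    """Render {page: [linked pages]} as a directed graph in DOT syntax."""
    lines = [f"digraph {dot_id(graph_name)} {{"]
    # Sorting keeps the output stable between runs.
    for source in sorted(links):
        for target in sorted(links[source]):
            lines.append(f"  {dot_id(source)} -> {dot_id(target)};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"/": ["/a", "/b"], "/a": ["/"]}))
```

Saving the result with a `.dot` extension gives a file that Graphviz can render, e.g. `dot -Tpng site.dot -o site.png`; pages with only one or two edges are the 'out on a limb' ones.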