The crawl diff feature lets you compare any two Crawl objects and get a structured view of everything that changed between them. It is designed for crawl-over-crawl monitoring: weekly checks, pre/post-deploy audits, or migration QA.
Basic usage
examples/crawl_diff.py shows the full pattern:
compare method
Call compare on the newer crawl and pass the older crawl as other. The diff is always expressed as changes from other to self, so added_pages holds URLs that exist only in the newer crawl.
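To make the direction concrete, here is a minimal sketch that models each crawl as a plain dict mapping URL to HTTP status code. It is not the library's implementation: the crawl_diff_sketch function and the dict representation are hypothetical, chosen only to show what "changes from other to self" means for the added, removed, and status buckets.

```python
def crawl_diff_sketch(self_crawl: dict[str, int], other: dict[str, int]) -> dict:
    """Hypothetical stand-in: diff `other` (older) -> `self_crawl` (newer)."""
    # URLs only in the newer crawl count as added; URLs only in the older one, as removed.
    added = sorted(self_crawl.keys() - other.keys())
    removed = sorted(other.keys() - self_crawl.keys())
    # For URLs in both crawls, record (url, old_status, new_status) when the codes differ.
    status_changes = [
        (url, other[url], self_crawl[url])
        for url in sorted(self_crawl.keys() & other.keys())
        if other[url] != self_crawl[url]
    ]
    return {"added": added, "removed": removed, "status_changes": status_changes}


old = {"https://example.com/": 200, "https://example.com/a": 200, "https://example.com/b": 301}
new = {"https://example.com/": 200, "https://example.com/a": 404, "https://example.com/c": 200}

# Newer crawl first, older crawl passed as `other`, mirroring the compare call.
diff = crawl_diff_sketch(new, old)
print(diff)
```

Swapping the arguments flips the added and removed buckets, which is why compare should be called on the newer crawl.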
Parameters
title_fields
List of field names to treat as the page title for comparison. Defaults to ["Title 1"]. Override if your crawl exports a custom title extraction.
redirect_fields
List of field names that carry the redirect destination URL. Defaults to a sensible built-in list. Override when your export uses non-standard column names.
redirect_type_fields
List of field names that carry the redirect type (e.g., 301, 302). Override alongside redirect_fields when using non-standard exports.
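Because redirect_fields and redirect_type_fields take lists of names, it can help to picture how a list of candidate columns might be resolved against one exported row. The sketch below shows one plausible rule, taking the first candidate column that is present and non-empty. The resolve_field helper and the custom column names ("Final Destination", "Hop Code") are hypothetical, and the library's actual default list and resolution order may differ.

```python
from typing import Optional, Sequence


def resolve_field(row: dict, candidates: Sequence[str]) -> Optional[str]:
    """Return the value of the first candidate column that is present and non-empty.

    Hypothetical helper for illustration; not part of the library's API.
    """
    for name in candidates:
        value = row.get(name)
        if value not in (None, ""):
            return str(value)
    return None  # none of the candidate columns holds a usable value


# A row from a non-standard export that renames the redirect columns (hypothetical names).
row = {
    "Address": "https://example.com/old",
    "Final Destination": "https://example.com/new",
    "Hop Code": 301,
}

redirect_fields = ["Redirect URL", "Final Destination"]
redirect_type_fields = ["Redirect Type", "Hop Code"]

print(resolve_field(row, redirect_fields))
print(resolve_field(row, redirect_type_fields))
```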
field_groups
List of field group names to include in the comparison. Override to narrow or expand what is diffed.
Default groups: canonical, meta description, meta keywords, meta refresh, h1, h2, h3, word count, indexability, robots.
compare uses a DuckDB-first projection path for its internal field set. On lean caches, it pulls only the fields required for diffing, not full internal_all rows.
CrawlDiff object
compare returns a CrawlDiff object with the following attributes and methods.
Change buckets
| Attribute | Type | Description |
|---|---|---|
| added_pages | list | URLs present in the new crawl but not the old one. |
| removed_pages | list | URLs present in the old crawl but not the new one. |
| status_changes | list[StatusChange] | Pages whose HTTP status code changed. |
| title_changes | list[TitleChange] | Pages whose title field value changed. |
| redirect_changes | list[RedirectChange] | Pages whose redirect destination or redirect type changed. |
| field_changes | list[FieldChange] | All other field-level changes (canonical, meta, headings, word count, indexability, robots). |
StatusChange objects
Each item in diff.status_changes is a StatusChange with three fields:
FieldChange objects
Each item in diff.field_changes is a FieldChange:
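As a hedged illustration of the kind of record each bucket holds, the sketch below defines frozen dataclasses with hypothetical attribute names (url, old, new, and field). These names are assumptions and may not match the library's StatusChange and FieldChange classes. The field_changes_for helper, also hypothetical, shows how field-level changes can be derived by comparing two exported rows for the same URL.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StatusChangeSketch:
    """Hypothetical stand-in for StatusChange; attribute names are assumptions."""
    url: str
    old: int
    new: int


@dataclass(frozen=True)
class FieldChangeSketch:
    """Hypothetical stand-in for FieldChange; attribute names are assumptions."""
    url: str
    field: str
    old: Any
    new: Any


def field_changes_for(url: str, old_row: dict, new_row: dict, fields: list[str]) -> list[FieldChangeSketch]:
    """Emit one FieldChangeSketch per field whose value differs between the two rows."""
    # Missing columns are read as None via dict.get.
    return [
        FieldChangeSketch(url, name, old_row.get(name), new_row.get(name))
        for name in fields
        if old_row.get(name) != new_row.get(name)
    ]


status = StatusChangeSketch("https://example.com/b", 200, 404)

old_row = {"Canonical Link Element 1": "https://example.com/a", "Word Count": 850, "H1-1": "Pricing"}
new_row = {"Canonical Link Element 1": "https://example.com/a", "Word Count": 620, "H1-1": "Plans and pricing"}
changes = field_changes_for(
    "https://example.com/a", old_row, new_row,
    ["Canonical Link Element 1", "Word Count", "H1-1"],
)

print(status)
for change in changes:
    print(change.field, change.old, "->", change.new)
```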
.summary()
Returns a dict with change counts across all buckets.
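A per-bucket count summary can be thought of as the length of each change bucket. The sketch below derives such counts from a stand-in object that carries the six bucket attributes from the table above. The summary_sketch function and the placeholder bucket contents are hypothetical, and the key names of the real .summary() dict may differ.

```python
from types import SimpleNamespace

BUCKETS = (
    "added_pages", "removed_pages", "status_changes",
    "title_changes", "redirect_changes", "field_changes",
)


def summary_sketch(diff) -> dict[str, int]:
    # Hypothetical equivalent of .summary(): one count per change bucket.
    return {name: len(getattr(diff, name)) for name in BUCKETS}


# Stand-in diff object; bucket contents are placeholders, not real change objects.
fake_diff = SimpleNamespace(
    added_pages=["https://example.com/new"],
    removed_pages=[],
    status_changes=["s1", "s2"],
    title_changes=["t1"],
    redirect_changes=[],
    field_changes=["f1", "f2", "f3"],
)

print(summary_sketch(fake_diff))
```

In a weekly check, a non-zero count in status_changes or removed_pages is a natural trigger for a closer look.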
.to_rows()
Flattens all change buckets into a single list of dicts, which is useful for export, CSV writing, or bulk DataFrame construction.
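Because the flattened output is a list of plain dicts, the standard csv module is enough to export it. The sketch below writes any list of dicts to CSV, using the union of keys as the header so that rows from different buckets, which may carry different keys, still line up. The write_rows_csv helper and the sample rows and their keys are hypothetical; in practice you would pass the result of diff.to_rows().

```python
import csv
import io
from typing import TextIO


def write_rows_csv(rows: list[dict], out: TextIO) -> None:
    # Header = every key seen across all rows, in first-seen order.
    header = list(dict.fromkeys(key for row in rows for key in row))
    # restval fills columns a given row does not have with an empty string.
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows; real rows would come from diff.to_rows().
rows = [
    {"change_type": "status", "url": "https://example.com/a", "old": 200, "new": 404},
    {"change_type": "field", "url": "https://example.com/b", "field": "Word Count", "old": 850, "new": 620},
]

buffer = io.StringIO()
write_rows_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with newline="" as the csv module recommends and pass the file object in place of the StringIO buffer.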
.to_pandas() / .to_polars()
Converts the flattened diff directly to a pandas DataFrame or a polars DataFrame.
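Since pandas and polars are optional dependencies, a script that runs in mixed environments may want to check for them before calling .to_pandas() or .to_polars(). The sketch below uses importlib.util.find_spec to pick a DataFrame when one of the libraries is importable and falls back to .to_rows() otherwise. The best_table helper and the FakeDiff class are hypothetical stand-ins that make the example runnable without the library; the helper relies only on the three documented methods.

```python
import importlib.util


def best_table(diff):
    """Return a DataFrame when pandas or polars is installed, else plain rows.

    Hypothetical helper built only on to_pandas(), to_polars(), and to_rows().
    """
    if importlib.util.find_spec("pandas") is not None:
        return diff.to_pandas()
    if importlib.util.find_spec("polars") is not None:
        return diff.to_polars()
    return diff.to_rows()


class FakeDiff:
    """Stand-in that reports which conversion was requested."""

    def to_rows(self):
        return [{"url": "https://example.com/a"}]

    def to_pandas(self):
        return "pandas"

    def to_polars(self):
        return "polars"


print(best_table(FakeDiff()))
```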
to_pandas() and to_polars() require pandas or polars, respectively, to be installed. Both are optional dependencies not included in the base install.
Change types tracked
The following change types are captured by default:
Status changes
HTTP status code changes between the two crawls (e.g., 200 → 404, 301 → 200).
Title changes
Changes to the Title 1 field (or custom title_fields). Detects additions, removals, and rewrites.
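The three kinds of title change can be told apart from the old and new values alone. A minimal sketch, assuming that an empty or missing value means "no title" and that surrounding whitespace should be ignored; the classify_title_change function and its labels are hypothetical, and the library may compare raw values instead.

```python
from typing import Optional


def classify_title_change(old: Optional[str], new: Optional[str]) -> Optional[str]:
    # Treat None as empty and ignore surrounding whitespace (an assumption).
    old_clean = (old or "").strip()
    new_clean = (new or "").strip()
    if old_clean == new_clean:
        return None  # no change
    if not old_clean:
        return "added"
    if not new_clean:
        return "removed"
    return "rewritten"


print(classify_title_change(None, "Pricing | Example"))
print(classify_title_change("Pricing | Example", ""))
print(classify_title_change("Pricing", "Plans and pricing"))
```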
Redirect changes
Changes to the redirect destination URL or redirect type. Detection is best-effort and depends on which redirect columns are available in the export.
Canonical changes
Changes to Canonical Link Element 1 and Canonical Link Element 1 Status.
Meta description / keywords / refresh
Changes to Meta Description 1, Meta Keywords 1, and Meta Refresh.
H1 / H2 / H3
Changes to the primary heading fields (H1-1, H2-1, H3-1).
Word count
Changes to the Word Count field.
Indexability
Changes to Indexability or Indexability Status.
Robots and directives
Changes to Meta Robots 1, X-Robots-Tag 1, and the robots directives summary.
Filtering diff results
Because each change bucket is a plain Python list, you can filter it with standard list comprehensions.
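For example, you might keep only pages that returned 200 before and now return a client or server error, or only the field changes for one column. The sketch below uses hypothetical dataclass stubs (StatusChangeStub, FieldChangeStub) so it runs on its own; their attribute names url, old, new, and field are assumptions, so adapt them to the actual StatusChange and FieldChange attributes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StatusChangeStub:  # hypothetical stand-in; real attribute names may differ
    url: str
    old: int
    new: int


@dataclass(frozen=True)
class FieldChangeStub:  # hypothetical stand-in; real attribute names may differ
    url: str
    field: str
    old: Any
    new: Any


status_changes = [
    StatusChangeStub("https://example.com/a", 200, 404),
    StatusChangeStub("https://example.com/b", 301, 200),
    StatusChangeStub("https://example.com/c", 200, 503),
]
field_changes = [
    FieldChangeStub("https://example.com/a", "Indexability", "Indexable", "Non-Indexable"),
    FieldChangeStub("https://example.com/d", "Word Count", 850, 620),
]

# Pages that were 200 in the old crawl and now return a 4xx or 5xx status.
newly_broken = [c for c in status_changes if c.old == 200 and c.new >= 400]

# Only the changes to the Indexability column.
indexability = [c for c in field_changes if c.field == "Indexability"]

print([c.url for c in newly_broken])
print([c.url for c in indexability])
```

With a real diff, replace the stub lists with diff.status_changes and diff.field_changes.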
Narrowing the diff scope
Use field_groups to restrict the comparison to only the fields you care about.
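To see what narrowing buys you, the sketch below maps each default group name to the export columns listed under the change types above and resolves a requested subset into the columns that would be compared. The GROUP_COLUMNS mapping and the columns_for helper are hypothetical reconstructions, not the library's internal tables; with the real API you would simply pass the group names, for example field_groups=["canonical", "indexability"], to compare.

```python
# Hypothetical mapping from default group names to the documented columns.
# The robots directives summary column is left out because its name is not specified.
GROUP_COLUMNS = {
    "canonical": ["Canonical Link Element 1", "Canonical Link Element 1 Status"],
    "meta description": ["Meta Description 1"],
    "meta keywords": ["Meta Keywords 1"],
    "meta refresh": ["Meta Refresh"],
    "h1": ["H1-1"],
    "h2": ["H2-1"],
    "h3": ["H3-1"],
    "word count": ["Word Count"],
    "indexability": ["Indexability", "Indexability Status"],
    "robots": ["Meta Robots 1", "X-Robots-Tag 1"],
}


def columns_for(groups: list[str]) -> list[str]:
    """Resolve group names to columns, rejecting unknown names early."""
    unknown = [g for g in groups if g not in GROUP_COLUMNS]
    if unknown:
        raise ValueError(f"unknown field group(s): {unknown}")
    return [col for g in groups for col in GROUP_COLUMNS[g]]


print(columns_for(["canonical", "indexability"]))
```

Comparing four columns instead of the full default set keeps a migration QA run focused on canonicalisation and indexability.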