Overview

Call crawl.compare(other) to compare two Crawl instances. The returned CrawlDiff holds four change buckets (status, title, redirect, field) plus lists of added and removed pages.
from screamingfrog import Crawl

old = Crawl.load("./crawl-2024-01.dbseospider")
new = Crawl.load("./crawl-2024-02.dbseospider")

diff = new.compare(old)  # `old` is the baseline; `new` is the crawl being checked

print(diff.summary())
# {'added_pages': 12, 'removed_pages': 3, 'status_changes': 7, ...}

for change in diff.status_changes:
    print(change.url, change.old_status, "->", change.new_status)

rows = diff.to_rows()  # flat list of dicts for export
df = diff.to_pandas()

CrawlDiff

Returned by crawl.compare(other, ...). Frozen dataclass.

Fields

added_pages
list[str]
URLs present in the new crawl but not in the old crawl.
removed_pages
list[str]
URLs present in the old crawl but not in the new crawl.
status_changes
list[StatusChange]
Pages whose HTTP status code changed between the two crawls.
title_changes
list[TitleChange]
Pages whose title changed.
redirect_changes
list[RedirectChange]
Pages whose redirect target or redirect type changed.
field_changes
list[FieldChange]
Pages where one or more of the tracked field groups changed (canonical, meta description, H1-1, H2-1, H3-1, word count, indexability, robots directives, etc.).
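
As a rough illustration of how added_pages and removed_pages relate to the two crawls, the sketch below computes both with set differences over plain URL lists. The helper name `added_and_removed` and the sample URLs are hypothetical, and the library's own implementation may differ (for example in ordering or URL normalization).

```python
def added_and_removed(old_urls, new_urls):
    """Return (added, removed) URL lists, sorted for stable output."""
    old_set, new_set = set(old_urls), set(new_urls)
    added = sorted(new_set - old_set)      # present in the new crawl only
    removed = sorted(old_set - new_set)    # present in the old crawl only
    return added, removed


# Hypothetical sample data standing in for two crawls.
old_urls = ["https://example.com/", "https://example.com/a", "https://example.com/b"]
new_urls = ["https://example.com/", "https://example.com/b", "https://example.com/c"]

added, removed = added_and_removed(old_urls, new_urls)
print(added)    # ['https://example.com/c']
print(removed)  # ['https://example.com/a']
```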

Methods

.summary() -> dict[str, int]

Return counts for each change bucket plus a total_changes key.
print(diff.summary())
# {
#   'added_pages': 12,
#   'removed_pages': 3,
#   'status_changes': 7,
#   'title_changes': 22,
#   'redirect_changes': 4,
#   'field_changes': 31,
#   'total_changes': 79,
# }
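
To make the relationship between the buckets and the summary concrete, here is a minimal sketch using a hypothetical stand-in class, `DiffSketch`, that is not the real CrawlDiff. It assumes total_changes is the sum of the six bucket counts, which matches the example output above (12 + 3 + 7 + 22 + 4 + 31 = 79) but is not confirmed by the library source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DiffSketch:
    """Hypothetical stand-in with the six documented buckets (not the real CrawlDiff)."""
    added_pages: list = field(default_factory=list)
    removed_pages: list = field(default_factory=list)
    status_changes: list = field(default_factory=list)
    title_changes: list = field(default_factory=list)
    redirect_changes: list = field(default_factory=list)
    field_changes: list = field(default_factory=list)

    def summary(self):
        # One count per bucket, in declaration order.
        counts = {name: len(getattr(self, name)) for name in self.__dataclass_fields__}
        # Assumption: total_changes is the sum of the six bucket counts.
        counts["total_changes"] = sum(counts.values())
        return counts


sketch = DiffSketch(added_pages=["/a", "/b"], status_changes=[("/c", 200, 404)])
result = sketch.summary()
print(result["total_changes"])  # 3
```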

.to_rows() -> list[dict[str, Any]]

Flatten all change buckets into a single list of dicts. Every row has at least these keys:

| Key | Description |
| --- | --- |
| change_type | One of added_page, removed_page, status_change, title_change, redirect_change, field_change |
| url | The affected URL |

Additional keys per change_type:

status_change:
  • field: "Status Code"
  • old_value: previous status code
  • new_value: new status code

title_change:
  • field: "Title 1"
  • old_value: previous title
  • new_value: new title

redirect_change:
  • field: "Redirect URL"
  • old_value: previous redirect target
  • new_value: new redirect target
  • old_type: previous redirect type
  • new_type: new redirect type

field_change:
  • field: the field group name (e.g. "Canonical", "H1-1", "Meta Description")
  • old_value: previous value
  • new_value: new value
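
Because rows carry different keys depending on change_type, a CSV export needs a header that covers all of them. The following sketch shows one way to do that with the standard library. The sample rows are hypothetical data shaped like the keys listed above, and the value types (for example integer status codes) are assumptions.

```python
import csv
import io

# Hypothetical sample rows shaped like the documented to_rows() output.
rows = [
    {"change_type": "added_page", "url": "https://example.com/new"},
    {"change_type": "status_change", "url": "https://example.com/a",
     "field": "Status Code", "old_value": 200, "new_value": 404},
    {"change_type": "redirect_change", "url": "https://example.com/b",
     "field": "Redirect URL", "old_value": "https://example.com/x",
     "new_value": "https://example.com/y", "old_type": "301", "new_type": "302"},
]

# Collect the union of keys in first-seen order to use as the CSV header.
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)  # keys missing from a row are written as empty cells

csv_text = buffer.getvalue()
print(csv_text)
```

With real data, `rows` would come from diff.to_rows(), and the buffer could be swapped for a file opened with newline="".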

.to_pandas() / .to_polars()

Return diff.to_rows() as a pandas or Polars DataFrame. Requires the respective library to be installed.

compare() parameters

These parameters on crawl.compare() control which fields are diffed.
other
Crawl
required
The baseline crawl to compare against.
title_fields
Sequence[str] | None
default:"None"
Column name candidates for title comparison. Compared left-to-right; the first present value wins. Defaults to ("Title 1", "Title").
redirect_fields
Sequence[str] | None
default:"None"
Column name candidates for redirect URL comparison. Defaults to ("Redirect URL", "Redirect URI", "Redirect Destination").
redirect_type_fields
Sequence[str] | None
default:"None"
Column name candidates for redirect type comparison. Defaults to ("Redirect Type",).
field_groups
dict[str, Sequence[str]] | None
default:"None"
A mapping of field group label → candidate column names. Each group produces FieldChange entries. Pass None to use the default groups.
Default field groups:
  • Canonical: Canonical Link Element 1, Canonical Link Element, …
  • Canonical Status
  • Meta Description: Meta Description 1, …
  • Meta Keywords: Meta Keywords 1, …
  • Meta Refresh: Meta Refresh 1, …
  • H1-1, H2-1, H3-1
  • Word Count
  • Indexability, Indexability Status
  • Meta Robots, X-Robots-Tag
  • Directives Summary
StatusChange

Frozen dataclass. Represents a page whose HTTP status code changed.
url
str
The page URL.
old_status
int | None
Previous HTTP status code.
new_status
int | None
New HTTP status code.
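
A common use of status changes is to find pages that used to work and now fail. The sketch below uses a hypothetical stand-in dataclass, `StatusChangeSketch`, that mirrors the documented fields. The `newly_broken` helper and its 2xx-to-4xx/5xx rule are illustrative choices, not part of the library.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class StatusChangeSketch:
    """Hypothetical stand-in mirroring the documented StatusChange fields."""
    url: str
    old_status: Optional[int]
    new_status: Optional[int]


def newly_broken(changes):
    """Keep changes that went from a 2xx status to a 4xx or 5xx status."""
    return [
        c for c in changes
        if c.old_status is not None and c.new_status is not None
        and 200 <= c.old_status < 300 and c.new_status >= 400
    ]


changes = [
    StatusChangeSketch("https://example.com/a", 200, 404),
    StatusChangeSketch("https://example.com/b", 301, 200),
    StatusChangeSketch("https://example.com/c", None, 500),  # no previous status
]

broken = newly_broken(changes)
print([c.url for c in broken])  # ['https://example.com/a']

try:
    changes[0].new_status = 200  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("StatusChangeSketch is immutable")
```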

TitleChange

Frozen dataclass. Represents a page whose title changed.
url
str
The page URL.
old_title
str | None
Previous title value.
new_title
str | None
New title value.

RedirectChange

Frozen dataclass. Represents a page whose redirect target or type changed.
url
str
The page URL.
old_target
str | None
Previous redirect destination URL.
new_target
str | None
New redirect destination URL.
old_type
str | None
Previous redirect type (e.g. "301").
new_type
str | None
New redirect type.
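
Since a RedirectChange can record a changed target, a changed type, or both, it can help to tell these cases apart. This is a minimal sketch with a hypothetical stand-in dataclass, `RedirectChangeSketch`, and a hypothetical helper, `describe`. Neither is provided by the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RedirectChangeSketch:
    """Hypothetical stand-in mirroring the documented RedirectChange fields."""
    url: str
    old_target: Optional[str]
    new_target: Optional[str]
    old_type: Optional[str]
    new_type: Optional[str]


def describe(change):
    """Return which parts of the redirect changed: 'target', 'type', or both."""
    parts = []
    if change.old_target != change.new_target:
        parts.append("target")
    if change.old_type != change.new_type:
        parts.append("type")
    return parts


change = RedirectChangeSketch(
    url="https://example.com/old",
    old_target="https://example.com/a",
    new_target="https://example.com/a",
    old_type="301",
    new_type="302",
)
print(describe(change))  # ['type']
```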

FieldChange

Frozen dataclass. Represents a change in any tracked field group.
url
str
The page URL.
field
str
The field group label (e.g. "Canonical", "H1-1", "Meta Description").
old_value
str | None
Previous value of the field.
new_value
str | None
New value of the field.
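
Field changes from many groups arrive in one list, so grouping them by label is often the first step in a report. The sketch below assumes a hypothetical stand-in dataclass, `FieldChangeSketch`, with the documented fields. The `urls_by_field` helper and the sample values are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldChangeSketch:
    """Hypothetical stand-in mirroring the documented FieldChange fields."""
    url: str
    field: str
    old_value: Optional[str]
    new_value: Optional[str]


def urls_by_field(changes):
    """Map each field group label to the URLs where that group changed."""
    grouped = defaultdict(list)
    for change in changes:
        grouped[change.field].append(change.url)
    return dict(grouped)


changes = [
    FieldChangeSketch("https://example.com/a", "Canonical", None, "https://example.com/a"),
    FieldChangeSketch("https://example.com/b", "H1-1", "Old heading", "New heading"),
    FieldChangeSketch("https://example.com/c", "Canonical", "https://example.com/x", None),
]

grouped = urls_by_field(changes)
print(grouped)
# {'Canonical': ['https://example.com/a', 'https://example.com/c'], 'H1-1': ['https://example.com/b']}
```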