Class constructors

All constructors are class methods on Crawl. Use Crawl.load() for auto-detected loading, or the named constructors for explicit control.

Crawl.load()

Auto-detect and load a crawl from any supported source.
from screamingfrog import Crawl

crawl = Crawl.load("./exports")          # CSV exports directory
crawl = Crawl.load("./crawl.db")         # SQLite database
crawl = Crawl.load("./crawl.duckdb")     # DuckDB analytics cache
crawl = Crawl.load("./crawl.dbseospider") # Derby-backed crawl (default: DuckDB analysis)
crawl = Crawl.load("./crawl.seospider")  # Screaming Frog crawl file
crawl = Crawl.load("<uuid>", source_type="db_id")  # DB crawl ID
path
str
required
Path to the crawl source. Accepts a directory, file path, or DB crawl UUID. Auto-detection is based on path suffix and directory contents.
source_type
str
default:"auto"
Force a specific loader. One of "auto", "exports", "csv", "duckdb", "sqlite", "db", "derby", "dbseospider", "seospider", "db_id".
seospider_backend
str
default:"duckdb"
Backend to use when loading .seospider files. One of "duckdb", "derby", "csv".
db_id_backend
str
default:"duckdb"
Backend to use when loading by DB crawl ID. One of "duckdb", "derby", "csv".
dbseospider_backend
str
default:"duckdb"
Backend to use when loading .dbseospider files. One of "duckdb", "derby".
duckdb_path
str | None
default:"None"
Path for the DuckDB analytics cache. Defaults to a sibling file next to the source.
duckdb_namespace
str | None
default:"None"
Namespace to use within a multi-crawl DuckDB file.
duckdb_tabs
Sequence[str] | str | None
default:"None"
Tabs to materialize into the DuckDB cache. Pass "all" to materialize every mapped tab.
duckdb_if_exists
str
default:"auto"
Cache refresh strategy. "auto" rebuilds only when the Derby source changed. Also accepts "replace" or "skip".
materialize_dbseospider
bool
default:"True"
Whether to create a .dbseospider sidecar file when loading .seospider crawls.
csv_fallback
bool
default:"True"
Enable automatic CSV export fallback for Derby-backed crawls when a tab or column is missing.
export_tabs
Sequence[str] | None
default:"None"
Tabs to export when using CLI-backed loaders.
export_profile
str | None
default:"None"
Named export profile. Use "kitchen_sink" for the bundled full-tab profile.
Returns Crawl
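To make the suffix-based auto-detection more concrete, here is a minimal sketch of how a path could be mapped to a source_type value. The guess_source_type helper and the SUFFIX_TO_SOURCE_TYPE table are hypothetical names built only from the examples above; the library's real detection also inspects directory contents and may differ.

```python
from pathlib import Path

# Hypothetical suffix table derived from the examples above; not the library's code.
SUFFIX_TO_SOURCE_TYPE = {
    ".db": "sqlite",
    ".sqlite": "sqlite",
    ".duckdb": "duckdb",
    ".dbseospider": "dbseospider",
    ".seospider": "seospider",
}


def guess_source_type(path: str) -> str:
    """Guess a source_type value from a path, or return "auto" when unsure."""
    p = Path(path)
    if p.is_dir():
        # Simplification: every directory is treated as a folder of CSV exports.
        return "exports"
    # Suffix comparison is case-insensitive; unknown suffixes fall back to "auto".
    return SUFFIX_TO_SOURCE_TYPE.get(p.suffix.lower(), "auto")


print(guess_source_type("./crawl.duckdb"))
print(guess_source_type("./crawl.seospider"))
```

A bare UUID has no suffix, so this sketch returns "auto" for it, which is consistent with the example above passing source_type="db_id" explicitly.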

Crawl.from_exports()

Load from a directory of CSV export files.
crawl = Crawl.from_exports("./exports")
export_dir
str
required
Path to the directory containing exported .csv files.
Returns Crawl

Crawl.from_database()

Load from a SQLite database file (legacy backend, limited tab support).
crawl = Crawl.from_database("./crawl.db")
db_path
str
required
Path to the SQLite .db or .sqlite file.
Returns Crawl

Crawl.from_duckdb()

Load from a DuckDB analytics cache file.
crawl = Crawl.from_duckdb("./crawl.duckdb")
crawl = Crawl.from_duckdb("./portfolio.duckdb", namespace="client-a")
db_path
str
required
Path to the .duckdb file.
namespace
str | None
default:"None"
Namespace to read within a multi-crawl DuckDB file.
Returns Crawl

Crawl.duckdb_namespaces()

List all crawl namespaces stored in a DuckDB file.
namespaces = Crawl.duckdb_namespaces("./portfolio.duckdb")
db_path
str
required
Path to the .duckdb file.
Returns list[str]
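The two DuckDB calls above combine naturally when a single file holds several crawls. The sketch below loads one crawl per namespace; load_all_namespaces is a hypothetical helper, and it takes the Crawl class as an argument so it only relies on the two methods documented here.

```python
from typing import Any, Dict


def load_all_namespaces(crawl_cls: Any, db_path: str) -> Dict[str, Any]:
    """Load one crawl per namespace; crawl_cls is expected to be screamingfrog.Crawl."""
    return {
        name: crawl_cls.from_duckdb(db_path, namespace=name)
        for name in crawl_cls.duckdb_namespaces(db_path)
    }
```

Assuming the package is installed, a call would look like load_all_namespaces(Crawl, "./portfolio.duckdb").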

Crawl.from_derby()

Load directly from a Derby (.dbseospider) database.
crawl = Crawl.from_derby("./crawl.dbseospider")
crawl = Crawl.from_derby("./crawl.dbseospider", backend="derby", csv_fallback=False)
db_path
str
required
Path to the Derby database directory or .dbseospider archive.
backend
str
default:"duckdb"
Analysis backend. "duckdb" (default) promotes to a DuckDB analytics cache; "derby" queries Derby directly.
duckdb_path
str | None
default:"None"
Path for the DuckDB cache. Defaults to a sibling file next to the source.
duckdb_namespace
str | None
default:"None"
Namespace for the DuckDB cache.
duckdb_tables
Sequence[str] | None
default:"None"
Raw Derby tables to export into DuckDB.
duckdb_tabs
Sequence[str] | str | None
default:"None"
Mapped tabs to materialize into DuckDB. Use "all" for every available tab.
duckdb_if_exists
str
default:"auto"
Cache refresh strategy. "auto", "replace", or "skip".
csv_fallback
bool
default:"True"
Fall back to CLI CSV exports for tabs or columns unavailable in Derby.
Returns Crawl

Crawl.from_seospider()

Load from a Screaming Frog .seospider crawl file. Runs the Screaming Frog CLI internally.
crawl = Crawl.from_seospider("./crawl.seospider")
crawl = Crawl.from_seospider(
    "./crawl.seospider",
    backend="csv",
    export_dir="./exports",
    export_tabs=["Internal:All", "Response Codes:All"],
)
crawl_path
str
required
Path to the .seospider file.
backend
str
default:"duckdb"
Backend to use. One of "duckdb", "derby", "csv".
materialize_dbseospider
bool
default:"True"
Create a .dbseospider sidecar archive next to the source crawl.
dbseospider_overwrite
bool
default:"True"
Overwrite an existing .dbseospider sidecar.
ensure_db_mode
bool
default:"True"
Temporarily set storage.mode=DB in spider.config before loading.
export_tabs
Sequence[str] | None
default:"None"
Tabs to export when using the CSV backend.
export_profile
str | None
default:"None"
Named export profile (e.g. "kitchen_sink").
Returns Crawl

Crawl.from_db_id()

Load a DB-mode crawl by its UUID from the local ProjectInstanceData directory.
crawl = Crawl.from_db_id("138edb21-61d0-41cd-9e9b-725b592a471c")
crawl_id
str
required
The UUID of the DB-mode crawl folder inside ProjectInstanceData.
backend
str
default:"duckdb"
Backend to use. One of "duckdb", "derby", "csv".
project_root
str | None
default:"None"
Override the ProjectInstanceData root directory. Defaults to the standard Screaming Frog data path.
Returns Crawl

Views and queries

crawl.internal

Sitewide internal page view. Returns an InternalView object backed by the internal page model.
for page in crawl.internal.filter(status_code=404):
    print(page.address)
Type InternalView

crawl.pages()

Sitewide page view backed by the internal model. Use .filter() and .select() to narrow results.
pages = crawl.pages().filter(status_code=404).collect()
lightweight = crawl.pages().select("Address", "Status Code", "Title 1").collect()
Returns PageView

crawl.links()

Sitewide inlinks or outlinks view.
inlinks = crawl.links("in").filter(status_code=404).collect()
outlinks = crawl.links("out").collect()
direction
str
default:"out"
Link direction. "in" for inlinks, "out" for outlinks.
Returns LinkView

crawl.tab()

Access any export tab by name (CSV filename without extension, or normalized name).
for row in crawl.tab("response_codes_all"):
    print(row["Address"], row["Status Code"])

for row in crawl.tab("page_titles").filter(gui="Missing"):
    print(row["Address"])
name
str
required
Tab name. Case-insensitive; snake_case and title-case forms accepted. Extension optional.
Returns TabView

crawl.section()

Scope page and link views to a URL path prefix or full URL prefix.
blog = crawl.section("/blog")
blog_pages = blog.pages().collect()
blog_inlinks = blog.links("in").collect()
blog_tab = blog.tab("all_inlinks").collect()
prefix
str
required
URL path prefix (e.g. "/blog") or full URL prefix (e.g. "https://example.com/blog").
Returns CrawlSection
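The two prefix forms can be illustrated with a small matching function. This is a sketch of one plausible interpretation, not the library's implementation: in_section is a hypothetical helper, and plain string-prefix matching is an assumption (under it, "/blog" would also match "/blogroll").

```python
from urllib.parse import urlsplit


def in_section(url: str, prefix: str) -> bool:
    """Return True if url falls under prefix (a path like "/blog" or a full URL)."""
    if prefix.startswith("/"):
        # Path prefix: compare against the path component of the URL only.
        return urlsplit(url).path.startswith(prefix)
    # Full URL prefix: compare against the whole URL string.
    return url.startswith(prefix)


print(in_section("https://example.com/blog/post", "/blog"))
print(in_section("https://example.com/about", "https://example.com/blog"))
```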

crawl.search()

Search across the sitewide page view.
matches = crawl.search("canonical", fields=["Address", "Title 1"]).collect()
term
str
required
Search string.
fields
Sequence[str] | None
default:"None"
Limit search to these column names. Searches all string fields when None.
case_sensitive
bool
default:"False"
Whether the search is case-sensitive.
Returns SearchRowView
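The interaction between fields and case_sensitive can be sketched over plain dict rows. The search_rows function below is hypothetical and assumes substring matching; the library's own matching rules are not documented here and may differ.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence


def search_rows(
    rows: Iterable[Dict[str, Any]],
    term: str,
    fields: Optional[Sequence[str]] = None,
    case_sensitive: bool = False,
) -> Iterator[Dict[str, Any]]:
    """Yield rows where term occurs in any selected string field."""
    needle = term if case_sensitive else term.lower()
    for row in rows:
        # With fields=None, every key is considered; non-string values are skipped.
        keys = fields if fields is not None else list(row.keys())
        for key in keys:
            value = row.get(key)
            if not isinstance(value, str):
                continue
            haystack = value if case_sensitive else value.lower()
            if needle in haystack:
                yield row
                break  # one match per row is enough
```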

crawl.tabs

List available tab names for the current backend.
print(crawl.tabs)
Type list[str]

crawl.query()

Build a chainable SQL query against a raw backend table (DB-backed crawls only).
rows = (
    crawl.query("APP", "URLS")
    .select("ENCODED_URL", "RESPONSE_CODE")
    .where("RESPONSE_CODE >= ?", 400)
    .order_by("RESPONSE_CODE DESC")
    .limit(100)
    .collect()
)
schema
str
required
Schema name (e.g. "APP").
table
str
required
Table name (e.g. "URLS").
Returns QueryView

crawl.raw()

Yield raw rows from a backend table as dicts. DB-backed crawls only.
for row in crawl.raw("APP.URLS"):
    print(row["ENCODED_URL"], row["RESPONSE_CODE"])
table
str
required
Fully qualified table name (e.g. "APP.URLS").
Returns Iterator[dict[str, Any]]

crawl.sql()

Execute a raw SQL query and yield rows as dicts. DB-backed crawls only.
for row in crawl.sql(
    "SELECT ENCODED_URL, RESPONSE_CODE FROM APP.URLS WHERE RESPONSE_CODE >= ?",
    [400],
):
    print(row)
query
str
required
SQL query string. Use ? for parameterized values.
params
Sequence[Any] | None
default:"None"
Query parameters corresponding to ? placeholders.
Returns Iterator[dict[str, Any]]
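The ? placeholder style is the same positional ("qmark") style used by Python's built-in sqlite3 module, so the pattern can be tried without a crawl file. In the sketch below sqlite3 only stands in for the crawl backend; the urls table and its rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for a crawl database; table name and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE urls (encoded_url TEXT, response_code INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [
        ("https://example.com/", 200),
        ("https://example.com/gone", 404),
        ("https://example.com/error", 500),
    ],
)

# Each ? is bound, in order, to one value from the params sequence.
cursor = conn.execute(
    "SELECT encoded_url, response_code FROM urls "
    "WHERE response_code >= ? ORDER BY response_code",
    [400],
)
rows = [dict(row) for row in cursor]
print(rows)
```

Binding values through params rather than formatting them into the query string keeps quoting correct and avoids SQL injection.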

Graph helpers

crawl.inlinks()

Return all inlinks for a given URL.
for link in crawl.inlinks("https://example.com/page"):
    print(link.source, link.anchor_text)
url
str
required
The destination URL to look up inlinks for.
Returns Iterator[Link]

crawl.outlinks()

Return all outlinks from a given URL.
for link in crawl.outlinks("https://example.com/page"):
    print(link.destination)
url
str
required
The source URL to look up outlinks for.
Returns Iterator[Link]

Chain helpers

crawl.redirect_chains()

Iterate redirect chain rows, optionally filtered by hop count and loop flag.
for row in crawl.redirect_chains(min_hops=3, loop=False):
    print(row["Address"], row["Number of Redirects"])
min_hops
int | None
default:"None"
Minimum number of redirect hops. None means no lower bound.
max_hops
int | None
default:"None"
Maximum number of redirect hops. None means no upper bound.
loop
bool | None
default:"None"
Filter by loop status. True returns only loops; False excludes loops; None returns all.
Returns Iterator[dict[str, Any]]
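The three-state loop filter and the optional hop bounds can be sketched as a plain generator over chain rows. This is an illustration under assumptions, not the library's code: filter_chains is hypothetical, the "Loop" key is a made-up column name, and the hop bounds are assumed to be inclusive. Only "Number of Redirects" comes from the example above.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def filter_chains(
    rows: Iterable[Dict[str, Any]],
    min_hops: Optional[int] = None,
    max_hops: Optional[int] = None,
    loop: Optional[bool] = None,
    hops_key: str = "Number of Redirects",
    loop_key: str = "Loop",  # hypothetical column name
) -> Iterator[Dict[str, Any]]:
    """Yield chain rows within the hop bounds and matching the loop flag."""
    for row in rows:
        hops = int(row[hops_key])
        if min_hops is not None and hops < min_hops:
            continue
        if max_hops is not None and hops > max_hops:
            continue
        # loop=None keeps everything; True keeps only loops; False drops loops.
        if loop is not None and bool(row.get(loop_key)) != loop:
            continue
        yield row
```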

crawl.canonical_chains()

Iterate canonical chain rows.
min_hops
int | None
default:"None"
Minimum number of canonical hops.
max_hops
int | None
default:"None"
Maximum number of canonical hops.
loop
bool | None
default:"None"
Filter by loop status.
Returns Iterator[dict[str, Any]]

crawl.redirect_and_canonical_chains()

Iterate mixed redirect and canonical chain rows.
min_hops
int | None
default:"None"
Minimum total hops.
max_hops
int | None
default:"None"
Maximum total hops.
loop
bool | None
default:"None"
Filter by loop status.
Returns Iterator[dict[str, Any]]

Audit report helpers

Most report helpers return a flat list[dict[str, Any]] of issue rows, ready to export or load into a dataframe. The exception is crawl.summary(), which returns a single dict.
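Because report rows are plain dicts, they can be written out with the standard library alone. The sketch below is one way to do that; write_rows_csv is a hypothetical helper, and the sample rows are made up. It collects column names across all rows, on the assumption that rows from different issue tabs may not share the same keys.

```python
import csv
import io
from typing import Any, Dict, List, Sequence, TextIO


def write_rows_csv(rows: Sequence[Dict[str, Any]], out: TextIO) -> List[str]:
    """Write dict rows as CSV and return the column order that was used."""
    # Collect column names in first-seen order, since rows may not share keys.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Illustrative rows; real report rows come from the helpers in this section.
buffer = io.StringIO()
columns = write_rows_csv(
    [{"Address": "u", "Status Code": 404}, {"Address": "v", "Inlinks": 2}],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.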

crawl.summary()

Return a compact crawl-level summary dict with counts for pages, broken links, orphans, redirect chains, and issue families.
print(crawl.summary())
Core counts (pages, tabs, broken_pages) are always populated. Issue-family and chain totals may be None on lean DuckDB caches until those tabs are materialized.
Returns dict[str, Any]
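Since some totals may be None, code that prints or aggregates the summary should handle missing values explicitly. A minimal sketch, with format_summary as a hypothetical helper and "redirect_chains" as an illustrative key name (only pages, tabs, and broken_pages are named above):

```python
from typing import Any, Dict, List


def format_summary(summary: Dict[str, Any]) -> List[str]:
    """Render summary entries as text lines, showing None values as "n/a"."""
    lines = []
    for key, value in summary.items():
        shown = "n/a" if value is None else value
        lines.append(f"{key}: {shown}")
    return lines


# Illustrative input; a real dict would come from crawl.summary().
for line in format_summary({"pages": 120, "broken_pages": 3, "redirect_chains": None}):
    print(line)
```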

Return broken internal URLs with inlink counts and sampled inlink sources.
min_status
int
default:"400"
Minimum HTTP status code to include.
max_status
int
default:"599"
Maximum HTTP status code to include.
Maximum number of sampled inlink sources per broken URL. Pass None to include all.
Returns list[dict[str, Any]]

Return sitewide inlinks pointing to broken destinations.
min_status
int
default:"400"
Minimum HTTP status code.
max_status
int
default:"599"
Maximum HTTP status code.
Returns list[dict[str, Any]]

Return sitewide inlinks marked as nofollow.
Returns list[dict[str, Any]]

crawl.title_meta_audit()

Return page-level rows for missing titles and missing meta descriptions.
Returns list[dict[str, Any]]

crawl.indexability_audit()

Return non-indexable pages with key indexability fields (Indexability, Indexability Status, Canonical, Meta Robots, X-Robots-Tag).
Returns list[dict[str, Any]]

crawl.orphan_pages_report()

Return pages with no incoming internal links.
Exclude self-referencing links when computing inlink counts.
only_indexable
bool
default:"False"
Return only indexable orphan pages.
Returns list[dict[str, Any]]
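The orphan definition, including the self-link exclusion, can be sketched over a list of pages and (source, destination) link pairs. This illustrates the idea only: find_orphans and its ignore_self_links parameter are hypothetical names, and the library's report may apply further rules (for example around the crawl start URL) that are not described here.

```python
from typing import Iterable, List, Sequence, Tuple


def find_orphans(
    pages: Sequence[str],
    links: Iterable[Tuple[str, str]],
    ignore_self_links: bool = True,
) -> List[str]:
    """Return pages that no other page links to, in their original order."""
    linked = set()
    for source, destination in links:
        if ignore_self_links and source == destination:
            # A page linking to itself does not count as an incoming link.
            continue
        linked.add(destination)
    return [page for page in pages if page not in linked]


pages = ["/", "/a", "/b", "/c"]
links = [("/", "/a"), ("/b", "/b"), ("/a", "/")]
print(find_orphans(pages, links))
```

In this sample, "/b" only links to itself, so whether it counts as an orphan depends on the self-link setting.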

crawl.security_issues_report()

Return rows from all available security issue tabs (missing HSTS, CSP, mixed content, insecure forms, etc.).
Returns list[dict[str, Any]]

crawl.canonical_issues_report()

Return rows from all available canonical issue tabs (missing, multiple, conflicting, non-indexable, etc.).
Returns list[dict[str, Any]]

crawl.hreflang_issues_report()

Return rows from all available hreflang issue tabs.
Returns list[dict[str, Any]]

crawl.redirect_issues_report()

Return rows from available redirect issue tabs (redirect chains, loops, meta refresh, JS redirect).
Returns list[dict[str, Any]]

crawl.redirect_chain_report()

Collected version of crawl.redirect_chains(). Returns results as a list.
min_hops
int | None
default:"None"
Minimum redirect hops.
max_hops
int | None
default:"None"
Maximum redirect hops.
loop
bool | None
default:"None"
Filter by loop status.
Returns list[dict[str, Any]]

Tab metadata

crawl.tab_filters()

List available GUI filter names for a tab.
print(crawl.tab_filters("Page Titles"))
# ['Missing', 'Duplicate', 'Over 60 Characters', ...]
name
str
required
Tab name.
Returns list[str]

crawl.tab_filter_defs()

Return the full filter definition objects for a tab.
name
str
required
Tab name.
Returns list[Any]

crawl.tab_columns()

Return the column names for a tab.
print(crawl.tab_columns("page_titles"))
name
str
required
Tab name.
Returns list[str]
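One practical use of crawl.tab_columns() is checking column names before selecting them, since available columns can vary by backend. A small sketch, where missing_columns is a hypothetical helper that relies only on the tab_columns() method documented above:

```python
from typing import Any, List, Sequence


def missing_columns(crawl: Any, tab: str, wanted: Sequence[str]) -> List[str]:
    """Return the wanted column names that the tab does not provide."""
    available = set(crawl.tab_columns(tab))
    return [name for name in wanted if name not in available]
```

An empty result means every requested column exists; otherwise the returned names can be reported or dropped before querying the tab.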

crawl.describe_tab()

Return a dict with tab, columns, and filters for a given tab name.
info = crawl.describe_tab("page_titles")
print(info["columns"], info["filters"])
name
str
required
Tab name.
Returns dict[str, Any]

DuckDB export

crawl.export_duckdb()

Export the current crawl into a DuckDB analytics cache file.
crawl.export_duckdb("./crawl.duckdb", if_exists="auto")

# Export into a shared portfolio file with a namespace
crawl.export_duckdb("./portfolio.duckdb", namespace="client-a", if_exists="auto")

# Materialize all mapped tabs
crawl.export_duckdb("./crawl.duckdb", tabs="all")
path
str
required
Destination path for the DuckDB file.
tables
Sequence[str] | None
default:"None"
Raw Derby tables to include.
tabs
Sequence[str] | str | None
default:"None"
Mapped tabs to materialize. Pass "all" for every available tab.
if_exists
str
default:"replace"
What to do when the cache already exists. One of "replace", "skip", "auto".
source_label
str | None
default:"None"
Label stored in the cache to identify the crawl source.
namespace
str | None
default:"None"
Namespace within the DuckDB file for multi-crawl storage.
Returns Path
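The three if_exists strategies can be summarized as a small decision function. This is a sketch under stated assumptions, not the library's logic: should_rebuild is hypothetical, it assumes "skip" still builds a cache that does not exist yet, and it detects a changed source by comparing file modification times, which may not be how the library decides.

```python
import os


def should_rebuild(source_path: str, cache_path: str, if_exists: str = "auto") -> bool:
    """Decide whether a cache should be (re)built for the given strategy."""
    if if_exists not in {"auto", "replace", "skip"}:
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    if not os.path.exists(cache_path):
        # Nothing to reuse yet, so every strategy builds the cache.
        return True
    if if_exists == "replace":
        return True
    if if_exists == "skip":
        return False
    # "auto": rebuild only when the source is newer than the cache (assumption).
    return os.path.getmtime(source_path) > os.path.getmtime(cache_path)
```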

export_duckdb_from_backend()

Export a crawl backend directly to a DuckDB file (lower-level than crawl.export_duckdb()). Used internally; exposed for advanced workflows.
from screamingfrog import export_duckdb_from_backend
backend
CrawlBackend
required
A crawl backend instance.
duckdb_path
str | Path
required
Destination path for the DuckDB file.
tables
Sequence[str] | None
default:"None"
Raw Derby tables to export. Defaults to DEFAULT_DUCKDB_TABLES.
tabs
Sequence[str] | str | None
default:"None"
Mapped tabs to materialize.
if_exists
str
default:"replace"
Cache refresh strategy: "replace", "skip", or "auto".
source_label
str | None
default:"None"
Label stored in the cache to identify the crawl source.
namespace
str | None
default:"None"
Namespace within the DuckDB file.
Returns Path

Exported constants

DEFAULT_DUCKDB_TABLES

The default set of raw Derby tables exported when creating a DuckDB cache without specifying tables.
from screamingfrog import DEFAULT_DUCKDB_TABLES

print(DEFAULT_DUCKDB_TABLES)
# ('APP.URLS', 'APP.LINKS', 'APP.UNIQUE_URLS')
Type tuple[str, ...]

DEFAULT_DUCKDB_TABS

The default set of mapped tabs materialized when creating a DuckDB cache without specifying tabs.
from screamingfrog import DEFAULT_DUCKDB_TABS

print(DEFAULT_DUCKDB_TABS)
# ('internal_all', 'all_inlinks', 'all_outlinks', 'redirect_chains',
#  'canonical_chains', 'redirect_and_canonical_chains')
Type tuple[str, ...]

Crawl comparison

crawl.compare()

Compare two crawls and return structural changes as a CrawlDiff.
old = Crawl.load("./crawl-2024-01.dbseospider")
new = Crawl.load("./crawl-2024-02.dbseospider")

diff = new.compare(old)
print(diff.summary())

for change in diff.status_changes:
    print(change.url, change.old_status, "->", change.new_status)
other
Crawl
required
The baseline crawl to compare against.
title_fields
Sequence[str] | None
default:"None"
Field names to use for title comparison. Defaults to ("Title 1", "Title").
redirect_fields
Sequence[str] | None
default:"None"
Field names for redirect URL comparison. Defaults to ("Redirect URL", "Redirect URI", "Redirect Destination").
redirect_type_fields
Sequence[str] | None
default:"None"
Field names for redirect type comparison. Defaults to ("Redirect Type",).
field_groups
dict[str, Sequence[str]] | None
default:"None"
Additional field groups to diff (canonical, meta description, H1-3, word count, indexability, robots directives). Pass a custom dict to override the defaults.
Returns CrawlDiff
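The kind of comparison behind status_changes can be sketched over two lists of row dicts. This is an illustration, not the CrawlDiff implementation: the status_changes function below is hypothetical, it keys rows on "Address" and compares "Status Code" (column names taken from examples elsewhere on this page), and it ignores URLs that exist in only one crawl.

```python
from typing import Any, Dict, List, Sequence, Tuple


def status_changes(
    old_rows: Sequence[Dict[str, Any]],
    new_rows: Sequence[Dict[str, Any]],
    key: str = "Address",
    field: str = "Status Code",
) -> List[Tuple[str, Any, Any]]:
    """Return (url, old_status, new_status) for URLs whose status differs."""
    old_by_url = {row[key]: row.get(field) for row in old_rows}
    changes = []
    for row in new_rows:
        url = row[key]
        # Only URLs present in both crawls are compared.
        if url in old_by_url and old_by_url[url] != row.get(field):
            changes.append((url, old_by_url[url], row.get(field)))
    return changes
```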

Top-level helpers

list_crawls()

Enumerate all DB-mode crawls in the local ProjectInstanceData directory without opening Derby.
from screamingfrog import Crawl, list_crawls

for info in list_crawls():
    print(info.db_id, info.url, info.urls_crawled, info.modified)

latest = list_crawls()[0]
crawl = Crawl.load(latest.db_id, source_type="db_id")
project_root
str | None
default:"None"
Override the ProjectInstanceData root directory.
Returns list[CrawlInfo]
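When the most recent crawl is needed, selecting it by the modified attribute avoids depending on the order of the returned list. A minimal sketch: latest_crawl is a hypothetical helper, and it assumes the modified values are mutually comparable (for example datetimes or timestamps).

```python
from typing import Any, Iterable, Optional


def latest_crawl(infos: Iterable[Any]) -> Optional[Any]:
    """Return the entry with the greatest modified value, or None if empty."""
    infos = list(infos)
    if not infos:
        return None
    return max(infos, key=lambda info: info.modified)
```

Assuming the package is installed, latest_crawl(list_crawls()) returns a CrawlInfo whose db_id can be passed to Crawl.load(..., source_type="db_id"), or None when no DB-mode crawls are found.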

export_duckdb_from_derby()

Export a Derby crawl to a DuckDB file directly (without creating a Crawl instance).
from screamingfrog import export_duckdb_from_derby

export_duckdb_from_derby("./crawl.dbseospider", "./crawl.duckdb", if_exists="auto")
db_path
str
required
Path to the Derby database directory or .dbseospider file.
duckdb_path
str
required
Destination path for the DuckDB file.
tables
Sequence[str] | None
default:"None"
Raw Derby tables to export.
tabs
Sequence[str] | None
default:"None"
Mapped tabs to materialize.
if_exists
str
default:"auto"
Cache refresh strategy.
Returns Path

export_duckdb_from_db_id()

Export a DB-mode crawl by ID to a DuckDB file.
from screamingfrog import export_duckdb_from_db_id

export_duckdb_from_db_id(
    "138edb21-61d0-41cd-9e9b-725b592a471c",
    "./crawl.duckdb",
    if_exists="auto",
)
db_id
str
required
The DB crawl UUID.
duckdb_path
str
required
Destination path for the DuckDB file.
tables
Sequence[str] | None
default:"None"
Raw Derby tables to export.
tabs
Sequence[str] | None
default:"None"
Mapped tabs to materialize.
if_exists
str
default:"auto"
Cache refresh strategy.
Returns Path