Quickstart - Screaming Frog API

Install the package

Install screamingfrog from PyPI. Python 3.10 or later is required.

python -m pip install screamingfrog

Loading .dbseospider and .seospider crawls also requires a Java runtime. See Installation for details.

Load a crawl

Crawl.load accepts several source formats. Pass the path to your crawl file or export directory.

from screamingfrog import Crawl

# Derby .dbseospider -> auto-promotes to a sibling DuckDB cache by default
crawl = Crawl.load("./crawl.dbseospider")

Use list_crawls() to discover DB-mode crawls stored in your local Screaming Frog ProjectInstanceData directory without opening Derby or starting Java.

from screamingfrog import list_crawls

for info in list_crawls():
    print(info.db_id, info.url, info.urls_crawled, info.modified)

# Load the most recently modified crawl
latest = list_crawls()[0]
crawl = Crawl.load(latest.db_id, source_type="db_id")

list_crawls() returns CrawlInfo objects with db_id, url, urls_crawled, percent_complete, modified, and path.
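
If you would rather not depend on the order in which entries come back, you can pick the newest crawl explicitly from the modified field. The following is a minimal sketch, not the library's own code: CrawlInfoStub is a hypothetical stand-in that mirrors a few of the fields listed above, and it assumes modified is a comparable timestamp such as a datetime.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CrawlInfoStub:
    """Hypothetical stand-in carrying some of the documented CrawlInfo fields."""
    db_id: str
    url: str
    urls_crawled: int
    modified: datetime  # assumption: a comparable timestamp


def most_recent(infos):
    """Return the entry with the newest `modified` value, or None if empty."""
    return max(infos, key=lambda info: info.modified, default=None)


# Made-up sample data for illustration only
infos = [
    CrawlInfoStub("a1", "https://example.com", 120, datetime(2024, 1, 5)),
    CrawlInfoStub("b2", "https://example.org", 980, datetime(2024, 3, 9)),
    CrawlInfoStub("c3", "https://example.net", 45, datetime(2023, 11, 20)),
]

latest = most_recent(infos)
print(latest.db_id)
```

With real data, the same helper could presumably be applied to the list returned by list_crawls(), and the resulting db_id passed to Crawl.load with source_type="db_id" as shown above.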

Filter pages

Use crawl.pages() for a sitewide page view with ergonomic filtering, or crawl.internal for the typed internal view.

from screamingfrog import Crawl

crawl = Crawl.load("./crawl.dbseospider")

# All 404 pages
pages_404 = crawl.pages().filter(status_code=404).collect()

# Same filter on the typed internal view, printing each address
for page in crawl.internal.filter(status_code=404):
    print(page.address)

# Narrow projection: only the fields you need
lightweight = (
    crawl.pages()
    .select("Address", "Status Code", "Title 1")
    .filter(status_code=404)
    .collect()
)

collect() returns results as a list. All view objects are also iterable directly.
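
Because collect() hands back an ordinary list, standard Python tooling applies to the results. As a rough sketch, the snippet below counts 404 addresses per top-level site section. The addresses are made-up strings standing in for the Address values of collected rows, and section_of is a hypothetical helper, not part of the package.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_of(address):
    """Return the first path segment of a URL, or "/" for the site root."""
    segments = [part for part in urlsplit(address).path.split("/") if part]
    return segments[0] if segments else "/"


# Made-up addresses standing in for the Address values of collected 404 rows
addresses = [
    "https://example.com/blog/old-post",
    "https://example.com/blog/draft",
    "https://example.com/shop/item-9",
    "https://example.com/",
]

# Tally how many addresses fall under each top-level section
by_section = Counter(section_of(address) for address in addresses)
for section, count in by_section.most_common():
    print(section, count)
```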

Access a specific tab

Use crawl.tab() to access any of the 628 mapped export surfaces by name. Tab names accept the export filename with or without the .csv extension.

from screamingfrog import Crawl

crawl = Crawl.load("./crawl.dbseospider")

# List available tabs
print(crawl.tabs)

# Iterate a tab
for row in crawl.tab("response_codes_all"):
    print(row["Address"], row["Status Code"])

# Filter by column value
for row in crawl.tab("internal_all").filter(status_code="404"):
    print(row["Address"])

# Apply a GUI filter (where supported)
for row in crawl.tab("page_titles").filter(gui="Missing"):
    print(row["Address"], row["Title 1"])
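
Since tab names are accepted with or without the .csv extension, your own scripts may want to compare names in a single form, for example when matching them against filenames in an export directory. The helper below is a hypothetical illustration of that idea, not the library's implementation.

```python
def normalize_tab_name(name):
    """Drop a trailing .csv extension (case-insensitively) from a tab name."""
    name = name.strip()
    if name.lower().endswith(".csv"):
        name = name[: -len(".csv")]
    return name


# Both spellings reduce to the same key
print(normalize_tab_name("internal_all.csv"))
print(normalize_tab_name("internal_all"))
```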

The Derby backend natively supports a growing subset of GUI filters. For exact GUI filter behaviour, use CSV exports (e.g. export_profile="kitchen_sink").
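
Where a GUI filter is not available on your backend, one fallback is to apply an equivalent condition in plain Python while iterating rows. The sketch below approximates a "Missing" page title check on dictionary-style rows keyed by column name, as in the examples above. It is an approximation under that assumption and may not match the GUI's exact logic. The sample rows are made up.

```python
def has_missing_title(row):
    """True when the row's "Title 1" value is absent, empty, or whitespace."""
    title = row.get("Title 1")
    return title is None or not str(title).strip()


# Made-up rows for illustration only
rows = [
    {"Address": "https://example.com/", "Title 1": "Home"},
    {"Address": "https://example.com/a", "Title 1": ""},
    {"Address": "https://example.com/b", "Title 1": "   "},
    {"Address": "https://example.com/c"},
]

# Keep only the addresses whose title is considered missing
missing = [row["Address"] for row in rows if has_missing_title(row)]
for address in missing:
    print(address)
```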

Run an audit

Thin audit helpers cover common SEO workflows out of the box.

from screamingfrog import Crawl

crawl = Crawl.load("./crawl.dbseospider")

# Broken internal links with inlink counts and sampled sources
broken = crawl.broken_links_report()

# Missing titles and missing meta descriptions
title_meta = crawl.title_meta_audit()

# Non-indexable pages with the fields that explain why
non_indexable = crawl.indexability_audit()

# Redirect chains with 3 or more hops
chains = crawl.redirect_chain_report(min_hops=3)

# Orphan pages (no inlinks, indexable only)
orphans = crawl.orphan_pages_report(only_indexable=True)

# Issue family reports
security_issues = crawl.security_issues_report()
canonical_issues = crawl.canonical_issues_report()
hreflang_issues = crawl.hreflang_issues_report()
redirect_issues = crawl.redirect_issues_report()

DuckDB fast path for large crawls

For large crawls or repeated analysis, export to a DuckDB cache first and load from it on subsequent runs. DuckDB is the default analysis engine for all DB-backed workflows.

from screamingfrog import Crawl

# Load from Derby source once
derby_crawl = Crawl.load(
    "./crawl.dbseospider",
    dbseospider_backend="derby",
    csv_fallback=False,
)

# Export to a DuckDB cache
derby_crawl.export_duckdb("./crawl.duckdb", if_exists="auto")

# Load from DuckDB for fast repeat queries
fast = Crawl.load("./crawl.duckdb")

pages_404 = fast.pages().filter(status_code=404).collect()
lightweight = fast.pages().select("Address", "Status Code", "Title 1").collect()
broken_inlinks = (
    fast.links("in")
    .select("Source", "Address", "Status Code")
    .filter(status_code=404)
    .collect()
)
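
If you manage the cache path yourself, the "export once, load from the cache afterwards" pattern comes down to checking whether the cache file already exists. Here is a minimal sketch of that decision using only the standard library. pick_source is a hypothetical helper, and the paths are created in a temporary directory purely for demonstration.

```python
import tempfile
from pathlib import Path


def pick_source(derby_path, cache_path):
    """Return the DuckDB cache path if the file exists, else the Derby path."""
    cache = Path(cache_path)
    return cache if cache.is_file() else Path(derby_path)


with tempfile.TemporaryDirectory() as tmp:
    derby = Path(tmp) / "crawl.dbseospider"
    cache = Path(tmp) / "crawl.duckdb"

    first = pick_source(derby, cache)   # no cache yet: fall back to Derby
    cache.touch()                       # simulate a completed export
    second = pick_source(derby, cache)  # cache present: prefer DuckDB

    print(first.name, second.name)
```

The returned path could then be passed to Crawl.load, with export_duckdb called only on the run where the Derby source was chosen.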

You can also store multiple crawls in a single .duckdb file using namespaces:

from screamingfrog import Crawl

crawl_a = Crawl.load("./crawl-client-a.dbseospider")
crawl_b = Crawl.load("./crawl-client-b.dbseospider")

crawl_a.export_duckdb("./portfolio.duckdb", namespace="client-a", if_exists="auto")
crawl_b.export_duckdb("./portfolio.duckdb", namespace="client-b", if_exists="auto")

namespaces = Crawl.duckdb_namespaces("./portfolio.duckdb")
client_a = Crawl.from_duckdb("./portfolio.duckdb", namespace="client-a")

Next steps

Crawl diff

Compare two crawls with new.compare(old) to surface status, title, redirect, and canonical changes across a full site.

Generic tab access

Use crawl.tab(), crawl.tab_columns(), and crawl.describe_tab() to explore any of the 628 mapped export surfaces.

Raw SQL

Use crawl.sql() and crawl.query() for direct Derby/DuckDB access when mapped fields are not enough.

CLI wrapper

Start crawls and trigger exports programmatically with start_crawl() and export_crawl().
