Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Amaculus/screaming-frog-api/llms.txt

Use this file to discover all available pages before exploring further.

Mapped fields cover the vast majority of use cases. When you need immediate access to Derby or SQLite columns that mappings do not yet cover, three escape hatches are available.
raw(), sql(), and crawl.query() are only supported on Derby and SQLite backends. CSV and CLI export backends do not support SQL execution.

Raw table iteration

crawl.raw(table) iterates every row from a backend table as plain dicts, using the native column names from the database:
from screamingfrog import Crawl

crawl = Crawl.load("./crawl.dbseospider", csv_fallback=False)

for row in crawl.raw("APP.URLS"):
    print(row["ENCODED_URL"], row["RESPONSE_CODE"])
Raw column names vary by backend and Screaming Frog version. Use crawl.sql("SELECT * FROM APP.URLS FETCH FIRST 1 ROWS ONLY", []) to inspect available columns before building a query.

SQL passthrough

crawl.sql(query, params) passes a raw SQL string directly to the backend engine:
for row in crawl.sql(
    "SELECT ENCODED_URL, RESPONSE_CODE FROM APP.URLS WHERE RESPONSE_CODE >= ?",
    [400],
):
    print(row)
Use ? placeholders and pass parameters as a list to avoid SQL injection. Derby uses FETCH FIRST n ROWS ONLY syntax for limits:
for row in crawl.sql(
    "SELECT ENCODED_URL, RESPONSE_CODE FROM APP.URLS "
    "WHERE RESPONSE_CODE >= ? "
    "FETCH FIRST 100 ROWS ONLY",
    [400],
):
    print(row)

Query builder

crawl.query(schema, table) returns a QueryView — a chainable query builder that constructs SQL without requiring you to write full query strings.

Basic example

rows = (
    crawl.query("APP", "URLS")
    .select("ENCODED_URL", "RESPONSE_CODE", "TITLE_1")
    .where("RESPONSE_CODE >= ?", 400)
    .order_by("RESPONSE_CODE DESC", "ENCODED_URL ASC")
    .limit(100)
    .collect()
)

Available QueryView methods

.select(*columns)

Choose which columns to return. Defaults to *.

.where(clause, *params)

Add a WHERE condition. Use ? for parameterised values. Multiple .where() calls are combined with AND.

.group_by(*columns)

Add a GROUP BY clause.

.having(clause, *params)

Add a HAVING condition for aggregated results. Use ? for parameterised values.

.order_by(*clauses)

Add an ORDER BY clause. Accepts strings like "RESPONSE_CODE DESC".

.limit(n)

Limit the number of rows returned.

Collecting results

# List of dicts
rows = query.collect()

# First row only
first = query.first()

# DataFrames (requires optional dependencies)
df = query.to_pandas()
df_polars = query.to_polars()

Inspecting the generated SQL

Use .to_sql() to see the SQL string and parameter list before executing:
query = (
    crawl.query("APP", "URLS")
    .select("ENCODED_URL", "RESPONSE_CODE")
    .where("RESPONSE_CODE >= ?", 400)
    .limit(50)
)

sql, params = query.to_sql()
print(sql)
# SELECT ENCODED_URL, RESPONSE_CODE FROM APP.URLS
# WHERE (RESPONSE_CODE >= ?) FETCH FIRST 50 ROWS ONLY
print(params)
# [400]

Grouping and aggregation example

status_counts = (
    crawl.query("APP", "URLS")
    .select("RESPONSE_CODE", "COUNT(*) AS count")
    .where("RESPONSE_CODE >= ?", 400)
    .group_by("RESPONSE_CODE")
    .having("COUNT(*) > ?", 5)
    .order_by("count DESC")
    .collect()
)

Backend compatibility

MethodDerbySQLiteCSV / CLI export
crawl.raw()YesYesNo
crawl.sql()YesYesNo
crawl.query()YesYesNo
For DuckDB-backed crawls, use the high-level views (crawl.pages(), crawl.links(), crawl.tab()) instead. DuckDB does not expose raw() / sql() via the mapped interface — fall back to Derby or use direct DuckDB SQL via the standalone export_duckdb_from_derby helpers.