Views - Screaming Frog API

InternalView

Returned by crawl.internal. Backed by the internal page model; yields InternalPage objects.

for page in crawl.internal.filter(status_code=404):
    print(page.address, page.status_code)

pages = crawl.internal.filter(status_code=200).collect()
count = crawl.internal.count()

On Derby-backed crawls, crawl.internal materializes computed mapped fields such as Indexability and Indexability Status. DuckDB-backed crawls read these fields from the cached internal relation.

Methods

`.filter(**kwargs)` → `InternalView`

Narrow results by column value. Keys are column names or snake_case equivalents.

crawl.internal.filter(status_code=404)
crawl.internal.filter(indexability="Non-Indexable")

**kwargs

Any

Column name / value pairs. Values are matched by equality.
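
The key handling described above can be pictured with a small stand-alone sketch. It is not the library's implementation: `to_snake_case` and `filter_rows` are hypothetical helpers, and the rule "lower-case the column name and join words with underscores" is an assumption about how snake_case equivalents are derived.

```python
from typing import Any, Iterable, Iterator


def to_snake_case(column: str) -> str:
    """Lower-case a column name and join its words with underscores (assumed rule)."""
    return "_".join(column.lower().split())


def filter_rows(rows: Iterable[dict[str, Any]], **kwargs: Any) -> Iterator[dict[str, Any]]:
    """Yield rows where every requested column equals the given value."""
    for row in rows:
        # Make each value reachable by its snake_case name and its original name.
        lookup = {to_snake_case(name): value for name, value in row.items()}
        lookup.update(row)
        if all(key in lookup and lookup[key] == value for key, value in kwargs.items()):
            yield row


rows = [
    {"Address": "https://example.com/", "Status Code": 200},
    {"Address": "https://example.com/missing", "Status Code": 404},
]
matches = list(filter_rows(rows, status_code=404))
```

Because matching is by equality, a value of a different type (for example the string "404" against the integer 404) would not match in this sketch.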

`.search(term, *, fields, case_sensitive)` → `SearchInternalView`

Search string fields across internal pages.

term

str

required

Search string.

fields

Sequence[str] | None

default:"None"

Column names to search. Searches all string fields when None.

case_sensitive

bool

default:"False"

Case-sensitive matching.
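
As a rough illustration of how the three parameters interact, the sketch below searches plain dict rows. `search_rows` is a hypothetical helper, and substring matching is an assumption; the source does not say whether the real search uses substrings, whole words, or patterns.

```python
from typing import Any, Iterable, Iterator, Optional, Sequence


def search_rows(
    rows: Iterable[dict[str, Any]],
    term: str,
    *,
    fields: Optional[Sequence[str]] = None,
    case_sensitive: bool = False,
) -> Iterator[dict[str, Any]]:
    """Yield rows where any searched string field contains the term."""
    needle = term if case_sensitive else term.lower()
    for row in rows:
        # fields=None means "look at every column of the row".
        names = fields if fields is not None else list(row)
        for name in names:
            value = row.get(name)
            if not isinstance(value, str):
                continue  # non-string fields are skipped
            haystack = value if case_sensitive else value.lower()
            if needle in haystack:
                yield row
                break  # one matching field is enough for this row


rows = [
    {"Address": "https://example.com/blog", "Title 1": "Blog Home", "Status Code": 200},
    {"Address": "https://example.com/about", "Title 1": "About", "Status Code": 200},
]
hits = list(search_rows(rows, "blog"))
```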

`.count()` → `int`

Return the number of matching pages.

`.collect()` → `list[InternalPage]`

Materialize all matching pages into a list.

`.first()` → `InternalPage | None`

Return the first matching page, or None if the view is empty.

`.to_pandas()` / `.to_polars()`

Return a pandas or Polars DataFrame. Requires the respective library to be installed.

TabView

Returned by crawl.tab(name). Yields rows as dict[str, Any].

for row in crawl.tab("response_codes_all"):
    print(row["Address"], row["Status Code"])

# GUI filter shortcut
for row in crawl.tab("page_titles").filter(gui="Missing"):
    print(row["Address"])

Methods

`.filter(**kwargs)` → `TabView`

Filter by column value. Supports a special gui="..." keyword for applying named GUI filters.

gui

str

Named GUI filter to apply (e.g. "Missing", "Duplicate"). Use crawl.tab_filters(name) to list available filter names for a tab.

gui_filters

list[str]

Apply multiple GUI filters at once.

**kwargs

Any

Additional column name / value pairs for equality filtering.

`.search(term, *, fields, case_sensitive)` → `SearchRowView`

term

str

required

Search string.

fields

Sequence[str] | None

default:"None"

Column names to search.

case_sensitive

bool

default:"False"

Case-sensitive matching.

`.count()` → `int`

Return the total number of matching rows.

`.collect()` → `list[dict[str, Any]]`

Materialize all matching rows into a list.

`.first()` → `dict[str, Any] | None`

Return the first matching row, or None.

`.to_pandas()` / `.to_polars()`

Return a pandas or Polars DataFrame.

PageView

Returned by crawl.pages(). Backed by the internal page model; yields rows as dict[str, Any]. Use .select() to project a narrow field subset.

pages = crawl.pages().filter(status_code=404).collect()

Methods

`.filter(**kwargs)` → `PageView`

Narrow pages by column value.

**kwargs

Any

Column name / value pairs.

`.select(*fields)` → `ProjectedPageView`

Project a subset of fields. Avoids materializing the full internal page model when only a few columns are needed.

lightweight = crawl.pages().select("Address", "Status Code", "Title 1").collect()

*fields

str

required

One or more field names to include. At least one field is required.
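
Conceptually, projection keeps only the named columns of each row. The sketch below shows that idea on plain dicts; `project` is a hypothetical helper, and raising ValueError for an empty field list is an assumption, since the source only states that at least one field is required.

```python
from typing import Any, Iterable, Iterator


def project(rows: Iterable[dict[str, Any]], *fields: str) -> Iterator[dict[str, Any]]:
    """Yield rows reduced to the requested fields, in the requested order."""
    if not fields:
        # Assumed error type; the real API may signal this differently.
        raise ValueError("at least one field is required")
    for row in rows:
        # Missing columns come back as None rather than raising KeyError.
        yield {name: row.get(name) for name in fields}


rows = [
    {"Address": "https://example.com/", "Status Code": 200, "Title 1": "Home", "Word Count": 512},
]
narrow = list(project(rows, "Address", "Status Code"))
```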

`.search(term, *, fields, case_sensitive)` → `SearchRowView`

term

str

required

Search string.

fields

Sequence[str] | None

default:"None"

Column names to search.

case_sensitive

bool

default:"False"

Case-sensitive matching.

`.count()` → `int`

Return the number of matching pages.

`.collect()` → `list[dict[str, Any]]`

Materialize all matching rows.

`.first()` → `dict[str, Any] | None`

Return the first matching row.

`.to_pandas()` / `.to_polars()`

Return a pandas or Polars DataFrame.

ProjectedPageView

Returned by crawl.pages().select(...). Behaves like PageView but only returns the selected fields. DuckDB-backed crawls use a narrow helper relation to avoid full internal_all materialization.

result = crawl.pages().select("Address", "Status Code", "Title 1").filter(status_code=404).collect()

Supports the same methods as PageView: .filter(), .search(), .count(), .collect(), .first(), .to_pandas(), .to_polars().

LinkView

Returned by crawl.links(direction). Yields link rows as dict[str, Any].

inlinks = crawl.links("in").filter(status_code=404).collect()
nofollow = crawl.links("in").search("nofollow", fields=["Follow"]).collect()

Methods

`.filter(**kwargs)` → `LinkView`

**kwargs

Any

Column name / value pairs for equality filtering.

`.select(*fields)` → `ProjectedLinkView`

Project a field subset. Avoids materializing wide inlink/outlink tabs on lean DuckDB caches.

broken_inlinks = crawl.links("in").select("Source", "Address", "Status Code").filter(status_code=404).collect()

*fields

str

required

One or more field names. At least one is required.

`.search(term, *, fields, case_sensitive)` → `SearchRowView`

term

str

required

Search string.

fields

Sequence[str] | None

default:"None"

Column names to search.

case_sensitive

bool

default:"False"

Case-sensitive matching.

`.count()` → `int`

`.collect()` → `list[dict[str, Any]]`

`.first()` → `dict[str, Any] | None`

`.to_pandas()` / `.to_polars()`

ProjectedLinkView

Returned by crawl.links(...).select(...). Behaves like LinkView but only returns selected fields. Supports the same methods as LinkView: .filter(), .search(), .count(), .collect(), .first(), .to_pandas(), .to_polars().

CrawlSection

Returned by crawl.section(prefix). Scopes page, link, and tab views to a URL prefix.

blog = crawl.section("/blog")
blog_pages = blog.pages().collect()
blog_outlinks = blog.links("out").collect()
blog_inlinks_tab = blog.tab("all_inlinks").collect()

Methods

`.pages()` → `ScopedRowView`

Return a scoped page view matched by Address, URL Encoded Address, or Encoded URL.

`.links(direction="out")` → `ScopedRowView`

Return a scoped link view. For inlinks, scope matches on Address, Destination, or To. For outlinks, scope matches on Source, From, or Address.

direction

str

default:"out"

"in" or "out".

`.tab(name, fields=None)` → `ScopedRowView`

Return a scoped view of any tab. By default, scope matches against common URL-bearing columns (Address, Source, Destination, URL, From, To, etc.).

name

str

required

Tab name.

fields

Sequence[str] | None

default:"None"

Override the URL fields used for prefix matching.
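
A minimal sketch of prefix scoping over dict rows follows. It assumes the prefix is compared against the path component of each URL and that a plain string-prefix test is used; the library may instead compare full URLs or respect path-segment boundaries. `in_section`, `scope_rows`, and `DEFAULT_URL_FIELDS` are hypothetical names.

```python
from typing import Any, Iterable, Iterator, Sequence
from urllib.parse import urlsplit

# Subset of the URL-bearing columns named above, used when no override is given.
DEFAULT_URL_FIELDS = ("Address", "Source", "Destination", "URL", "From", "To")


def in_section(value: Any, prefix: str) -> bool:
    """Return True when value is a URL whose path starts with prefix."""
    if not isinstance(value, str):
        return False
    path = urlsplit(value).path or "/"
    return path.startswith(prefix)


def scope_rows(
    rows: Iterable[dict[str, Any]],
    prefix: str,
    fields: Sequence[str] = DEFAULT_URL_FIELDS,
) -> Iterator[dict[str, Any]]:
    """Yield rows where any of the given URL fields falls under prefix."""
    for row in rows:
        if any(in_section(row.get(name), prefix) for name in fields):
            yield row


links = [
    {"Source": "https://example.com/blog/a", "Destination": "https://example.com/about"},
    {"Source": "https://example.com/", "Destination": "https://example.com/contact"},
]
blog_links = list(scope_rows(links, "/blog"))
```

Note that a plain prefix test like this one would also treat /blogger as part of /blog; passing a trailing slash in the prefix is one way to avoid that in the sketch.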

QueryView

Returned by crawl.query(schema, table). Provides a chainable SQL builder against raw backend tables. DB-backed crawls only.

rows = (
    crawl.query("APP", "URLS")
    .select("ENCODED_URL", "RESPONSE_CODE", "TITLE_1")
    .where("RESPONSE_CODE >= ?", 400)
    .order_by("RESPONSE_CODE DESC", "ENCODED_URL ASC")
    .limit(100)
    .collect()
)

Methods

`.select(*columns)` → `QueryView`

Set the columns to select. When .select() is not called, the query selects *.

*columns

str

required

One or more column names or SQL expressions.

`.where(sql_fragment, *params)` → `QueryView`

Add a WHERE clause. Multiple calls are AND-combined.

sql_fragment

str

required

SQL fragment. Use ? for parameterized values.

*params

Any

Positional parameters for ? placeholders.

`.group_by(*columns)` → `QueryView`

Add a GROUP BY clause.

*columns

str

required

One or more column names.

`.having(sql_fragment, *params)` → `QueryView`

Add a HAVING clause. Multiple calls are AND-combined.

sql_fragment

str

required

SQL fragment.

*params

Any

Positional parameters.

`.order_by(*clauses)` → `QueryView`

Add an ORDER BY clause.

*clauses

str

required

Column names or COLUMN ASC|DESC expressions.

`.limit(n)` → `QueryView`

Limit the number of rows returned. Pass None to remove an existing limit.

n

int | None

required

Maximum number of rows. Must be a positive integer when not None.

`.to_sql()` → `tuple[str, list[Any]]`

Return the generated SQL string and parameter list without executing the query.

sql, params = crawl.query("APP", "URLS").where("RESPONSE_CODE = ?", 404).to_sql()
print(sql)
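
To show how a builder of this shape can assemble parameterized SQL, here is a self-contained sketch. `MiniQuery` is a hypothetical stand-in, not the library's QueryView: it mutates itself rather than returning new views, omits group_by and having, and runs against an in-memory SQLite table that merely borrows the URLS column names from the example above.

```python
import sqlite3
from typing import Any, Optional


class MiniQuery:
    """Hypothetical chainable builder; illustrates the pattern only."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._columns: list[str] = []
        self._where: list[str] = []
        self._params: list[Any] = []
        self._order: list[str] = []
        self._limit: Optional[int] = None

    def select(self, *columns: str) -> "MiniQuery":
        self._columns = list(columns)
        return self

    def where(self, fragment: str, *params: Any) -> "MiniQuery":
        # Each call adds one parenthesized condition; conditions are AND-combined.
        self._where.append(f"({fragment})")
        self._params.extend(params)
        return self

    def order_by(self, *clauses: str) -> "MiniQuery":
        self._order.extend(clauses)
        return self

    def limit(self, n: Optional[int]) -> "MiniQuery":
        if n is not None and n <= 0:
            raise ValueError("limit must be a positive integer or None")
        self._limit = n
        return self

    def to_sql(self) -> tuple[str, list[Any]]:
        # Values never enter the SQL text; they travel separately as parameters.
        sql = f"SELECT {', '.join(self._columns) or '*'} FROM {self._table}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        if self._order:
            sql += " ORDER BY " + ", ".join(self._order)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql, list(self._params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE URLS (ENCODED_URL TEXT, RESPONSE_CODE INTEGER)")
conn.executemany(
    "INSERT INTO URLS VALUES (?, ?)",
    [("https://example.com/", 200), ("https://example.com/a", 404), ("https://example.com/b", 500)],
)

sql, params = (
    MiniQuery("URLS")
    .select("ENCODED_URL", "RESPONSE_CODE")
    .where("RESPONSE_CODE >= ?", 400)
    .where("ENCODED_URL LIKE ?", "%example.com%")
    .order_by("RESPONSE_CODE DESC")
    .limit(10)
    .to_sql()
)
error_rows = conn.execute(sql, params).fetchall()
```

Keeping values in the parameter list rather than formatting them into the fragment is what makes the ? placeholders safe to use with untrusted input.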

`.collect()` → `list[dict[str, Any]]`

Execute the query and return all rows.

`.first()` → `dict[str, Any] | None`

Execute the query with LIMIT 1 and return the first row.

`.to_pandas()` / `.to_polars()`

Return a pandas or Polars DataFrame.

Data models

`InternalPage`

Represents a single row from the Internal tab. Returned by InternalView.

address

str

The page URL.

status_code

int | None

HTTP status code.

int | None

Internal row ID (available on DB-backed crawls).

data

dict[str, Any]

All raw fields from the internal tab, keyed by column name.

Class methods:

InternalPage.from_csv_row(row: Mapping[str, Any]) → InternalPage
InternalPage.from_db_row(columns: list[str], values: tuple[Any, ...]) → InternalPage
InternalPage.from_data(data: Mapping[str, Any], *, copy_data: bool = True) → InternalPage
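
The constructors above can be read as different ways of arriving at the same raw-field mapping. The sketch below mirrors that idea with a hypothetical `PageSketch` dataclass. It assumes the mapping is keyed by the tab column names "Address" and "Status Code", and that copy_data controls whether the input mapping is copied or shared; the real InternalPage may differ on both points.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class PageSketch:
    """Hypothetical stand-in for InternalPage; not the library's class."""

    address: str
    status_code: Optional[int]
    data: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_data(cls, data: Mapping[str, Any], *, copy_data: bool = True) -> "PageSketch":
        # Copy by default so later changes to the caller's mapping do not leak in.
        raw = dict(data) if copy_data or not isinstance(data, dict) else data
        status = raw.get("Status Code")
        return cls(
            address=str(raw.get("Address", "")),
            status_code=int(status) if status not in (None, "") else None,
            data=raw,
        )

    @classmethod
    def from_db_row(cls, columns: list[str], values: tuple[Any, ...]) -> "PageSketch":
        # Pair each column name with its value; the fresh dict needs no extra copy.
        return cls.from_data(dict(zip(columns, values)), copy_data=False)


source = {"Address": "https://example.com/", "Status Code": "404"}
page = PageSketch.from_data(source)
db_page = PageSketch.from_db_row(["Address", "Status Code"], ("https://example.com/a", 200))
```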

`Link`

Represents a single inlink or outlink row. Returned by crawl.inlinks() and crawl.outlinks().

source

str | None

The source URL of the link.

destination

str | None

The destination URL of the link.

anchor_text

str | None

The link anchor text.

data

dict[str, Any]

All raw fields from the link row.

Class method:

Link.from_row(row: Mapping[str, Any]) → Link

`CrawlInfo`

Metadata for a DB-mode crawl in the ProjectInstanceData directory. Returned by list_crawls().

db_id

str

The crawl UUID folder name.

url

str

The crawl start URL.

urls_crawled

int

Number of crawled URLs.

percent_complete

float

Crawl completion percentage.

modified

datetime

Last modified timestamp (UTC).

path

Path

Absolute path to the crawl folder.
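
One common use of this metadata is picking the most recent finished crawl for a site. The sketch below does that over a hypothetical `CrawlInfoSketch` dataclass that copies the fields listed above. It assumes percent_complete is on a 0 to 100 scale, which the source does not state, and `latest_complete` is an illustrative helper rather than part of the library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional


@dataclass(frozen=True)
class CrawlInfoSketch:
    """Hypothetical stand-in with the same fields as CrawlInfo."""

    db_id: str
    url: str
    urls_crawled: int
    percent_complete: float
    modified: datetime
    path: Path


def latest_complete(crawls: Iterable[CrawlInfoSketch], url: str) -> Optional[CrawlInfoSketch]:
    """Return the most recently modified finished crawl for url, or None."""
    # Assumes 100.0 means "finished"; adjust if the scale is 0 to 1.
    candidates = [c for c in crawls if c.url == url and c.percent_complete >= 100.0]
    return max(candidates, key=lambda c: c.modified, default=None)


crawls = [
    CrawlInfoSketch("a1", "https://example.com/", 120, 100.0,
                    datetime(2024, 1, 5, tzinfo=timezone.utc), Path("/crawls/a1")),
    CrawlInfoSketch("b2", "https://example.com/", 150, 100.0,
                    datetime(2024, 2, 9, tzinfo=timezone.utc), Path("/crawls/b2")),
    CrawlInfoSketch("c3", "https://example.com/", 40, 35.5,
                    datetime(2024, 3, 1, tzinfo=timezone.utc), Path("/crawls/c3")),
]
chosen = latest_complete(crawls, "https://example.com/")
```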
