InternalView
Returned by crawl.internal. Backed by the internal page model and yields InternalPage objects.
Derby-backed crawl.internal materializes computed mapped fields such as Indexability and Indexability Status. DuckDB-backed crawls read these from the cached internal relation.
Methods
.filter(**kwargs) → InternalView
Narrow results by column value. Keys are column names or snake_case equivalents.
Column name / value pairs. Values are matched by equality.
.search(term, *, fields, case_sensitive) → SearchInternalView
Search string fields across internal pages.
Search string.
Column names to search. Searches all string fields when None.
Case-sensitive matching.
.count() → int
Return the number of matching pages.
.collect() → list[InternalPage]
Materialize all matching pages into a list.
.first() → InternalPage | None
Return the first matching page, or None if the view is empty.
.to_pandas() / .to_polars()
Return a pandas or Polars DataFrame. Requires the respective library to be installed.
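The equality filtering described above can be illustrated without a crawl file. The following is a minimal plain-Python sketch, not the library's implementation: `snake_case` and `filter_rows` are hypothetical helpers, and the exact rule the library uses to map column names to snake_case keys is an assumption here.

```python
from typing import Any


def snake_case(column: str) -> str:
    """Map a column name such as 'Status Code' to 'status_code' (assumed convention)."""
    return column.strip().lower().replace(" ", "_")


def filter_rows(rows: list[dict[str, Any]], **kwargs: Any) -> list[dict[str, Any]]:
    """Keep rows in which every keyword equals the row's value for that column."""

    def matches(row: dict[str, Any]) -> bool:
        # Normalize the row's column names so 'Status Code' and 'status_code' agree.
        by_snake = {snake_case(name): value for name, value in row.items()}
        return all(by_snake.get(snake_case(key)) == value for key, value in kwargs.items())

    return [row for row in rows if matches(row)]


# Illustrative rows; real column sets depend on the crawl.
rows = [
    {"Address": "https://example.com/", "Status Code": 200},
    {"Address": "https://example.com/old", "Status Code": 301},
    {"Address": "https://example.com/about", "Status Code": 200},
]

ok = filter_rows(rows, status_code=200)
count = len(ok)                 # analogous to .count()
first = ok[0] if ok else None   # analogous to .first(): None when nothing matches
```

Against a real crawl, the equivalent chain would presumably be a `.filter(...)` call followed by `.count()` or `.first()` on the returned view.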
TabView
Returned by crawl.tab(name). Yields rows as dict[str, Any].
Methods
.filter(**kwargs) → TabView
Filter by column value. Supports a special gui="..." keyword for applying named GUI filters.
Named GUI filter to apply (e.g. "Missing", "Duplicate"). Use crawl.tab_filters(name) to list available filter names for a tab.
Apply multiple GUI filters at once.
Additional column name / value pairs for equality filtering.
.search(term, *, fields, case_sensitive) → SearchRowView
Search string.
Column names to search.
Case-sensitive matching.
.count() → int
Return the total number of matching rows.
.collect() → list[dict[str, Any]]
Materialize all matching rows into a list.
.first() → dict[str, Any] | None
Return the first matching row, or None.
.to_pandas() / .to_polars()
Return a pandas or Polars DataFrame.
PageView
Returned by crawl.pages(). Backed by the internal page model and yields rows as dict[str, Any]. Use .select() to project a narrow field subset.
Methods
.filter(**kwargs) → PageView
Narrow pages by column value.
Column name / value pairs.
.select(*fields) → ProjectedPageView
Project a subset of fields. Avoids materializing the full internal page model when only a few columns are needed.
One or more field names to include. At least one field is required.
.search(term, *, fields, case_sensitive) → SearchRowView
Search string.
Column names to search.
Case-sensitive matching.
.count() → int
Return the number of matching pages.
.collect() → list[dict[str, Any]]
Materialize all matching rows.
.first() → dict[str, Any] | None
Return the first matching row.
.to_pandas() / .to_polars()
Return a pandas or Polars DataFrame.
ProjectedPageView
Returned by crawl.pages().select(...). Behaves like PageView but only returns the selected fields. DuckDB-backed crawls use a narrow helper relation to avoid full internal_all materialization.
Supports the same methods as PageView: .filter(), .search(), .count(), .collect(), .first(), .to_pandas(), .to_polars().
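To make the result shape of a projection concrete, here is a small plain-Python sketch. `project` is a hypothetical helper, not part of the library, and it only shows what the rows look like after selecting fields. It does not reproduce the backend optimization that avoids materializing the full page model.

```python
from typing import Any


def project(rows: list[dict[str, Any]], *fields: str) -> list[dict[str, Any]]:
    """Return rows reduced to the requested fields; at least one field is required."""
    if not fields:
        raise ValueError("at least one field is required")
    return [{field: row.get(field) for field in fields} for row in rows]


# Illustrative rows; real column sets depend on the crawl.
rows = [
    {"Address": "https://example.com/", "Status Code": 200, "Title 1": "Home"},
    {"Address": "https://example.com/about", "Status Code": 200, "Title 1": "About"},
]

narrow = project(rows, "Address", "Title 1")
```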
LinkView
Returned by crawl.links(direction). Yields link rows as dict[str, Any].
Methods
.filter(**kwargs) → LinkView
Column name / value pairs for equality filtering.
.select(*fields) → ProjectedLinkView
Project a field subset. Avoids materializing wide inlink/outlink tabs on lean DuckDB caches.
One or more field names. At least one is required.
.search(term, *, fields, case_sensitive) → SearchRowView
Search string.
Column names to search.
Case-sensitive matching.
.count() → int
.collect() → list[dict[str, Any]]
.first() → dict[str, Any] | None
.to_pandas() / .to_polars()
ProjectedLinkView
Returned by crawl.links(...).select(...). Behaves like LinkView but only returns selected fields.
Supports the same methods as LinkView: .filter(), .search(), .count(), .collect(), .first(), .to_pandas(), .to_polars().
CrawlSection
Returned by crawl.section(prefix). Scopes page, link, and tab views to a URL prefix.
Methods
.pages() → ScopedRowView
Return a scoped page view matched by Address, URL Encoded Address, or Encoded URL.
.links(direction="out") → ScopedRowView
Return a scoped link view. For inlinks, scope matches on Address, Destination, or To. For outlinks, scope matches on Source, From, or Address.
"in" or "out".
.tab(name, fields=None) → ScopedRowView
Return a scoped view of any tab. By default, scope matches against common URL-bearing columns (Address, Source, Destination, URL, From, To, etc.).
Tab name.
Override the URL fields used for prefix matching.
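The prefix scoping described for CrawlSection can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `scope_rows` and `DEFAULT_URL_FIELDS` are hypothetical names, the default field list covers only the columns named above (the real list continues with "etc."), and matching is assumed to be a simple string-prefix test.

```python
from typing import Any, Iterable, Optional

# Columns named in the text; the library's actual default list may be longer.
DEFAULT_URL_FIELDS = ("Address", "Source", "Destination", "URL", "From", "To")


def scope_rows(
    rows: list[dict[str, Any]],
    prefix: str,
    fields: Optional[Iterable[str]] = None,
) -> list[dict[str, Any]]:
    """Keep rows where any URL-bearing field starts with the given prefix."""
    url_fields = tuple(fields) if fields is not None else DEFAULT_URL_FIELDS
    return [
        row
        for row in rows
        if any(
            isinstance(row.get(field), str) and row[field].startswith(prefix)
            for field in url_fields
        )
    ]


# Illustrative link rows.
rows = [
    {"Source": "https://example.com/blog/a", "Destination": "https://example.com/"},
    {"Source": "https://example.com/", "Destination": "https://example.com/blog/b"},
    {"Source": "https://example.com/", "Destination": "https://example.com/about"},
]

prefix = "https://example.com/blog/"
default_scope = scope_rows(rows, prefix)                   # matches Source or Destination
source_only = scope_rows(rows, prefix, fields=["Source"])  # overridden field list
```

Overriding the fields narrows which columns count toward the match, which is why the second call returns fewer rows than the first.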
QueryView
Returned by crawl.query(schema, table). Provides a chainable SQL builder against raw backend tables. DB-backed crawls only.
Methods
.select(*columns) → QueryView
Set the columns to select. Defaults to *.
One or more column names or SQL expressions.
.where(sql_fragment, *params) → QueryView
Add a WHERE clause. Multiple calls are AND-combined.
SQL fragment. Use ? for parameterized values.
Positional parameters for ? placeholders.
.group_by(*columns) → QueryView
Add a GROUP BY clause.
One or more column names.
.having(sql_fragment, *params) → QueryView
Add a HAVING clause. Multiple calls are AND-combined.
SQL fragment.
Positional parameters.
.order_by(*clauses) → QueryView
Add an ORDER BY clause.
Column names or COLUMN ASC|DESC expressions.
.limit(n) → QueryView
Limit the number of rows returned. Pass None to remove an existing limit.
Maximum number of rows. Must be a positive integer when not None.
.to_sql() → tuple[str, list[Any]]
Return the generated SQL string and parameter list without executing the query.
.collect() → list[dict[str, Any]]
Execute the query and return all rows.
.first() → dict[str, Any] | None
Execute the query with LIMIT 1 and return the first row.
.to_pandas() / .to_polars()
Return a pandas or Polars DataFrame.
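The builder behavior described above (AND-combined WHERE fragments, ? placeholders, and a to_sql() that returns the SQL string together with its parameters) can be sketched with a small stand-in class. `SketchQuery` is hypothetical and is not the library's QueryView. It omits GROUP BY and HAVING for brevity, and it runs against an in-memory SQLite table named `urls` that exists only for this example.

```python
import sqlite3
from typing import Any, Optional


class SketchQuery:
    """Hypothetical stand-in for a chainable SQL builder."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._columns: list[str] = []
        self._where: list[str] = []
        self._params: list[Any] = []
        self._order: list[str] = []
        self._limit: Optional[int] = None

    def select(self, *columns: str) -> "SketchQuery":
        self._columns = list(columns)
        return self

    def where(self, fragment: str, *params: Any) -> "SketchQuery":
        # Each call adds one parenthesized fragment; fragments are AND-combined.
        self._where.append(f"({fragment})")
        self._params.extend(params)
        return self

    def order_by(self, *clauses: str) -> "SketchQuery":
        self._order.extend(clauses)
        return self

    def limit(self, n: Optional[int]) -> "SketchQuery":
        if n is not None and n <= 0:
            raise ValueError("limit must be a positive integer or None")
        self._limit = n
        return self

    def to_sql(self) -> tuple[str, list[Any]]:
        # Defaults to * when no columns were selected.
        sql = f"SELECT {', '.join(self._columns) or '*'} FROM {self._table}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        if self._order:
            sql += " ORDER BY " + ", ".join(self._order)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql, list(self._params)


# Illustrative table; real backend schemas and table names differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (address TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [
        ("https://example.com/", 200),
        ("https://example.com/old", 301),
        ("https://example.com/gone", 404),
    ],
)

sql, params = (
    SketchQuery("urls")
    .select("address", "status")
    .where("status >= ?", 300)
    .where("address LIKE ?", "https://example.com/%")
    .order_by("status DESC")
    .limit(10)
    .to_sql()
)
result = conn.execute(sql, params).fetchall()
```

Keeping values in the parameter list rather than formatting them into the fragment leaves quoting and escaping to the database driver.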
Data models
InternalPage
Represents a single row from the Internal tab. Returned by InternalView.
The page URL.
HTTP status code.
Internal row ID (available on DB-backed crawls).
All raw fields from the internal tab, keyed by column name.
InternalPage.from_csv_row(row: Mapping[str, Any]) → InternalPage
InternalPage.from_db_row(columns: list[str], values: tuple[Any, ...]) → InternalPage
InternalPage.from_data(data: Mapping[str, Any], *, copy_data: bool = True) → InternalPage
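The constructor signatures suggest two behaviors worth illustrating: pairing parallel column and value sequences, and optionally copying the input mapping. The sketch below is one plausible reading only. `pair_db_row` and `hold_data` are hypothetical helpers, and the meaning assigned to copy_data (a shallow copy of the mapping when true) is an assumption that the text above does not confirm.

```python
from typing import Any, Mapping


def pair_db_row(columns: list[str], values: tuple[Any, ...]) -> dict[str, Any]:
    """Zip parallel column names and values into one mapping keyed by column name."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    return dict(zip(columns, values))


def hold_data(data: Mapping[str, Any], *, copy_data: bool = True) -> Mapping[str, Any]:
    """Return a shallow copy when copy_data is true, otherwise the same object (assumed)."""
    return dict(data) if copy_data else data


row = pair_db_row(["Address", "Status Code"], ("https://example.com/", 200))

source = {"Address": "https://example.com/"}
copied = hold_data(source)                    # isolated from later changes to source
shared = hold_data(source, copy_data=False)   # same object as source
source["Address"] = "https://example.com/changed"
```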
Link
Represents a single inlink or outlink row. Returned by crawl.inlinks() and crawl.outlinks().
The source URL of the link.
The destination URL of the link.
The link anchor text.
All raw fields from the link row.
Link.from_row(row: Mapping[str, Any]) → Link
CrawlInfo
Metadata for a DB-mode crawl in the ProjectInstanceData directory. Returned by list_crawls().
The crawl UUID folder name.
The crawl start URL.
Number of crawled URLs.
Crawl completion percentage.
Last modified timestamp (UTC).
Absolute path to the crawl folder.