Overview

ConfigPatches builds a patch payload for the sf-config-builder Java library. Use it to modify .seospiderconfig files without manually editing XML.
from screamingfrog import ConfigPatches, CustomSearch, CustomJavaScript, write_seospider_config

patches = ConfigPatches()
patches.set("mCrawlConfig.mRenderingMode", "JAVASCRIPT")
patches.add_custom_search(CustomSearch(name="Filter 1", query=".*", data_type="REGEX"))
patches.add_custom_javascript(
    CustomJavaScript(name="Extractor 1", javascript="return document.title;")
)

patch_json = patches.to_json()

write_seospider_config(
    "base.seospiderconfig",
    "patched.seospiderconfig",
    patches,
)

Note: write_seospider_config() requires the sf-config-builder package. Install it with: pip install sf-config-builder

ConfigPatches

Mutable dataclass. All mutating methods return self for chaining.
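
The chaining contract can be illustrated with a small stand-in. This is a sketch only: `PatchSketch` and its `ops` list are hypothetical and say nothing about how ConfigPatches stores its payload. The point is that each mutating method returns the same object, so calls can be strung together.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class PatchSketch:
    """Hypothetical stand-in for a mutable, chainable patch builder."""

    # Recorded (path, value) pairs; the real payload layout is not modeled here.
    ops: list[tuple[str, Any]] = field(default_factory=list)

    def set(self, path: str, value: Any) -> "PatchSketch":
        self.ops.append((path, value))
        return self  # returning self is what makes chaining possible


sketch = PatchSketch()
chained = (
    sketch
    .set("mCrawlConfig.mMaxUrls", 5000)
    .set("mCrawlConfig.mRenderingMode", "JAVASCRIPT")
)
```

Because `chained` and `sketch` are the same object, a chain can be built inline or across several statements with the same result.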

set()

Set a scalar config value by dotted path.
patches.set("mCrawlConfig.mMaxUrls", 5000)
patches.set("mCrawlConfig.mRenderingMode", "JAVASCRIPT")
path (str, required): Dotted config path (e.g. "mCrawlConfig.mMaxUrls").
value (Any, required): The value to set.
Returns ConfigPatches
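
When many scalar values need setting, a small helper can apply them from a mapping. The sketch below relies only on the `set(path, value)` signature documented above; `apply_settings` and the `RecordingPatches` stub are hypothetical names used for illustration, and the empty-segment check is an added safeguard, not library behavior.

```python
from typing import Any, Mapping


def apply_settings(patches: Any, settings: Mapping[str, Any]) -> Any:
    """Call patches.set(path, value) for each entry and return patches."""
    for path, value in settings.items():
        # Reject empty paths and empty segments such as "a..b".
        if not path or any(not part for part in path.split(".")):
            raise ValueError(f"Not a dotted config path: {path!r}")
        patches.set(path, value)
    return patches


class RecordingPatches:
    """Hypothetical stub that records set() calls instead of building a payload."""

    def __init__(self) -> None:
        self.calls = []

    def set(self, path: str, value: Any) -> "RecordingPatches":
        self.calls.append((path, value))
        return self


stub = apply_settings(
    RecordingPatches(),
    {
        "mCrawlConfig.mMaxUrls": 5000,
        "mCrawlConfig.mRenderingMode": "JAVASCRIPT",
    },
)
```

With the real library, the first argument would be a ConfigPatches instance in place of the stub.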

add_extraction()

Add a custom extraction rule.
patches.add_extraction(
    name="Schema Type",
    selector="//script[@type='application/ld+json']",
    selector_type="XPATH",
    extract_mode="TEXT",
)
name (str, required): Display name for the extraction.
selector (str, required): XPath or CSS selector expression.
selector_type (str, default "XPATH"): Selector type. One of "XPATH" or "CSS_PATH".
extract_mode (str, default "TEXT"): Extraction mode. One of "TEXT", "HTML", or "ATTRIBUTE".
attribute (str | None, default None): HTML attribute name. Required when extract_mode="ATTRIBUTE".
Returns ConfigPatches
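
The constraints listed above can be checked before calling add_extraction(). This is a minimal sketch: `extraction_kwargs` is a hypothetical helper, and whether the library performs the same checks itself is not stated in this reference.

```python
from typing import Optional

# Allowed values as listed in the parameter descriptions above.
SELECTOR_TYPES = {"XPATH", "CSS_PATH"}
EXTRACT_MODES = {"TEXT", "HTML", "ATTRIBUTE"}


def extraction_kwargs(
    name: str,
    selector: str,
    selector_type: str = "XPATH",
    extract_mode: str = "TEXT",
    attribute: Optional[str] = None,
) -> dict:
    """Validate the arguments and return them as keyword arguments."""
    if selector_type not in SELECTOR_TYPES:
        raise ValueError(f"Unknown selector_type: {selector_type!r}")
    if extract_mode not in EXTRACT_MODES:
        raise ValueError(f"Unknown extract_mode: {extract_mode!r}")
    if extract_mode == "ATTRIBUTE" and not attribute:
        raise ValueError('attribute is required when extract_mode="ATTRIBUTE"')
    return {
        "name": name,
        "selector": selector,
        "selector_type": selector_type,
        "extract_mode": extract_mode,
        "attribute": attribute,
    }


canonical = extraction_kwargs(
    name="Canonical href",
    selector="//link[@rel='canonical']",
    extract_mode="ATTRIBUTE",
    attribute="href",
)
# With the library installed: patches.add_extraction(**canonical)
```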

remove_extraction()

Remove an existing extraction rule by name.
name (str, required): Name of the extraction to remove.
Returns ConfigPatches

clear_extractions()

Remove all custom extraction rules. Returns ConfigPatches

add_custom_search()

Add a custom search rule.
from screamingfrog import CustomSearch

patches.add_custom_search(CustomSearch(name="Filter 1", query=".*", data_type="REGEX"))
rule (CustomSearch, required): A CustomSearch instance.
Returns ConfigPatches

remove_custom_search()

Remove a custom search rule by name.
name (str, required): Name of the rule to remove.
Returns ConfigPatches

clear_custom_searches()

Remove all custom search rules. Returns ConfigPatches

add_custom_javascript()

Add a custom JavaScript rule.
from screamingfrog import CustomJavaScript

patches.add_custom_javascript(
    CustomJavaScript(name="Extractor 1", javascript="return document.title;")
)
rule (CustomJavaScript, required): A CustomJavaScript instance.
Returns ConfigPatches

remove_custom_javascript()

Remove a custom JavaScript rule by name.
name (str, required): Name of the rule to remove.
Returns ConfigPatches

clear_custom_javascript()

Remove all custom JavaScript rules. Returns ConfigPatches

to_dict() -> dict[str, Any]

Return the complete patch payload as a Python dict.

to_json(indent=2) -> str

Return the complete patch payload as a JSON string.
indent (int, default 2): JSON indentation level.
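
For inspecting or archiving a payload, the dict form can be written to disk with the standard json module. The sketch below assumes the dict returned by to_dict() is JSON-serializable, which to_json() implies but this page does not state outright. `save_payload` is a hypothetical helper and the payload shown is placeholder data, not the library's real layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Mapping


def save_payload(payload: Mapping[str, Any], path: Path, indent: int = 2) -> Path:
    """Write a patch payload dict to disk as JSON and return the path."""
    path.write_text(json.dumps(payload, indent=indent), encoding="utf-8")
    return path


# Placeholder data only; the real payload layout is defined by the library.
placeholder_payload = {"placeholder": True}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_payload(placeholder_payload, Path(tmp) / "patch.json")
    reloaded = json.loads(saved.read_text(encoding="utf-8"))
```

In real use the first argument would be `patches.to_dict()`.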

CustomSearch

Frozen dataclass. Represents a custom search rule for ConfigPatches.
from screamingfrog import CustomSearch

rule = CustomSearch(
    name="Filter 1",
    query=".*",
    mode="REGEX",
    data_type="TEXT",
    scope="HTML",
    case_sensitive=False,
)

Fields

name (str): Display name for the custom search filter.
query (str): Search query string or regex pattern.
mode (str, default "CONTAINS"): Match mode. Common values: "CONTAINS", "REGEX", "DOES_NOT_CONTAIN", "BEGINS_WITH", "ENDS_WITH".
data_type (str, default "TEXT"): Data type to search. Common values: "TEXT", "REGEX".
scope (str, default "HTML"): Search scope. Common values: "HTML", "TEXT", "URL".
case_sensitive (bool, default False): Whether the search is case-sensitive.
xpath (str | None, default None): Optional XPath expression to scope the search within the page.
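
Because the dataclass is frozen, a rule cannot be modified after construction; variants are created as new instances instead. The sketch below uses a hypothetical stand-in, `SearchRuleSketch`, that mirrors the fields and defaults listed above. It assumes CustomSearch is a standard `dataclasses` dataclass, in which case `dataclasses.replace` should work on it the same way.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SearchRuleSketch:
    """Hypothetical stand-in mirroring the documented CustomSearch fields."""

    name: str
    query: str
    mode: str = "CONTAINS"
    data_type: str = "TEXT"
    scope: str = "HTML"
    case_sensitive: bool = False
    xpath: str | None = None


base = SearchRuleSketch(name="Filter 1", query=".*", data_type="REGEX")

try:
    base.case_sensitive = True  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance with the selected fields changed.
strict = replace(base, name="Filter 1 (case-sensitive)", case_sensitive=True)
```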

to_op() -> dict[str, Any]

Return the operation dict for the ConfigBuilder patch payload.

CustomJavaScript

Frozen dataclass. Represents a custom JavaScript extraction or rendering rule.
from screamingfrog import CustomJavaScript

rule = CustomJavaScript(
    name="Extractor 1",
    javascript="return document.title;",
    type="EXTRACTION",
    timeout_secs=10,
    content_types="text/html",
)

Fields

name (str): Display name for the JavaScript rule.
javascript (str): JavaScript source code. The script should return the value to extract.
type (str, default "EXTRACTION"): Rule type. "EXTRACTION" extracts a value; other types are spider-version-dependent.
timeout_secs (int, default 10): Execution timeout in seconds.
content_types (str, default "text/html"): Comma-separated MIME types the script applies to.
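
Since content_types is a single comma-separated string, a list of MIME types has to be joined before it is passed in. A minimal sketch follows; `join_content_types` is a hypothetical helper, and joining without a space after the comma is an assumption, as the accepted separator format is not specified here.

```python
from typing import Iterable


def join_content_types(types: Iterable[str]) -> str:
    """Join MIME types into one comma-separated string.

    Whitespace is trimmed, names are lower-cased, and blanks and
    duplicates are dropped while preserving first-seen order.
    """
    seen = []
    for mime in types:
        cleaned = mime.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ",".join(seen)


content_types = join_content_types(
    ["text/html", " application/xhtml+xml ", "text/html"]
)
# With the library installed, this string would be passed as
# CustomJavaScript(..., content_types=content_types).
```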

to_op() -> dict[str, Any]

Return the operation dict for the ConfigBuilder patch payload.

write_seospider_config()

Apply a ConfigPatches object to a template .seospiderconfig file and write the result.
from screamingfrog import ConfigPatches, write_seospider_config

patches = ConfigPatches().set("mCrawlConfig.mMaxUrls", 5000)

write_seospider_config(
    "base.seospiderconfig",
    "alpha.seospiderconfig",
    patches,
)
template_config (str | Path, required): Path to the source .seospiderconfig file to use as a template.
output_config (str | Path, required): Path to write the patched .seospiderconfig file.
patches (ConfigPatches | Mapping[str, Any], required): Patch payload. Either a ConfigPatches instance or a raw dict.
sf_path (str | None, default None): Path to the sf-config-builder JAR or install directory. Required only when the library cannot be located automatically.
Returns Path — the path to the written output file.
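
A thin wrapper can verify the template and prepare the output directory before the write. This is a sketch under stated assumptions: `write_config_checked` is a hypothetical helper, the writer is passed in as a callable so the example runs without sf-config-builder, and `fake_writer` only copies the template to stand in for write_seospider_config. How the real function reacts to a missing template or output directory is not documented here.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def write_config_checked(
    writer: Callable[..., Any],
    template_config,
    output_config,
    patches,
) -> Path:
    """Check inputs, create the output directory, then call the writer."""
    template = Path(template_config)
    output = Path(output_config)
    if not template.is_file():
        raise FileNotFoundError(f"Template config not found: {template}")
    output.parent.mkdir(parents=True, exist_ok=True)
    writer(template, output, patches)
    return output


def fake_writer(template, output, patches):
    """Hypothetical stand-in for write_seospider_config: copies the template."""
    Path(output).write_bytes(Path(template).read_bytes())
    return Path(output)


with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "base.seospiderconfig"
    template.write_bytes(b"placeholder")
    result = write_config_checked(
        fake_writer, template, Path(tmp) / "out" / "patched.seospiderconfig", {}
    )
    output_exists = result.is_file()
    try:
        write_config_checked(
            fake_writer,
            Path(tmp) / "missing.seospiderconfig",
            Path(tmp) / "unused.seospiderconfig",
            {},
        )
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```

With the library installed, write_seospider_config would be passed as the first argument in place of `fake_writer`.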

get_export_profile()

Load a named export profile from the bundled profile lists.
from screamingfrog.config import get_export_profile

profile = get_export_profile("kitchen_sink")
print(len(profile.export_tabs), len(profile.bulk_exports))
name (str, default "kitchen_sink"): Profile name. Currently only "kitchen_sink" is supported.
Returns ExportProfile

ExportProfile fields

export_tabs (list[str]): Ordered list of export tab names (e.g. "Internal:All", "Page Titles:Missing").
bulk_exports (list[str]): Ordered list of bulk export names.
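
The example tab names above suggest a "Tab:Filter" shape, which makes it straightforward to summarize a profile by tab. The sketch below assumes that shape holds for every entry (inferred from the two examples, not confirmed). `group_tabs` is a hypothetical helper and `profile_stub` stands in for a real ExportProfile.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_tabs(export_tabs):
    """Group "Tab:Filter" names by the tab part before the first colon."""
    groups = defaultdict(list)
    for entry in export_tabs:
        tab, _, filter_name = entry.partition(":")
        groups[tab].append(filter_name)
    return dict(groups)


# Stub holding the two tab names quoted above; a real profile lists many more.
profile_stub = SimpleNamespace(
    export_tabs=["Internal:All", "Page Titles:Missing"],
    bulk_exports=[],
)
grouped = group_tabs(profile_stub.export_tabs)
```

With the library installed, `group_tabs(get_export_profile("kitchen_sink").export_tabs)` would produce the same kind of mapping for the bundled profile.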