Config patches - Screaming Frog API

Overview

ConfigPatches builds a patch payload for the sf-config-builder Java library. Use it to modify .seospiderconfig files without manually editing XML.

from screamingfrog import ConfigPatches, CustomSearch, CustomJavaScript, write_seospider_config

# Collect the changes to apply to the base config.
patches = ConfigPatches()
patches.set("mCrawlConfig.mRenderingMode", "JAVASCRIPT")
patches.add_custom_search(CustomSearch(name="Filter 1", query=".*", data_type="REGEX"))
patches.add_custom_javascript(
    CustomJavaScript(name="Extractor 1", javascript="return document.title;")
)

# Serialize the patch payload to a JSON string.
patch_json = patches.to_json()

# Apply the patches to base.seospiderconfig and write the patched copy.
write_seospider_config(
    "base.seospiderconfig",
    "patched.seospiderconfig",
    patches,
)

write_seospider_config() requires the sf-config-builder package: pip install sf-config-builder

ConfigPatches

Mutable dataclass. All mutating methods return self for chaining.
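
The return-self convention can be illustrated without the library installed. The sketch below uses a hypothetical stand-in class, PatchSketch, which is not part of screamingfrog and only mimics the chaining behavior described above; its internal storage is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PatchSketch:
    """Hypothetical stand-in for ConfigPatches; not the library's real class."""

    # Assumed storage for illustration only: dotted path -> value.
    values: dict[str, Any] = field(default_factory=dict)

    def set(self, path: str, value: Any) -> "PatchSketch":
        # Mutate in place, then return self so calls can be chained.
        self.values[path] = value
        return self


sketch = PatchSketch()
result = sketch.set("mCrawlConfig.mMaxUrls", 5000).set(
    "mCrawlConfig.mRenderingMode", "JAVASCRIPT"
)

# The chain hands back the same object it started from.
print(result is sketch)
```

Because each call returns the same instance, a chain such as ConfigPatches().set(...) can be passed directly wherever a ConfigPatches object is expected.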

`set()`

Set a scalar config value by dotted path.

patches.set("mCrawlConfig.mMaxUrls", 5000)
patches.set("mCrawlConfig.mRenderingMode", "JAVASCRIPT")

path (str, required): Dotted config path (e.g. "mCrawlConfig.mMaxUrls").

value (Any, required): The value to set.

Returns ConfigPatches
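
Since set() takes the path as a plain string, a typo such as a doubled dot may go unnoticed until the patch is applied. A minimal pre-check might look like the following; split_config_path is a hypothetical helper, not a library function, and whether the library performs its own validation is not stated here.

```python
def split_config_path(path: str) -> list[str]:
    """Split a dotted config path into segments, rejecting empty segments."""
    segments = path.split(".")
    if any(not segment for segment in segments):
        raise ValueError(f"malformed config path: {path!r}")
    return segments


segments = split_config_path("mCrawlConfig.mMaxUrls")

try:
    split_config_path("mCrawlConfig..mMaxUrls")  # doubled dot
    rejected = False
except ValueError:
    rejected = True
```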

`add_extraction()`

Add a custom extraction rule.

patches.add_extraction(
    name="Schema Type",
    selector="//script[@type='application/ld+json']",
    selector_type="XPATH",
    extract_mode="TEXT",
)

name (str, required): Display name for the extraction.

selector (str, required): XPath or CSS selector expression.

selector_type (str, default: "XPATH"): Selector type. One of "XPATH" or "CSS_PATH".

extract_mode (str, default: "TEXT"): Extraction mode. One of "TEXT", "HTML", or "ATTRIBUTE".

attribute (str | None, default: None): HTML attribute name. Required when extract_mode="ATTRIBUTE".

Returns ConfigPatches
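
The parameter constraints above can be checked before calling add_extraction(). The sketch below is a hypothetical pre-flight helper (check_extraction_args is not a library function) that encodes only the allowed values and the attribute rule documented above; how the library itself reacts to invalid combinations is not covered here.

```python
from typing import Optional

# Allowed values as documented for add_extraction().
SELECTOR_TYPES = {"XPATH", "CSS_PATH"}
EXTRACT_MODES = {"TEXT", "HTML", "ATTRIBUTE"}


def check_extraction_args(
    selector_type: str = "XPATH",
    extract_mode: str = "TEXT",
    attribute: Optional[str] = None,
) -> None:
    """Hypothetical pre-flight check; raises ValueError on invalid input."""
    if selector_type not in SELECTOR_TYPES:
        raise ValueError(f"unknown selector_type: {selector_type!r}")
    if extract_mode not in EXTRACT_MODES:
        raise ValueError(f"unknown extract_mode: {extract_mode!r}")
    if extract_mode == "ATTRIBUTE" and not attribute:
        raise ValueError('attribute is required when extract_mode="ATTRIBUTE"')


# A valid combination passes silently.
check_extraction_args("CSS_PATH", "ATTRIBUTE", attribute="href")

# ATTRIBUTE mode without an attribute name is rejected.
try:
    check_extraction_args("XPATH", "ATTRIBUTE")
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```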

`remove_extraction()`

Remove an existing extraction rule by name.

name (str, required): Name of the extraction to remove.

Returns ConfigPatches

`clear_extractions()`

Remove all custom extraction rules. Returns ConfigPatches

`add_custom_search()`

Add a custom search rule.

from screamingfrog import CustomSearch

patches.add_custom_search(CustomSearch(name="Filter 1", query=".*", data_type="REGEX"))

rule (CustomSearch, required): A CustomSearch instance.

Returns ConfigPatches

`remove_custom_search()`

Remove a custom search rule by name.

name (str, required): Name of the rule to remove.

Returns ConfigPatches

`clear_custom_searches()`

Remove all custom search rules. Returns ConfigPatches

`add_custom_javascript()`

Add a custom JavaScript rule.

from screamingfrog import CustomJavaScript

patches.add_custom_javascript(
    CustomJavaScript(name="Extractor 1", javascript="return document.title;")
)

rule (CustomJavaScript, required): A CustomJavaScript instance.

Returns ConfigPatches

`remove_custom_javascript()`

Remove a custom JavaScript rule by name.

name (str, required): Name of the rule to remove.

Returns ConfigPatches

`clear_custom_javascript()`

Remove all custom JavaScript rules. Returns ConfigPatches

`to_dict()` → `dict[str, Any]`

Return the complete patch payload as a Python dict.

`to_json(indent=2)` → `str`

Return the complete patch payload as a JSON string.

indent (int, default: 2): JSON indentation level.
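
To show what an indentation level does to the output, the sketch below runs the standard library's json.dumps on a placeholder dict. It assumes to_json() treats indent the same way json.dumps does, which is not confirmed here, and the payload shape is made up for illustration.

```python
import json

# Placeholder data; the real patch payload structure is not shown here.
payload = {"example": {"key": "value"}}

pretty = json.dumps(payload, indent=2)      # nested keys indented by 2 spaces
compact = json.dumps(payload, indent=None)  # single line, no indentation

print(pretty)
print(compact)
```

Both strings decode to the same data; the indent only affects readability and size.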

CustomSearch

Frozen dataclass. Represents a custom search rule for ConfigPatches.

from screamingfrog import CustomSearch

rule = CustomSearch(
    name="Filter 1",
    query=".*",
    mode="REGEX",
    data_type="TEXT",
    scope="HTML",
    case_sensitive=False,
)

Fields

name (str): Display name for the custom search filter.

query (str): Search query string or regex pattern.

mode (str, default: "CONTAINS"): Match mode. Common values: "CONTAINS", "REGEX", "DOES_NOT_CONTAIN", "BEGINS_WITH", "ENDS_WITH".

data_type (str, default: "TEXT"): Data type to search. Common values: "TEXT", "REGEX".

scope (str, default: "HTML"): Search scope. Common values: "HTML", "TEXT", "URL".

case_sensitive (bool, default: False): Whether the search is case-sensitive.

xpath (str | None, default: None): Optional XPath expression to scope the search within the page.

`to_op()` → `dict[str, Any]`

Return the operation dict for the ConfigBuilder patch payload.
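
Because the dataclass is frozen, a rule cannot be modified after construction; a variant has to be built as a new instance. The sketch below demonstrates this with a hypothetical stand-in, SearchRuleSketch, which copies a subset of the fields documented above and is not the library's class. dataclasses.replace and FrozenInstanceError are standard library features.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SearchRuleSketch:
    """Hypothetical stand-in with a subset of CustomSearch's documented fields."""

    name: str
    query: str
    mode: str = "CONTAINS"
    case_sensitive: bool = False


base = SearchRuleSketch(name="Filter 1", query=".*", mode="REGEX")

try:
    base.query = "foo"  # frozen instances reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance with the selected fields changed.
strict = replace(base, case_sensitive=True)
```

The original rule stays untouched, so the same base rule can safely be reused across several patch sets.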

CustomJavaScript

Frozen dataclass. Represents a custom JavaScript extraction or rendering rule.

from screamingfrog import CustomJavaScript

rule = CustomJavaScript(
    name="Extractor 1",
    javascript="return document.title;",
    type="EXTRACTION",
    timeout_secs=10,
    content_types="text/html",
)

Fields

name (str): Display name for the JavaScript rule.

javascript (str): JavaScript source code. The script should return the value to extract.

type (str, default: "EXTRACTION"): Rule type. "EXTRACTION" extracts a value; other types are spider-version-dependent.

timeout_secs (int, default: 10): Execution timeout in seconds.

content_types (str, default: "text/html"): Comma-separated MIME types the script applies to.

`to_op()` → `dict[str, Any]`

Return the operation dict for the ConfigBuilder patch payload.
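
Since content_types is a single comma-separated string rather than a list, a small helper can build it from a Python list. join_content_types below is a hypothetical helper, not part of the library, and it assumes no space is expected after each comma, which is not confirmed here.

```python
def join_content_types(mime_types: list[str]) -> str:
    """Join MIME types into one comma-separated string, dropping blanks."""
    cleaned = [mime.strip() for mime in mime_types if mime.strip()]
    return ",".join(cleaned)


content_types = join_content_types(["text/html", " application/xhtml+xml ", ""])
print(content_types)
```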

`write_seospider_config()`

Apply a ConfigPatches object to a template .seospiderconfig file and write the result.

from screamingfrog import ConfigPatches, write_seospider_config

patches = ConfigPatches().set("mCrawlConfig.mMaxUrls", 5000)

write_seospider_config(
    "base.seospiderconfig",
    "alpha.seospiderconfig",
    patches,
)

template_config (str | Path, required): Path to the source .seospiderconfig file to use as a template.

output_config (str | Path, required): Path to write the patched .seospiderconfig file.

patches (ConfigPatches | Mapping[str, Any], required): Patch payload. Either a ConfigPatches instance or a raw dict.

sf_path (str | None, default: None): Path to the sf-config-builder JAR or install directory. Required only when the library cannot be located automatically.

Returns Path: the path to the written output file.
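
When one template feeds several patched variants, the output paths can be derived from the template path with pathlib. The helper below, variant_output_path, is hypothetical and only computes paths; it does not call write_seospider_config(), and the directory and variant names are made up for illustration.

```python
from pathlib import Path
from typing import Union


def variant_output_path(template_config: Union[str, Path], variant: str) -> Path:
    """Place a named variant next to the template, keeping its file extension."""
    template = Path(template_config)
    return template.with_name(f"{variant}{template.suffix}")


outputs = [
    variant_output_path("configs/base.seospiderconfig", name)
    for name in ("alpha", "beta")
]
print(outputs)
```

Each resulting path can then be passed as output_config, since that parameter accepts both str and Path.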

`get_export_profile()`

Load a named export profile from the bundled profile lists.

from screamingfrog.config import get_export_profile

profile = get_export_profile("kitchen_sink")
print(len(profile.export_tabs), len(profile.bulk_exports))

name (str, default: "kitchen_sink"): Profile name. Currently only "kitchen_sink" is supported.

Returns ExportProfile

ExportProfile fields

export_tabs (list[str]): Ordered list of export tab names (e.g. "Internal:All", "Page Titles:Missing").

bulk_exports (list[str]): Ordered list of bulk export names.
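
The export tab names in the examples above follow a "Tab:Filter" pattern, so they can be grouped by tab with plain string handling. The sketch below assumes every entry uses that pattern; group_export_tabs is a hypothetical helper, and the third sample string is made up for illustration.

```python
from collections import defaultdict


def group_export_tabs(export_tabs: list[str]) -> dict[str, list[str]]:
    """Group "Tab:Filter" strings by tab name, preserving input order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in export_tabs:
        # partition() splits on the first colon only.
        tab, _, filter_name = entry.partition(":")
        grouped[tab].append(filter_name)
    return dict(grouped)


# Sample data; only the first two strings appear in the docs above.
sample_tabs = ["Internal:All", "Page Titles:Missing", "Internal:HTML"]
grouped = group_export_tabs(sample_tabs)
print(grouped)
```

With a real profile, profile.export_tabs would take the place of sample_tabs.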
