toolregistry.hub.websearch package

class toolregistry.hub.websearch.WebSearchGeneral[source]

Bases: ABC

abstract search(query: str, number_results: int = 5, threshold: float = 0.2, timeout: float | None = None) list[source]

Perform search and return results.

Parameters:
  • query (str) – The search query.

  • number_results (int, optional) – The maximum number of results to return. Defaults to 5.

  • threshold (float, optional) – Minimum score threshold for results [0-1.0]. Defaults to 0.2.

  • timeout (float, optional) – Request timeout in seconds. Defaults to None.

Returns:

A list of search results.

Return type:

list
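A concrete backend subclasses the abstract class and implements search. The sketch below is only an illustration under stated assumptions: SearchBackendSketch is a local stand-in that mirrors the documented signature so the example runs without toolregistry installed, InMemorySearch and its score field are hypothetical, and the threshold and number_results behaviour follows the parameter descriptions above rather than the library's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SearchBackendSketch(ABC):
    """Local stand-in mirroring the documented WebSearchGeneral.search signature."""

    @abstractmethod
    def search(
        self,
        query: str,
        number_results: int = 5,
        threshold: float = 0.2,
        timeout: Optional[float] = None,
    ) -> list:
        """Perform search and return results."""


class InMemorySearch(SearchBackendSketch):
    """Hypothetical backend that searches a fixed list of scored documents."""

    def __init__(self, documents: List[Dict[str, object]]) -> None:
        self._documents = documents

    def search(self, query, number_results=5, threshold=0.2, timeout=None):
        terms = query.lower().split()
        # Keep documents whose title mentions any query term and whose
        # (assumed) relevance score meets the threshold.
        hits = [
            doc for doc in self._documents
            if doc["score"] >= threshold
            and any(term in str(doc["title"]).lower() for term in terms)
        ]
        # Highest score first, capped at number_results.
        hits.sort(key=lambda doc: doc["score"], reverse=True)
        return hits[:number_results]


docs = [
    {"title": "Python web scraping basics", "url": "https://example.com/a", "score": 0.9},
    {"title": "Scraping with care", "url": "https://example.com/b", "score": 0.1},
    {"title": "Python packaging", "url": "https://example.com/c", "score": 0.5},
]
searcher = InMemorySearch(docs)
results = searcher.search("python scraping", number_results=2)
for result in results:
    print(result["title"])
```

With the default threshold of 0.2, the second document is dropped because its assumed score is 0.1, even though its title matches the query.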

static extract(url: str, timeout: float | None = None) str[source]

Extract content from a given URL using available methods.

Parameters:
  • url (str) – The URL to extract content from.

  • timeout (float, optional) – Request timeout in seconds. Defaults to TIMEOUT_DEFAULT (10). Usually not needed.

Returns:

Extracted content from the URL, or empty string if extraction fails.

Return type:

str
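Because extraction is documented to return an empty string on failure, callers may want a fallback. The following is a minimal sketch, not part of the library: text_for_result and fake_extract are hypothetical helpers, and the extractor is passed in as a plain callable so the example runs without network access.

```python
from typing import Callable, Dict


def text_for_result(
    result: Dict[str, str],
    extract: Callable[[str], str],
) -> str:
    """Return extracted page text, or the result's own snippet if extraction fails."""
    extracted = extract(result["url"])
    # An empty string is the documented failure signal, so fall back to
    # the snippet carried in the search result under 'content'.
    if extracted.strip():
        return extracted
    return result.get("content", "")


def fake_extract(url: str) -> str:
    """Stub extractor for illustration: it only knows one page."""
    pages = {"https://example.com/ok": "Full page text."}
    return pages.get(url, "")


ok = {"url": "https://example.com/ok", "content": "snippet A"}
broken = {"url": "https://example.com/missing", "content": "snippet B"}
print(text_for_result(ok, fake_extract))
print(text_for_result(broken, fake_extract))
```

With the package installed, the static extract method documented above could presumably be passed in place of fake_extract, since it takes a URL and returns a string.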

class toolregistry.hub.websearch.WebSearchGoogle(google_base_url: str = 'https://www.google.com', proxy: str | None = None)[source]

Bases: WebSearchGeneral

WebSearchGoogle provides a unified interface for performing web searches on Google. It handles search queries and result processing.

Features:
  • Performs web searches using Google

  • Returns formatted results with title, URL and description

  • Supports proxy and region settings

Examples

>>> from toolregistry.hub.websearch_google import WebSearchGoogle
>>> searcher = WebSearchGoogle()
>>> results = searcher.search("python web scraping", number_results=3)
>>> for result in results:
...     print(result["title"])

__init__(google_base_url: str = 'https://www.google.com', proxy: str | None = None)[source]

Initialize WebSearchGoogle with configuration parameters.

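Parameters:
  • google_base_url (str) – Base URL for Google. Defaults to 'https://www.google.com'.

  • proxy (str, optional) – Proxy URL for HTTP requests. Defaults to None.

search(query: str, number_results: int = 5, threshold: float = 0.2, timeout: float | None = None) List[Dict[str, str]][source]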

Perform search and return results.

Parameters:
  • query – The search query.

  • number_results – The maximum number of results to return. Default is 5.

  • timeout – Optional timeout override in seconds.

Returns:

List of search results, each containing:

  • 'title': The title of the search result

  • 'url': The URL of the search result

  • 'content': The description/content from Google

  • 'excerpt': Same as content (for compatibility with WebSearchSearxng)

Return type:

List[Dict[str, str]]
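Each result is a plain dictionary with the keys listed above, so it can be rendered without any library support. The sketch below shows one way to do that; format_results, its max_chars parameter, and the sample data are hypothetical and only illustrate the documented result shape.

```python
from typing import Dict, List


def format_results(results: List[Dict[str, str]], max_chars: int = 80) -> str:
    """Render results as a numbered plain-text list with truncated snippets."""
    lines = []
    for index, result in enumerate(results, start=1):
        snippet = result.get("content", "")
        # Shorten long snippets so each entry stays on one line.
        if len(snippet) > max_chars:
            snippet = snippet[: max_chars - 3].rstrip() + "..."
        lines.append(f"{index}. {result['title']} <{result['url']}>")
        if snippet:
            lines.append(f"   {snippet}")
    return "\n".join(lines)


sample = [
    {
        "title": "Example Domain",
        "url": "https://example.com",
        "content": "Illustrative page.",
        "excerpt": "Illustrative page.",
    },
    {"title": "No snippet", "url": "https://example.org", "content": "", "excerpt": ""},
]
print(format_results(sample))
```

The same helper should work for any backend that returns the 'title', 'url' and 'content' keys.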

class toolregistry.hub.websearch.WebSearchSearxng(searxng_base_url: str, proxy: str | None = None)[source]

Bases: WebSearchGeneral

WebSearchSearxng provides a unified interface for performing web searches and processing results through a SearxNG instance. It handles search queries, result filtering, and content extraction.

Features:
  • Performs web searches using a SearxNG instance

  • Filters results by relevance score threshold

  • Extracts and cleans webpage content using multiple methods (BeautifulSoup/Jina Reader)

  • Fetches results in parallel

  • Removes emoji and normalizes text automatically

Examples

>>> from toolregistry.hub.websearch_searxng import WebSearchSearxng
>>> searcher = WebSearchSearxng("http://localhost:8080")
>>> results = searcher.search("python web scraping", number_results=3)
>>> for result in results:
...     print(result["title"])

__init__(searxng_base_url: str, proxy: str | None = None)[source]

Initialize WebSearchSearxng with configuration parameters.

Parameters:
  • searxng_base_url (str) – Base URL for the SearxNG instance (e.g. "http://localhost:8080").

  • proxy (Optional[str]) – Proxy URL for HTTP requests.

search(query: str, number_results: int = 5, threshold: float = 0.2, timeout: float | None = None) List[Dict[str, str]][source]

Perform search and return results.

Parameters:
  • query (str) – The search query. Boolean operators like AND, OR, NOT can be used if needed.

  • number_results (int, optional) – The maximum number of results to return. Defaults to 5.

  • threshold (float, optional) – Minimum score threshold for results [0-1.0]. Defaults to 0.2.

  • timeout (float, optional) – Request timeout in seconds. Defaults to TIMEOUT_DEFAULT (10). Usually not needed.

Returns:

A list of enriched search results. Each dictionary contains:

  • 'title': The title of the search result.

  • 'url': The URL of the search result.

  • 'content': The content of the search result.

  • 'excerpt': The excerpt of the search result.

Return type:

List[Dict[str, str]]

Submodules