toolregistry.hub.websearch package
- class toolregistry.hub.websearch.WebSearchGeneral
Bases:
ABC
- abstract search(query: str, number_results: int = 5, threshold: float = 0.2, timeout: float | None = None) → list
Perform search and return results.
- Parameters:
query (str) – The search query.
number_results (int, optional) – The maximum number of results to return. Defaults to 5.
threshold (float, optional) – Minimum score threshold for results, in the range 0 to 1.0. Defaults to 0.2.
timeout (float, optional) – Request timeout in seconds. Defaults to None.
- Returns:
A list of search results.
- Return type:
list
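As a rough illustration of the interface above, the sketch below defines a local stand-in for the abstract base (so it runs without toolregistry installed) and a hypothetical in-memory subclass. The names `SearchBackend` and `StaticSearch`, and the `score` field on each entry, are assumptions made for this example and are not part of the package.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SearchBackend(ABC):
    """Local stand-in mirroring the documented search() signature (hypothetical)."""

    @abstractmethod
    def search(
        self,
        query: str,
        number_results: int = 5,
        threshold: float = 0.2,
        timeout: Optional[float] = None,
    ) -> list:
        """Perform search and return results."""


class StaticSearch(SearchBackend):
    """Hypothetical backend that searches a fixed in-memory corpus."""

    def __init__(self, corpus: List[Dict[str, object]]):
        self.corpus = corpus

    def search(self, query, number_results=5, threshold=0.2, timeout=None):
        # Keep entries whose title mentions the query and whose (assumed)
        # score meets the threshold; timeout is unused for in-memory data.
        hits = [
            item for item in self.corpus
            if query.lower() in str(item["title"]).lower()
            and float(item["score"]) >= threshold
        ]
        return hits[:number_results]


corpus = [
    {"title": "Python web scraping basics", "url": "https://example.com/a", "score": 0.9},
    {"title": "Python packaging", "url": "https://example.com/b", "score": 0.1},
    {"title": "Advanced Python tips", "url": "https://example.com/c", "score": 0.5},
]
results = StaticSearch(corpus).search("python", number_results=2)
print([r["title"] for r in results])
```

Because `search` is abstract, the base class itself cannot be instantiated; only subclasses that implement it can.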
- static extract(url: str, timeout: float | None = None) → str
Extract content from a given URL using available methods.
- Parameters:
url (str) – The URL to extract content from.
timeout (float, optional) – Request timeout in seconds. Defaults to TIMEOUT_DEFAULT (10). Usually not needed.
- Returns:
Extracted content from the URL, or empty string if extraction fails.
- Return type:
str
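Since extraction reports failure by returning an empty string rather than raising, callers may want to check for it explicitly. The following is a minimal sketch of one way to do that. `extract_or_default` and `fake_extract` are hypothetical helpers written for this example; the real `extract` would be passed in place of `fake_extract`.

```python
from typing import Callable, Optional


def extract_or_default(
    extract: Callable[..., str],
    url: str,
    default: str = "",
    timeout: Optional[float] = None,
) -> str:
    """Return extracted text, or `default` when the extractor yields nothing."""
    text = extract(url, timeout=timeout)
    # Whitespace-only output is treated as a failed extraction as well.
    return text if text.strip() else default


def fake_extract(url: str, timeout: Optional[float] = None) -> str:
    # Stand-in extractor: succeeds only for one known URL.
    return "Hello, world" if url == "https://example.com/ok" else ""


print(extract_or_default(fake_extract, "https://example.com/ok", default="(no content)"))
print(extract_or_default(fake_extract, "https://example.com/missing", default="(no content)"))
```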
- class toolregistry.hub.websearch.WebSearchGoogle(google_base_url: str = 'https://www.google.com', proxy: str | None = None)
Bases:
WebSearchGeneral
WebSearchGoogle provides a unified interface for performing web searches on Google. It handles search queries and result processing.
Features:
- Performs web searches using Google
- Returns formatted results with title, URL and description
- Supports proxy and region settings
Examples
>>> from toolregistry.hub.websearch_google import WebSearchGoogle
>>> searcher = WebSearchGoogle()
>>> results = searcher.search("python web scraping", number_results=3)
>>> for result in results:
...     print(result["title"])
- __init__(google_base_url: str = 'https://www.google.com', proxy: str | None = None)
Initialize WebSearchGoogle with configuration parameters.
- Parameters:
google_base_url (str) – Base URL for the Google search. Defaults to “https://www.google.com”.
proxy (str, optional) – Proxy server URL (e.g. “http://proxy.example.com:8080”). Defaults to None.
- search(query: str, number_results: int = 5, threshold: float = 0.2, timeout: float | None = None) → List[Dict[str, str]]
Perform search and return results.
- Parameters:
query – The search query.
number_results – The maximum number of results to return. Default is 5.
timeout – Optional timeout override in seconds.
- Returns:
A list of search results. Each dictionary contains:
- ‘title’: The title of the search result.
- ‘url’: The URL of the search result.
- ‘content’: The description/content from Google.
- ‘excerpt’: Same as content (for compatibility with WebSearchSearxng).
- Return type:
List[Dict[str, str]]
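To show how the result dictionaries described above might be consumed, here is a small sketch that renders them as a numbered plain-text list. `format_results` and its `max_chars` parameter are hypothetical and not part of the package; the sample data is made up and only follows the documented keys.

```python
from typing import Dict, List


def format_results(results: List[Dict[str, str]], max_chars: int = 80) -> str:
    """Render result dicts (title/url/content/excerpt) as numbered text lines."""
    lines = []
    for index, result in enumerate(results, start=1):
        # Prefer the excerpt; fall back to content if the excerpt is empty.
        snippet = result.get("excerpt") or result.get("content", "")
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        lines.append(f"{index}. {result['title']} <{result['url']}>")
        if snippet:
            lines.append(f"   {snippet}")
    return "\n".join(lines)


sample = [
    {
        "title": "Example",
        "url": "https://example.com",
        "content": "A sample page.",
        "excerpt": "A sample page.",
    }
]
print(format_results(sample))
```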
- class toolregistry.hub.websearch.WebSearchSearxng(searxng_base_url: str, proxy: str | None = None)
Bases:
WebSearchGeneral
WebSearchSearxng provides a unified interface for performing web searches and processing results through a SearxNG instance. It handles search queries, result filtering, and content extraction.
Features:
- Performs web searches using a SearxNG instance
- Filters results by relevance score threshold
- Extracts and cleans webpage content using multiple methods (BeautifulSoup/Jina Reader)
- Fetches results in parallel
- Removes emoji and normalizes text automatically
Examples
>>> from toolregistry.hub.websearch_searxng import WebSearchSearxng
>>> searcher = WebSearchSearxng("http://localhost:8080")
>>> results = searcher.search("python web scraping", number_results=3)
>>> for result in results:
...     print(result["title"])
- __init__(searxng_base_url: str, proxy: str | None = None)
Initialize WebSearchSearxng with configuration parameters.
- Parameters:
searxng_base_url (str) – Base URL for the SearxNG instance (e.g. “http://localhost:8080”).
proxy (Optional[str]) – Proxy URL for HTTP requests.
- search(query: str, number_results: int = 5, threshold: float = 0.2, timeout: float | None = None) → List[Dict[str, str]]
Perform search and return results.
- Parameters:
query (str) – The search query. Boolean operators like AND, OR, NOT can be used if needed.
number_results (int, optional) – The maximum number of results to return. Defaults to 5.
threshold (float, optional) – Minimum score threshold for results [0-1.0]. Defaults to 0.2.
timeout (float, optional) – Request timeout in seconds. Defaults to TIMEOUT_DEFAULT (10). Usually not needed.
- Returns:
A list of enriched search results. Each dictionary contains:
- ‘title’: The title of the search result.
- ‘url’: The URL of the search result.
- ‘content’: The content of the search result.
- ‘excerpt’: The excerpt of the search result.
- Return type:
List[Dict[str, str]]
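The documentation above does not say how the `threshold` and `number_results` parameters are applied internally. As a sketch of one plausible reading, the helper below drops raw results whose score falls below the threshold, orders the rest by score, and truncates the list. `filter_by_score`, the `score` key on each raw result, and the descending sort are all assumptions made for this example.

```python
from typing import Any, Dict, List


def filter_by_score(
    raw_results: List[Dict[str, Any]],
    threshold: float = 0.2,
    number_results: int = 5,
) -> List[Dict[str, Any]]:
    """Keep results scoring at least `threshold`, best first, capped at `number_results`."""
    # Results without a score are treated as scoring 0.0 (an assumption).
    kept = [r for r in raw_results if float(r.get("score", 0.0)) >= threshold]
    kept.sort(key=lambda r: float(r.get("score", 0.0)), reverse=True)
    return kept[:number_results]


raw = [
    {"title": "A", "score": 0.15},
    {"title": "B", "score": 0.8},
    {"title": "C", "score": 0.4},
    {"title": "D"},
]
print([r["title"] for r in filter_by_score(raw)])
```

With the default threshold of 0.2, only the entries scoring 0.8 and 0.4 survive in this sample.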