An agent can use the content of specific websites: a web crawler retrieves the pages, and the result is stored as a Website data source.
Adding a Website data source
To add a new website data source:
Go to Build > Data sources > External integrations.
Click the Add integrations button;
Next, click the Website tile and provide the following information:
Name your data source;
Add a description;
Add the URL of the website to crawl, e.g. https://www.konverso.ai/;
Specify the Language of the website content as a 2-character code, e.g. en for English;
Add a URL filter: type the URLs that you want to crawl. If you do not specify them, all pages reached through external links are crawled as well. For example, if you only want to crawl konverso.ai and not the external links it contains, add konverso.ai to this filter;
(Optional) XPath of the site title: specify the XPath used to extract the titles of the pages containing articles. If not defined, the default CSS title value is used;
Set the Maximum number of stored pages. If set to 0 (default), all pages will be crawled until there are no more URLs to crawl;
Specify the List of URL regexes to include in crawl. They are used to select the URLs that will be stored;
Specify the List of URL regexes to exclude from crawl. They are used to filter out URLs that will not be stored. Note that these regexes are checked when a page matches a regex that marks it for storage. Note also that each regex must match the whole URL;
(Optional) Download file URLs: Check this box to download file URLs.
Specify the Downloadable file extensions. This allows you to crawl attached files, such as PDF files. In that case, specify the file extensions to download, e.g. .pdf.
Select who to share the credentials with. This determines who can use your credentials in the data sources. To keep these credentials private, select Only me. To share them with other builders, select Builders.
Choose whether to limit pages to Only child pages or the Entire website.
Finally, click the Add external integration button to add this website to your data source repository.
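Before saving, it can help to check your include and exclude regexes against a few sample URLs. The following is a minimal sketch in Python, not the crawler's actual implementation. It assumes Python-style regexes, that an empty include list accepts every URL, and that exclusion is only checked for URLs that passed the include step. The function names and the sample paths (/blog/post, /tag/news, /files/guide.pdf) are hypothetical.

```python
import re
from urllib.parse import urlparse


def should_store(url, include_regexes, exclude_regexes):
    """Return True if a crawled URL would be stored under the assumed rules."""
    # Assumption: an empty include list means every URL is a candidate.
    included = not include_regexes or any(
        re.fullmatch(pattern, url) for pattern in include_regexes
    )
    if not included:
        return False
    # Exclusion is only checked for URLs that passed the include step.
    return not any(re.fullmatch(pattern, url) for pattern in exclude_regexes)


def is_downloadable(url, extensions):
    """Return True if the URL path ends with one of the given extensions."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext.lower()) for ext in extensions)


include = [r"https://www\.konverso\.ai/.*"]
exclude = [r".*/tag/.*"]

print(should_store("https://www.konverso.ai/blog/post", include, exclude))  # True
print(should_store("https://www.konverso.ai/tag/news", include, exclude))   # False
# A partial pattern does not match, because the whole URL must match.
print(should_store("https://www.konverso.ai/blog/post", [r"konverso\.ai"], []))  # False
print(is_downloadable("https://www.konverso.ai/files/guide.pdf", [".pdf"]))  # True
```

The third call shows the most common pitfall: a pattern such as konverso\.ai matches nothing on its own, so wrap it as .*konverso\.ai.* or write out the full URL pattern.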
What’s next?
Now that the data source has been created, you can select it when creating an agent.
To learn how to create an agent, see this page: Build your own agent.