...
Go to Build > Data sources > External integrations.
Click the Add integrations button.
Next, click the Website tile and provide the following information:
- Name your data source.
- Add a description.
- Add the URL of the website to crawl, e.g. https://www.konverso.ai/.
- Specify the Language of the website content as a 2-character code, e.g. en for English.
- Choose who to share this data source with: Only me or Builders.
- Add a URL filter: type the URLs that you want to crawl. If you do not specify any URLs, all pages reached through external links will be crawled as well. For example, if you only want to crawl konverso.ai and not the external links it contains, add konverso.ai to this filter (see the URL filter sketch after this procedure).
- (Optional) XPath of the site title: specify the XPath used to extract the titles of the pages containing articles. If not defined, the default CSS title value is used (see the XPath sketch after this procedure).
- Set the Maximum number of stored pages. If set to 0 (the default), all pages will be crawled until there are no more URLs to crawl.
- Specify the List of URL regexes to include in crawl. These are used to select the URLs that will be stored.
- Specify the List of URL regexes to exclude from crawl. These are used to filter out URLs so they are not stored; they are only checked for pages that would otherwise be stored. Note that a regex must match the whole URL (see the regex sketch after this procedure).
- (Optional) Download file URLs: check this box to download file URLs.
- Specify the Downloadable file extensions. This allows you to crawl attached files, such as PDF files. In that case, specify the file extensions to download, e.g. .pdf, pdf, etc. (see the file extension sketch after this procedure).
- Select who to share the credentials with. This determines who can use your credentials in the data sources. If you want to keep these credentials private, select Only me. If you want to share them with other builders, select Builders.
- Choose whether to limit pages to Only child pages or the Entire website.
Finally, click the Add external integration button to save this website to your data source repository.
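URL filter sketch. The exact matching rules of the URL filter field are not documented here, so treat the following as a minimal Python illustration only: it assumes a hypothetical is_in_scope helper and a simple host comparison against the konverso.ai example used above, to show why pages reached through external links would be skipped.

```python
from urllib.parse import urlparse

# Hypothetical value of the "URL filter" field: only konverso.ai pages are kept.
URL_FILTER = ["konverso.ai"]

def is_in_scope(url: str) -> bool:
    """Return True if the URL belongs to one of the filtered sites,
    so that pages reached through external links are skipped."""
    host = urlparse(url).netloc.lower()
    return any(host == f or host.endswith("." + f) for f in URL_FILTER)

print(is_in_scope("https://www.konverso.ai/blog/article-1"))  # True
print(is_in_scope("https://twitter.com/konverso"))            # False
```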
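XPath sketch. If you are unsure which expression to put in the XPath of the site title field, you can test it locally before saving the integration. The sketch below is only an illustration: it assumes the lxml library and a made-up article-title class, and contrasts the default <title> value with a custom XPath.

```python
from lxml import html

# Sample page; in practice, fetch the real article page you want to test.
page = html.fromstring("""
<html>
  <head><title>Konverso | Blog</title></head>
  <body><h1 class="article-title">How virtual agents help IT support</h1></body>
</html>
""")

# Default behaviour (no XPath defined): the CSS <title> value is used.
default_title = page.xpath("//title/text()")[0]

# Custom XPath of the site title, e.g. the article headline instead.
custom_title = page.xpath("//h1[@class='article-title']/text()")[0]

print(default_title)  # Konverso | Blog
print(custom_title)   # How virtual agents help IT support
```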
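Regex sketch. The include and exclude regex fields can be tricky because each regex has to match the whole URL, not just a substring. The following Python sketch is an illustration under that assumption, using a hypothetical should_store helper, example patterns, and re.fullmatch to mimic whole-URL matching; it is not the crawler's actual implementation.

```python
import re

# Hypothetical include/exclude lists, mirroring the two regex fields above.
INCLUDE_REGEXES = [r"https://www\.konverso\.ai/blog/.*"]
EXCLUDE_REGEXES = [r"https://www\.konverso\.ai/blog/tag/.*"]

def should_store(url: str) -> bool:
    """Return True if the URL should be stored, assuming each regex
    must match the whole URL (re.fullmatch), as noted above."""
    # If include regexes are given, at least one must match the whole URL.
    if INCLUDE_REGEXES and not any(re.fullmatch(p, url) for p in INCLUDE_REGEXES):
        return False
    # Exclude regexes are only checked for pages that would otherwise be stored.
    if any(re.fullmatch(p, url) for p in EXCLUDE_REGEXES):
        return False
    return True

print(should_store("https://www.konverso.ai/blog/article-1"))    # True
print(should_store("https://www.konverso.ai/blog/tag/chatbot"))  # False
print(should_store("https://example.com/blog/article-1"))        # False
```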
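File extension sketch. As a rough mental model of the Downloadable file extensions field, the sketch below checks a URL path against a hypothetical list of extensions; the real crawler may normalise extensions differently, so this only illustrates why both .pdf and pdf are accepted in the example above.

```python
from urllib.parse import urlparse

# Hypothetical value of the "Downloadable file extensions" field, e.g. ".pdf, pdf".
DOWNLOADABLE_EXTENSIONS = {".pdf", "pdf"}

def is_downloadable(url: str) -> bool:
    """Return True if the URL points to an attached file whose extension
    is listed, assuming a simple suffix check on the URL path."""
    path = urlparse(url).path.lower()
    for ext in DOWNLOADABLE_EXTENSIONS:
        suffix = ext if ext.startswith(".") else "." + ext
        if path.endswith(suffix):
            return True
    return False

print(is_downloadable("https://www.konverso.ai/docs/brochure.pdf"))  # True
print(is_downloadable("https://www.konverso.ai/blog/article-1"))     # False
```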
...