Microsoft SharePoint
SharePoint is a web-based collaborative platform that integrates with Microsoft Office and lets you store and share files. SharePoint content is organized into sites, each of which stores its documents in folders that can be leveraged via a data source.
Prerequisites
Before you create a Microsoft SharePoint data source, you need to ensure you have configured SharePoint credentials in Administration > Credentials.
There are two ways to access SharePoint content:
With a username and password. In this case, you should create SharePoint credentials.
With an Azure application. In this case, you should create Azure application for SharePoint credentials.
Adding a Microsoft SharePoint data source
To add a new Microsoft SharePoint data source:
Go to Build > Data sources > External integrations.
Click the Add external integration button.
Next, click the Microsoft SharePoint card and provide the following information:
Name the data source;
Select existing credentials for SharePoint;
Add a description for your data source;
Select the Language of your SharePoint files;
Specify the Site path where your SharePoint content is located, in the following format: /sites/mysite
Optionally, you can specify the Folders' paths. By default, this is the Shared Documents folder, which corresponds to the main default Documents folder.
Optionally, you can specify Regex patterns for files inclusion to include specific files. Regex patterns follow this format: .*\ followed by the file extension you want to include. For example, .*\.pptx will include all PowerPoint files.
Optionally, you can specify Regex patterns for files exclusion to exclude specific files. Regex patterns follow this format: .*\ followed by the file extension you want to exclude. For example, .*\.mp4 will exclude all MP4 files.
Select the PDF text extraction strategy for the PDF files:
Text only (Better speed): The text is extracted using PyMuPDF.
Text by Mistral OCR (Intermediate): The text is extracted from the PDF page by page. This method is very fast. Note that images within the PDF are ignored, unlike with the Text and images options.
Text and images (Intermediate): The text is extracted by PyMuPDF, then the images on the page are detected and converted into text by an LLM.
Text and images, reformatted (Better quality): The text is extracted by PyMuPDF, then an image of the page and the extracted text are given to an LLM, which outputs the text in a structured manner.
Select the Image content extraction strategy:
Text by Google OCR (Better speed): The image is sent to the Google OCR service to retrieve the text from the image.
Text by Mistral OCR (Better speed): The image is sent to the Mistral OCR service to retrieve the text from the image.
Text and content (Better quality): The image is given to an LLM, which describes the image and its content.
Finally, click the Add external integration button to add this new integration to your data source repository.
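If you want to check your inclusion and exclusion patterns before saving the integration, the minimal Python sketch below may help. It is only an illustration: it assumes that each pattern must match the whole file name and that exclusion is applied after inclusion, which may differ from how the platform evaluates patterns. The select_files helper and the sample file names are hypothetical.

```python
import re


def select_files(names, include_patterns=(), exclude_patterns=()):
    """Hypothetical helper: keep names that match at least one inclusion
    pattern (or every name when no inclusion pattern is given) and that
    match no exclusion pattern. Assumes full-name matching."""
    included = [re.compile(p) for p in include_patterns]
    excluded = [re.compile(p) for p in exclude_patterns]
    selected = []
    for name in names:
        # Skip files that match none of the inclusion patterns.
        if included and not any(p.fullmatch(name) for p in included):
            continue
        # Skip files that match any exclusion pattern.
        if any(p.fullmatch(name) for p in excluded):
            continue
        selected.append(name)
    return selected


# Sample file names, made up for this illustration.
files = ["Q3 review.pptx", "roadmap.pptx", "demo.mp4", "notes.docx", "pptx-notes.txt"]

# Inclusion pattern from this page: keep only PowerPoint files.
print(select_files(files, include_patterns=[r".*\.pptx"]))

# Exclusion pattern from this page: drop MP4 files.
print(select_files(files, exclude_patterns=[r".*\.mp4"]))
```

The backslash before the dot matters: without it, the dot would match any character rather than the literal dot that starts the file extension.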
Video tutorial: How to create a Microsoft SharePoint external integration
What’s next?
Now that the data source has been created, you can select it when creating an agent.
For more information about how to create an agent, see Build your own agent.