
Overview
- Introduction
- Setting up the Data
- Creating an Index and Deploying a Model
- Other Considerations
- Wrapping Up
Introduction
There has been a lot of hype in the last year about GPT models and generative AI in general. While promises about a full technological revolution can seem somewhat overblown, it is true that GPT models are impressive in many ways. However, the true value of GPT models comes when connecting them to internal documents. Why is this? 🤔
When you use plain vanilla GPT models provided by OpenAI, they do not really understand the inner workings of your company. If you ask them questions, they will answer based on what they have learned about companies in general. This is a problem when you want to use GPT models to ask questions like:
- What are the steps in an internal procedure that I must follow?
- What is the full interaction history between my company and a specific customer?
- Who should I call if I have any issues with a specific software or routine?
Trying to ask plain vanilla GPT models these questions will give you nothing of value (try it!). But if you connect GPT models to your internal data, then you can get meaningful answers to these questions and many others.
In this tutorial, I want to show you how to connect GPT models with internal company data in Microsoft Azure. Just in the last few months, this has become a lot simpler. I will walk slowly through setting up resources and doing the necessary configurations. This is a beginner tutorial, so if you are very comfortable with Microsoft Azure, then you can probably skim through the tutorial.
You need to have two things before proceeding to follow along:
- A Microsoft Azure tenant where you have sufficient permissions to upload documents, create resources, etc.
- As of publishing, your company needs to apply to get access to the Azure OpenAI resource that we will be using. This will probably be lifted sometime in the future, but for now, this is required. The time it takes after applying until you get access is a few days.
NOTE: The real difficulty with making amazing AI assistants comes down to data quality, scoping the project correctly, understanding user needs, user testing, automating data ingestion, and much more. So don’t leave the tutorial thinking that creating a great AI assistant is simple. It is merely that setting up the infrastructure is simple.
Setting up the Data
Everything starts with data. The first step is to upload some internal company data to Azure. In my example, I will use the following text that you can also copy and use:
In company SeriousDataBusiness we always make sure to clean our
desks before we leave the office.
Save the text into a text file called company_info.txt and store it somewhere convenient. Now we will go to Microsoft Azure and upload the text document. Search the marketplace on Azure to find the Storage account resource:

When creating Azure resources there are many fields that you can fill out. For a storage account, the important ones are:
- Subscription: The subscription you want to create the storage account in.
- Resource group: The resource group you want to create the storage account in. You might also decide to create a new resource group for this tutorial.
- Storage account name: A unique name across all Azure accounts that is between 3 and 24 characters long. It can only contain lowercase letters and numbers.
- Region: The Azure region that will host the data.
- Performance: The choice Standard is good enough for testing.
- Redundancy: The choice of Locally-redundant storage is good enough for testing.
Once you’ve clicked Review and then Create, there should be a storage account waiting for you in the resource group you chose within a few minutes. Once inside the storage account, go to Containers on the left sidebar:

In there, you can create a new container that essentially works as a namespace for your data. I named my container newcontainer and entered it. You can now see an upload button in the upper left corner. Click upload and then locate the beloved company_info.txt file you saved earlier.

Now our data is in Azure. We can proceed to the next step 🔥
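If you prefer to script the upload instead of clicking through the portal, here is a minimal sketch using the azure-storage-blob package (`pip install azure-storage-blob`). The connection string is a placeholder for your own storage account's value, and the container name assumes the one created above:

```python
from pathlib import Path

# Save the sample internal document locally.
doc = Path("company_info.txt")
doc.write_text(
    "In company SeriousDataBusiness we always make sure to clean our\n"
    "desks before we leave the office.\n"
)

# Hypothetical upload to the container created in the portal; requires
# the azure-storage-blob package and your storage account's connection string.
def upload_document(connection_string: str, container: str = "newcontainer") -> None:
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container=container, blob=doc.name)
    blob.upload_blob(doc.read_bytes(), overwrite=True)
```

Scripting the upload becomes useful later when you want to automate data ingestion, but for this tutorial the portal is perfectly fine.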
Creating an Index and Deploying a Model
When I read a cookbook, I often consult the index at the back of the book. An index tells you quickly which recipe is on which page. Looking through the whole book every time I want to make pancakes is not good enough in a busy world.
Why am I telling you this? Because we are also going to make indexes for the internal data that we uploaded in the previous section! This will make sure that we can quickly locate relevant information in our internal documents. This way, we don't need to send all the data to the GPT model with every question. That would not only be costly but also impossible for even medium-sized data sources due to the token limits in GPT models.
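To make the cookbook analogy concrete, here is a toy sketch of an inverted index, the kind of structure a search service builds so that relevant passages can be looked up directly instead of scanning every document:

```python
from collections import defaultdict

# Two toy "internal documents" standing in for real company data.
documents = {
    "doc1": "clean desks before leaving the office",
    "doc2": "call IT support for software issues",
}

# Build the inverted index: each word maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

# Looking up "desks" returns only the relevant document, with no full scan.
print(index["desks"])  # → {'doc1'}
```

A real search service does far more (tokenization, ranking, and so on), but the core idea of jumping straight to the relevant documents is the same.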
We’re going to need an Azure Cognitive Search resource. This is a resource that will help us to automatically index our documents. As before, head over to the Azure marketplace and find the Cognitive Search resource:

When creating the Azure cognitive search resource, you should choose the same subscription, resource group, and location as for the storage account. Give it a unique service name and choose the pricing tier Standard. Proceed by clicking the Review button and then click the Create button.
When it is completed, we are actually going to create another resource, namely the Azure OpenAI resource. The reason is that we are not going to create the index in the Cognitive Search resource, but rather do it indirectly from the Azure OpenAI resource. This is more convenient for simple applications where you don't need a lot of fine-tuning of the index.
Head again over to the Azure marketplace and find the Azure OpenAI resource:

You need to pick the same subscription, resource group, and region as the other resources. Give it a name and select the pricing tier Standard S0. Click your way to the Review and Submit section, and then click Create. This was the final resource you needed for the tutorial. Grab a coffee or another beverage while you wait for the resource to complete.
When inside the Azure OpenAI resource, you should see something like this in the Overview section:

Click on Explore, which takes you to Azure OpenAI Studio. In the studio, you can deploy models and connect your internal data by using a graphical user interface. You should now see something like this:

Let us first create a deployment of a GPT model. Head over to the Models section on the left sidebar. This will show you the available models. The models you see might differ from mine, depending on the region you have chosen. I will select the model gpt-35-turbo and click Deploy. If you don't have access to this model, then pick another one.
Pick a Deployment name and create the deployment. If you head over to the Deployments section on the left sidebar, you can see your deployment. Now head over to the Chat section on the left sidebar, where we will start to connect the data through an index.
You should see a tab called Add your data (preview) that you can select:

When you are reading this tutorial, this feature might be out of preview mode. Select Add a data source and choose Azure Blob Storage as your data source. The rest of the information you need to input is the subscription, the Azure Blob storage resource, the storage container where you placed the document company_info.txt, and the Azure Cognitive Search resource we created:

Enter an index name and leave the option Indexer schedule as Once. This setting determines how often the index should be updated with potentially new data. Since our data won't change, we pick Once for simplicity. Accept that connecting to an Azure Cognitive Search account will incur usage charges and continue. You can pick Keyword as the Search type under Data management:

Click Save and close and wait for the indexing to finish. Now the deployed GPT model has access to your internal data! You can ask a question in the Chat session to try it out:

The chatbot gives the correct answer based on the internal documents 😍
It gives a reference to the correct document so that you can check out the source material for confirmation.
There is also a button called View code, where you can see the request made in various programming languages. You can send this request from anywhere, as long as you include the endpoint and access keys listed. Hence you are not limited to the playground, but can incorporate this into your own applications.
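As a rough sketch of what such a request looks like from your own code, the payload below mirrors the structure that View code showed for the preview at the time of writing. The endpoint, keys, index name, and API version are all placeholders for your own values and may change as the feature leaves preview:

```python
# Placeholder endpoint and deployment name; substitute your own resource values.
endpoint = "https://YOUR-RESOURCE.openai.azure.com"
deployment = "gpt-35-turbo"
url = (
    f"{endpoint}/openai/deployments/{deployment}"
    "/extensions/chat/completions?api-version=2023-06-01-preview"
)

# The dataSources entry points the model at the Cognitive Search index.
payload = {
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "https://YOUR-SEARCH.search.windows.net",
                "key": "YOUR-SEARCH-KEY",
                "indexName": "YOUR-INDEX-NAME",
            },
        }
    ],
    "messages": [
        {"role": "user", "content": "What is the desk policy at SeriousDataBusiness?"}
    ],
}

# Sending the request (requires the `requests` package and your API key):
# import requests
# response = requests.post(url, headers={"api-key": "YOUR-API-KEY"}, json=payload)
# print(response.json()["choices"][0]["message"]["content"])
```

The exact field names are easiest to copy straight out of the View code dialog, since they match your own resources.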
You have now successfully connected a GPT model with internal data! Sure, the internal data is not very interesting in our tutorial. But you can imagine doing this with more pressing material than desk policies.
Other Considerations
Here I want to point you towards some further things to play with.
System Messages
You can also specify a system message in the Chat playground:

This is sometimes called a pre-prompt in other settings. It is a message that is sent before every question the user asks, and its purpose is to give the GPT model context about the task at hand. It defaults to something generic like You are an AI assistant that helps people find information.
You can change the system message to request a specific format of the response, or to change the tone of voice for the answer. Feel free to play around with this.
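For example, a more specific system message might look like the sketch below; the wording is an illustrative example of my own, not an Azure default:

```python
# A hypothetical messages array with a custom system message ("pre-prompt").
# The system entry is sent before every user question to set context and tone.
messages = [
    {
        "role": "system",
        "content": (
            "You are an assistant for SeriousDataBusiness employees. "
            "Answer briefly and always cite the source document."
        ),
    },
    {"role": "user", "content": "What should I do before leaving the office?"},
]
```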
Parameters
You can find a Configuration panel (it is either already visible, or you need to go to Show panels and select it). It looks something like this:

Here you can tweak many parameters. Maybe the most important one is Temperature, which controls how deterministic the answer is. A low value makes the output highly deterministic, so the model gives roughly the same answer each time. A high value does the opposite, so the answer varies more each time, which often makes the model seem more creative.
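In a request body, Temperature is just a numeric field alongside the other tunables from the Configuration panel; the values below are illustrative examples, not recommendations:

```python
# Hypothetical request body showing Configuration-panel tunables as API fields.
body = {
    "messages": [{"role": "user", "content": "Summarize our desk policy."}],
    "temperature": 0.2,  # low value -> near-deterministic, repeatable answers
    "max_tokens": 400,   # cap on the length of the generated reply
    "top_p": 0.95,       # nucleus sampling; an alternative lever to temperature
}
```

A common rule of thumb is to tune either temperature or top_p, not both at once.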
Deploying to a web app
When you have finished tweaking your system messages and parameters, you might want to deploy the model to a web application. This can be done rather easily from within Azure OpenAI Studio. Simply click the Deploy to button and select A new web app...

After filling out the relevant information you can access the model from a web application. This is one of the ways to make the model available to others.
Wrapping Up

In this tutorial, I’ve shown you how to connect GPT models with internal company data in Azure. Again, I want to emphasize that this is only the first step to getting an amazing AI assistant. The next steps require expertise in areas such as data quality, index optimization, service design, and automation. But you now have a minimal setup that you can develop further 👋
If you are interested in AI or data science then feel free to follow me or connect on LinkedIn. What is your experience with connecting GPT models to company data? I’d love to hear what you have to say 😃
Like my writing? Check out some of my other posts for more content: