Building Access Control into AI Applications

Many emerging AI tools use a form of RAG (retrieval-augmented generation) over an organization's internal wiki. The initial setup can be simple. In practice, however, you can't simply connect your entire team's Google Drive and let every employee ask the model questions. This is a guide to implementing ReBAC (relationship-based access control) for LLM apps.

Andrew Luo

Co-Founder, CTO


The Problem

Many emerging AI tools use a form of RAG (retrieval-augmented generation) over your organization's internal wiki, which might live in Drive, Notion, or another cloud storage system. At query time, the relevant internal documents are fetched so that an LLM can synthesize an answer grounded in your company's knowledge. In practice, however, you can't simply connect your entire team's Google Drive and let every employee ask the model questions: some documents should only be accessible to certain teams. This was one of the first principles we built into our multimodal knowledge base at DealPage.

But we wanted to move fast, and this wasn't a core feature. How could we build something enterprise-ready in just a few weeks?

Authorization at Google

As background, in my previous role at X, one of my first projects was an authorization system. In my research, I came across systems like Himeji and, of course, Google's famous Zanzibar, on which many of today's systems are based. These implement ReBAC, or relationship-based access control. I won't go too deep into them, as there are plenty of good write-ups out there, but for DealPage we deployed Ory Keto, which uses a similar ReBAC model.
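Zanzibar-style systems, Keto included, store every permission fact as a relation tuple, conventionally written `namespace:object#relation@subject`. The sketch below models that data structure; `RelationTuple` and `parse_tuple` are hypothetical names for illustration, not part of Keto's client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    """One ReBAC fact: `subject` has `relation` on `namespace:object_id`."""
    namespace: str
    object_id: str
    relation: str
    subject: str  # a user ID, or a subject set such as "Folder:f1#viewer"

    def __str__(self) -> str:
        # Zanzibar's textual notation: namespace:object#relation@subject
        return f"{self.namespace}:{self.object_id}#{self.relation}@{self.subject}"


def parse_tuple(text: str) -> RelationTuple:
    """Parse 'namespace:object#relation@subject' back into a RelationTuple."""
    object_part, subject = text.split("@", 1)  # subject may itself contain ':' and '#'
    ns_and_object, relation = object_part.split("#", 1)
    namespace, object_id = ns_and_object.split(":", 1)
    return RelationTuple(namespace, object_id, relation, subject)
```

For example, `Document:doc_1#parent@Folder:f1` records that folder `f1` is the parent of `doc_1`, which is exactly the kind of edge a checker follows when permissions are inherited.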

The Solution

Authentication

We chose Clerk for authentication. It was easy to set up and provides prebuilt components for organization and user management within our app. Clerk assigns each user a unique ID of the form user_dj398nd..., and this ID is what we use to associate permissions and relationships.

Authorization

In DealPage, there are four core resources (Deals, Documents, Folders, and Collections) and four permissions (View, Edit, Share, and Delete). Each resource can be set to "Invite only", where only selected users have access, or "Everyone in my org", where anyone in the DealPage organization gets a chosen access level.
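The resource and permission model above can be written down as a few enums. This is a sketch under an explicit assumption: the post names the viewer, editor, and owner relations but does not say which permissions each grants, so `RELATION_PERMISSIONS` below is a hypothetical mapping in which each higher relation includes everything the lower one grants.

```python
from enum import Enum


class Resource(Enum):
    DEAL = "deal"
    DOCUMENT = "document"
    FOLDER = "folder"
    COLLECTION = "collection"


class Permission(Enum):
    VIEW = "view"
    EDIT = "edit"
    SHARE = "share"
    DELETE = "delete"


class SharingMode(Enum):
    INVITE_ONLY = "invite_only"          # only explicitly invited users have access
    EVERYONE_IN_ORG = "everyone_in_org"  # every org member gets one chosen access level


# Hypothetical mapping (assumption, not from the source): owner > editor > viewer.
RELATION_PERMISSIONS = {
    "viewer": frozenset({Permission.VIEW}),
    "editor": frozenset({Permission.VIEW, Permission.EDIT}),
    "owner": frozenset(Permission),  # iterating the Enum class yields all four members
}


def relation_allows(relation: str, permission: Permission) -> bool:
    """Unknown relations grant nothing (deny by default)."""
    return permission in RELATION_PERMISSIONS.get(relation, frozenset())
```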

Invite-only permissions

We built our invite-only permission checks on Ory Keto, an open-source permission server for managing relationships and permissions. Documents can have a deal, folder, or collection as a parent, and folders can be the parent of other folders. Users can hold a viewer, editor, or owner relation on an individual document or on a deal, folder, or collection, and access to a parent grants access to its child documents.

[Diagram: the resource hierarchy of deals, collections, folders, and documents]
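To make the inheritance rule concrete, here is a toy in-memory model of the hierarchy and the recursive "walk up the parents" check. It only illustrates the logic; in DealPage this traversal is performed by Keto itself. `TupleStore`, `check`, the `parent` relation name, and the owner-implies-editor-implies-viewer ordering are all assumptions for the example.

```python
from collections import defaultdict
from typing import Optional


class TupleStore:
    """Toy in-memory stand-in for Keto's relation-tuple storage (illustration only)."""

    def __init__(self) -> None:
        # (object, relation) -> subjects; objects are "Type:id" strings, e.g. "Folder:f1"
        self._tuples: defaultdict = defaultdict(set)

    def add(self, obj: str, relation: str, subject: str) -> None:
        self._tuples[(obj, relation)].add(subject)

    def subjects(self, obj: str, relation: str) -> set:
        # .get() avoids creating empty entries as a side effect of lookups
        return self._tuples.get((obj, relation), set())


# Assumption: owner implies editor, and editor implies viewer.
IMPLIED_BY = {
    "viewer": ("viewer", "editor", "owner"),
    "editor": ("editor", "owner"),
    "owner": ("owner",),
}


def check(store: TupleStore, user: str, relation: str, obj: str,
          _seen: Optional[set] = None) -> bool:
    """Return True if `user` holds `relation` on `obj`, directly or via any ancestor."""
    seen = set() if _seen is None else _seen
    if obj in seen:  # guard against parent cycles and already-visited ancestors
        return False
    seen.add(obj)
    # 1. Direct grant on this object, including stronger relations.
    if any(user in store.subjects(obj, rel) for rel in IMPLIED_BY[relation]):
        return True
    # 2. Inherited grant: recurse into each parent (deal, collection, or folder).
    return any(check(store, user, relation, parent, seen)
               for parent in store.subjects(obj, "parent"))
```

Usage: after `store.add("Folder:f_parent", "viewer", "user_bob")`, `store.add("Folder:f_child", "parent", "Folder:f_parent")`, and `store.add("Document:doc_1", "parent", "Folder:f_child")`, a call to `check(store, "user_bob", "viewer", "Document:doc_1")` succeeds through two levels of folders, while an `editor` check on the same document fails.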

Keto provides a Docker image that can be deployed quickly with a provided configuration, and we store the relation tuples in AWS RDS.

Keto's API is simple, with basic methods for inserting, deleting, updating, and checking relation tuples. It handles most of the checking logic for us by recursively following parent relations, checking the document's deal, collection, folder, and any ancestor folders.
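For a sense of what calling Keto looks like over HTTP, here is a standard-library sketch that builds a check request and a tuple-insert request. The paths (`/relation-tuples/check/openapi` on the read API, `PUT /admin/relation-tuples` on the write API) and default ports 4466/4467 reflect recent Keto releases as we understand them; treat them as assumptions and verify against the REST reference for your Keto version. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed default ports for a local Keto deployment; adjust for your setup.
KETO_READ_URL = "http://localhost:4466"
KETO_WRITE_URL = "http://localhost:4467"


def check_request(namespace: str, obj: str, relation: str, subject_id: str) -> Request:
    """Build a check request; Keto replies with JSON like {"allowed": true}."""
    query = urlencode({"namespace": namespace, "object": obj,
                       "relation": relation, "subject_id": subject_id})
    return Request(f"{KETO_READ_URL}/relation-tuples/check/openapi?{query}", method="GET")


def insert_request(namespace: str, obj: str, relation: str, subject_id: str) -> Request:
    """Build a request that writes one relation tuple through the admin (write) API."""
    body = json.dumps({"namespace": namespace, "object": obj,
                       "relation": relation, "subject_id": subject_id}).encode()
    return Request(f"{KETO_WRITE_URL}/admin/relation-tuples", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def is_allowed(req: Request) -> bool:
    """Send a check request and return Keto's verdict (needs a running Keto)."""
    with urlopen(req) as resp:
        return bool(json.load(resp).get("allowed", False))
```

Separating request construction from sending keeps the request shape easy to unit-test without a live Keto instance.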

This setup provides the basic logic needed for "Invite only" permissions.

Org-wide permissions

We also support an org-level sharing option, which is the default for newly created resources. To implement it, each folder, deal, collection, and document row in our Supabase database has two columns: org_sharing_relation, which indicates the org's level of access (viewer, editor, or owner), and org_sharing_enabled (true/false), which indicates whether the option is turned on. We then create three views (org_wide_viewers, org_wide_editors, and org_wide_owners) that list the IDs of every resource with the option enabled at that access level.
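A hypothetical sketch of how such views could be defined (not DealPage's actual query). It runs against SQLite so it is self-contained; Supabase runs Postgres, where the same `CREATE VIEW ... UNION ALL` shape should carry over. The table names and the rule that editor or owner access also counts as viewer access are assumptions.

```python
import sqlite3

RESOURCE_TABLES = ("deals", "documents", "folders", "collections")  # hypothetical names
# Assumption: each org-wide level also grants every weaker level.
LEVELS = {
    "viewer": ("viewer", "editor", "owner"),
    "editor": ("editor", "owner"),
    "owner": ("owner",),
}


def create_org_sharing_schema(conn: sqlite3.Connection) -> None:
    for table in RESOURCE_TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            " id TEXT PRIMARY KEY,"
            " org_sharing_relation TEXT NOT NULL,"  # 'viewer' | 'editor' | 'owner'
            " org_sharing_enabled BOOLEAN NOT NULL)"
        )
    for level, relations in LEVELS.items():
        allowed = ", ".join(f"'{r}'" for r in relations)
        # One SELECT per resource table, stitched together with UNION ALL.
        body = " UNION ALL ".join(
            f"SELECT id FROM {table}"
            f" WHERE org_sharing_enabled AND org_sharing_relation IN ({allowed})"
            for table in RESOURCE_TABLES
        )
        # Produces org_wide_viewers, org_wide_editors, org_wide_owners.
        conn.execute(f"CREATE VIEW org_wide_{level}s AS {body}")
```

If IDs are not globally unique across tables, adding a resource-type column to each SELECT would keep the views unambiguous.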

Permissions Checks

At individual points in our app, we allow or disallow actions based on permission checks. For example, to check whether user Bob has view access to the document with ID doc_1, we first call Keto's check API, check(Bob, viewer, doc_1), and then check whether doc_1 is in the org_wide_viewers view. If either is true, view access is granted.
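The "either path grants access" rule can be sketched as a small function. Here `keto_check` is any callable with a hypothetical `(subject, relation, object) -> bool` signature (for instance a wrapper around Keto's check endpoint), and `org_wide` maps a relation to the IDs in the matching view; we assume those IDs have already been scoped to the user's organization.

```python
from typing import Callable, Mapping, Set

# Hypothetical signature: (subject, relation, object) -> allowed
KetoCheck = Callable[[str, str, str], bool]


def has_access(user_id: str, relation: str, resource_id: str,
               keto_check: KetoCheck, org_wide: Mapping[str, Set[str]]) -> bool:
    """Grant access if either the invite-only path or the org-wide path allows it."""
    # Invite-only path: explicit or inherited grant stored in Keto.
    if keto_check(user_id, relation, resource_id):
        return True
    # Org-wide path: e.g. membership in org_wide_viewers (assumed already org-scoped).
    return resource_id in org_wide.get(relation, set())
```

This follows the order described above (Keto first). Checking the org-wide set first would skip a network round trip for the many resources that are shared org-wide by default; the result is the same either way.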

For things like search and RAG chat, we fetch a set of document IDs which the user can access. This is done by combining the doc IDs in the org_wide_viewers view with a query that recursively expands folders from Keto until all documents are retrieved. The final list of document IDs is passed downstream to our index in Vespa as a filter to only return content from documents the user can access.
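A sketch of the search-time step: expand the user's grants breadth-first until only documents remain, union in the org-wide document IDs, and turn the result into a retrieval filter. `children_of` stands in for whatever Keto query lists an object's children, and the Vespa field name `doc_id` is an assumption; the filter uses YQL's `contains` operator on a string attribute.

```python
import re
from collections import deque
from typing import Callable, Iterable, Optional


def accessible_document_ids(granted_objects: Iterable[str],
                            children_of: Callable[[str], Iterable[str]],
                            org_wide_doc_ids: Iterable[str]) -> set:
    """Union of org-wide documents and documents reachable from the user's grants."""
    docs = set(org_wide_doc_ids)
    seen = set()
    queue = deque(granted_objects)  # objects as "Type:id", e.g. "Folder:f1"
    while queue:  # breadth-first expansion of deals, collections, and folders
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        kind, _, obj_id = obj.partition(":")
        if kind == "Document":
            docs.add(obj_id)
        else:
            queue.extend(children_of(obj))
    return docs


_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def vespa_filter(doc_ids: Iterable[str]) -> Optional[str]:
    """Build a YQL condition limiting results to doc_ids; None means nothing is accessible."""
    ids = sorted(set(doc_ids))
    if not ids:
        return None  # the caller should skip the search rather than drop the filter
    for doc_id in ids:
        if not _SAFE_ID.fullmatch(doc_id):  # refuse anything that could break the literal
            raise ValueError(f"unexpected document id: {doc_id!r}")
    return "(" + " or ".join(f'doc_id contains "{d}"' for d in ids) + ")"
```

For long ID lists, Vespa's `weightedSet` operator may be a better fit than a long `or` chain; the access-control logic is the same.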

Integrating with Drive & Slack

Because permissions live in our own system, we decided to copy documents from external integration sources like Drive into DealPage, which makes their permissions easier to manage. Conveniently, the Drive API provides an endpoint to list a file's permissions, so we map the Drive permissions onto our own internal permission model. Drive users are linked by email to the matching Clerk user in DealPage.
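A sketch of the Drive-to-DealPage mapping. The input dicts mirror fields of the Drive API v3 Permission resource (`type`, `role`, `emailAddress`) as returned by `permissions.list`. The role mapping, the choice to skip `group`, `domain`, and `anyone` entries, and the `clerk_id_by_email` lookup (keyed by lowercase email) are all hypothetical.

```python
from typing import Any, Iterable, Mapping

# Hypothetical mapping from Drive roles to DealPage relations.
DRIVE_ROLE_TO_RELATION = {
    "owner": "owner",
    "organizer": "owner",
    "fileOrganizer": "editor",
    "writer": "editor",
    "commenter": "viewer",
    "reader": "viewer",
}


def map_drive_permissions(permissions: Iterable[Mapping[str, Any]],
                          clerk_id_by_email: Mapping[str, str]) -> set:
    """Turn Drive permission entries into (clerk_user_id, relation) grants."""
    grants = set()
    for perm in permissions:
        if perm.get("type") != "user":  # group/domain/anyone entries are skipped here
            continue
        relation = DRIVE_ROLE_TO_RELATION.get(perm.get("role", ""))
        user_id = clerk_id_by_email.get(perm.get("emailAddress", "").lower())
        if relation and user_id:  # no matching Clerk user -> no access
            grants.add((user_id, relation))
    return grants
```

Each resulting pair would then be written to Keto as a relation tuple on the copied document.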

Access control in Slack is enforced in a similar way, by matching the user's Slack account email to their Clerk user in DealPage. Once the Clerk user ID is found, we check permissions exactly as described above.
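A minimal sketch of the Slack lookup, assuming the input is the JSON body of Slack's `users.info` method, where the email sits at `user.profile.email` (and is only present when the app has the `users:read.email` scope). `clerk_user_for_slack` and `clerk_id_by_email` are hypothetical names; any failure resolves to `None`, so the request is denied rather than answered without a permission check.

```python
from typing import Any, Mapping, Optional


def clerk_user_for_slack(users_info_response: Mapping[str, Any],
                         clerk_id_by_email: Mapping[str, str]) -> Optional[str]:
    """Resolve a Slack users.info response to a Clerk user ID, or None (deny)."""
    if not users_info_response.get("ok"):
        return None
    email = users_info_response.get("user", {}).get("profile", {}).get("email")
    if not email:  # e.g. missing users:read.email scope, or a bot user
        return None
    return clerk_id_by_email.get(email.lower())
```

The resolved ID then goes through the same permission checks as a request from the DealPage web app.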

Learnings

While this system is not perfect, it was very quick to set up and provides a solid foundation for building the principle of least privilege into our RAG system. The extra layer of security gives enterprise accounts additional trust, and it adds little friction because everything is shared org-wide by default.

