How to set up a brand knowledge base for AI: what goes where and why it matters
Setting up a brand knowledge base for AI is one of those things that looks simple until the output starts drifting and you realise the structure was never right to begin with. The knowledge base is the single most important input in any AI content system - not the prompts, not the workflows, not the model you choose. Get it right and everything downstream sounds like you. Get it wrong and you spend more time editing than you would have spent writing.
What a brand knowledge base for AI does
Your brand knowledge base runs beneath every AI content workflow. Every time an agent writes a blog, drafts a LinkedIn post, or puts together a newsletter, it reads the knowledge base first. That is what makes the output sound like your brand rather than like a statistical average of everything the model has ever seen.
It is a structured, typed document the AI references as operating context - the same context, every single run. When you update it, every workflow updates with it. One source, everything in lockstep.
For a closer look at how the knowledge base sits inside a full content system, the knowledge base feature page walks through how agents read and apply it at run time.
The two-layer model: knowledge base versus library
Keep the knowledge base and content library completely separate. They serve different purposes and mixing them degrades output quality.
The knowledge base holds your stable, strategic brand context - the things that are always true about your brand and should be present in every single piece of content you produce. Tone of voice, audience profiles, positioning, your point of view, writing rules, hard constraints on language. This is the always-on layer.
The library holds your timely, proprietary raw material - transcripts, recorded interviews, campaign playbooks, past content, meeting notes, sales call recordings. This is the on-demand layer. You pull from it selectively when it is relevant to a specific task. It does not live in the same vault as the strategy layer.
Research from Stanford and UC Berkeley shows why this separation matters. Their "Lost in the Middle" study found that LLMs use information at the beginning and end of a long prompt far more reliably than information in the middle. Accuracy dropped by over 30% when key information landed in the middle of a large context load. In some cases, models performed worse with the relevant information placed in the middle than with no documents at all.
So if your writing rules are sitting somewhere on page four of a 15-page vault, the model may simply not apply them. Keeping the knowledge base lean and the library separate is how you reduce that risk. The content library handles the raw material side - structured separately so agents pull from it selectively, not constantly.
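The two-layer model can be sketched as a context-assembly step. This is a minimal illustration, not a reference implementation. It assumes the knowledge base is plain text and that library items carry simple tags. Tag overlap stands in for whatever retrieval your system actually uses. The names `LibraryItem`, `select_library_items` and `build_context` are hypothetical. The always-on layer goes first, only the few relevant library items are added, and the task sits at the end, where attention is also strong.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryItem:
    """One piece of timely raw material, e.g. a transcript or past post (hypothetical shape)."""
    title: str
    text: str
    tags: set[str] = field(default_factory=set)


def select_library_items(items, task_tags, limit=2):
    """Pick the few library items whose tags overlap the task most.

    Simple tag overlap stands in for a real retrieval step.
    Items with no overlap are never loaded.
    """
    scored = [(len(item.tags & task_tags), item) for item in items]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in relevant[:limit]]


def build_context(rules, knowledge_base, library, task, task_tags):
    """Assemble the prompt: rules first, then the rest of the knowledge base,
    then any selected library material, with the task last."""
    parts = ["RULES\n" + rules, knowledge_base]
    for item in select_library_items(library, task_tags):
        parts.append(f"REFERENCE: {item.title}\n{item.text}")
    parts.append("TASK\n" + task)
    return "\n\n".join(parts)
```

The design choice is the point here. The knowledge base is loaded on every run, while the library contributes nothing unless a task asks for it.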
The ten sections a brand knowledge base needs
A well-structured knowledge base covers ten typed areas. Each one has a specific job. None of them should be vague.
Brand strategy: your market position, what you are building, who you are for, and the problem you solve. This is the orientation layer - it tells the AI what business it is writing for and why that business exists.
Tone of voice: specific, documented voice characteristics with examples that show the AI what right looks like. Document the actual patterns: sentence length, rhythm, the register you operate in, the tonal qualities you want in every piece.
Audience profiles: who you are writing for, what they already believe, where they are in their journey, and what they are trying to solve. The AI uses audience context to calibrate depth, assumptions, and language. Vague personas produce vague content.
Writing guide: the specific mechanics of how your brand writes - British or American English, sentence case or title case, how you handle lists, your preferred sentence structures, rhythm patterns. These are the granular rules that make output recognisably yours.
Rules: hard never/always constraints that the AI must follow on every run. Things like "never use em dashes," "always sentence case headings," "never hedge." These live at the top of the knowledge base so they land in the high-attention zone of the context window, not buried in the middle where they might get lost.
Point of view: your brand's specific beliefs and the framing that is distinctly yours - where you stand on contested ideas, the perspective that gives AI-generated content a distinct angle instead of a generic one.
Competitor positioning: who you are up against, how you differentiate, and what framing to avoid because it sounds like someone else. Useful for content that touches category comparisons or positioning claims.
Approved sources and domains: external sites and authors you trust, publications you want the research layer to pull from, domains worth citing. Restricting sourcing to approved domains keeps the research layer clean and on-brand.
Boilerplate: your standard brand description, founder bio if relevant, and any fixed language that appears consistently across your content. This prevents the AI from improvising descriptions of your business every time it needs one.
Content strategy: your key themes, content pillars, channels, and the angles you want to own. This tells the agent what topics are in scope and what the brand is trying to build authority around.
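One way to keep the ten sections typed is to model them explicitly and render them in a fixed order. This is a hypothetical sketch: the `Section` enum and `render_knowledge_base` are illustrative names. The render order follows the advice later in this guide: rules first, then strategy, audience and tone of voice. Rendering refuses to proceed if any section is empty, so gaps surface before an agent ever reads the document.

```python
from enum import Enum


class Section(Enum):
    """The ten typed sections, declared in the order they are rendered (rules first)."""
    RULES = "Rules"
    BRAND_STRATEGY = "Brand strategy"
    AUDIENCE = "Audience profiles"
    TONE_OF_VOICE = "Tone of voice"
    WRITING_GUIDE = "Writing guide"
    POINT_OF_VIEW = "Point of view"
    COMPETITORS = "Competitor positioning"
    SOURCES = "Approved sources and domains"
    BOILERPLATE = "Boilerplate"
    CONTENT_STRATEGY = "Content strategy"


def render_knowledge_base(sections: dict[Section, str]) -> str:
    """Render a typed knowledge base as one document, one heading per section.

    Raises ValueError if any of the ten sections is missing or blank.
    """
    missing = [s.value for s in Section if not sections.get(s, "").strip()]
    if missing:
        raise ValueError(f"Empty or missing sections: {', '.join(missing)}")
    # Enum iteration follows declaration order, so Rules is always rendered first.
    return "\n\n".join(f"{s.value}\n{sections[s].strip()}" for s in Section)
```

Each section holds one kind of content, which keeps the signal clean. The fixed order means nobody can accidentally push the rules into the middle of the document.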
How to write each section so the AI can use it
Write knowledge base content as direct, declarative instructions. AI models apply clear statements consistently - adjectives and abstractions give them less to work with than concrete examples and explicit rules do.
Write the tone of voice section with examples that show the AI what right and wrong sound like. The contrast is more informative than any number of adjectives you could layer on.
Write the rules section as a list of explicit constraints: "never" and "always" statements that leave no room for interpretation. Position the rules section at the top, because LLMs apply content at the start of the context most reliably. Rules buried lower are more likely to get missed.
Keep each section focused on one thing. A tone of voice section that bleeds into audience context and then into competitor positioning creates exactly the kind of noisy, mixed-signal context that degrades output quality. Typed sections, each doing one job, give the AI clean signal to work with.
The guide to structuring an AI knowledge base covers how context loads affect output quality, including the research behind why lean knowledge bases consistently outperform large ones.
What belongs in the library, and how to feed it
The library is where your proprietary material lives. Transcripts are the clearest example, because they capture insight that is not available anywhere public. A founder talking through their approach to a specific problem, a client describing an outcome in their own words, a team working through a strategic decision in a recorded session - that is the kind of content that cannot be reverse-engineered from public sources. Feed it into the library and the agents have something to draw on that no competitor can replicate.
Past content also belongs here. Published blogs, approved social posts, newsletter editions - these give the AI examples of output that has already passed your quality bar. They are reference material for style and depth, pulled in selectively rather than always loaded.
What goes in the library is anything specific to a time, a campaign, a conversation, or a project. The knowledge base holds the always-true strategic context. Each layer stays useful precisely because it holds only what belongs there.
CXL's breakdown of how AI erodes brand voice without guardrails is a useful read for understanding why the structural separation between these two layers is the guardrail that makes consistent output possible.
Setting up the rules layer: the part that often gets skipped
The rules section is the most underrated part of a brand knowledge base. Every other section can be well-written, specific, and comprehensive - and the output will still drift if the hard rules are not documented and positioned correctly.
Hard rules are the non-negotiable constraints on language and format. Things like banned words, prohibited sentence structures, formatting standards, and any language pattern that would make the output sound wrong for your brand. Write them as short, explicit statements: "never use the word 'leverage' as a verb", "always write in British English", "never use em dashes".
Position the rules section at the top of the knowledge base document, because LLMs apply content at the start of the context most reliably. A rule placed at the top of a lean knowledge base is far more likely to be applied. The same rule buried in the middle of a long document is more likely to be missed, and your output drifts in exactly the ways you were trying to prevent.
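Some hard rules are mechanical enough to check automatically once a draft comes back. This catches drift even when the model ignores them. The sketch below is illustrative only, assuming rules are expressed as regular expressions. The `RULES` list, its sample American spellings and `check_rules` are hypothetical. A regex cannot tell "leverage" the noun from "leverage" the verb, so that rule flags every form for a human to review.

```python
import re

# Hypothetical rules, each paired with a pattern whose match signals a likely violation.
RULES = [
    ("Never use em dashes", re.compile("\u2014")),
    # Flags every form of the word; a person decides whether it is used as a verb.
    ("Never use the word 'leverage' as a verb",
     re.compile(r"\bleverag(e|es|ed|ing)\b", re.IGNORECASE)),
    # A tiny sample of American spellings, standing in for a proper British English check.
    ("Always write in British English",
     re.compile(r"\b(color|favorite|organize|realize)\b", re.IGNORECASE)),
]


def check_rules(draft: str, rules=RULES) -> list[str]:
    """Return the text of every rule the draft appears to break, in rule order."""
    return [text for text, pattern in rules if pattern.search(draft)]
```

A check like this complements the rules section rather than replacing it. Most rules, such as "never hedge", still need a human read.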
IBM's research on generative AI for knowledge management found that structured, well-organised knowledge inputs produce measurably more accurate and on-target outputs - a finding that holds whether you are working with enterprise systems or a solo founder's brand KB.
Building the knowledge base: a practical starting point
Start by auditing what already exists: your website copy, brand guidelines, and past content that feels right. Existing positioning decks, onboarding documents, and approved campaign copy all count. The knowledge base is the distilled, structured version of what is already implicit in how your brand communicates.
Work through the ten sections in the following order. Strategy and audience first, because everything else flows from them. Rules next, because writing the strategy and audience sections often surfaces the constraints that matter; make them explicit and place them at the top of the finished document. Then tone of voice and writing guide, which are the sections the AI uses most frequently and need to be the most specific. Finish with the remaining sections in whatever order surfaces naturally from your audit.
Test each section by running a content task against it and reviewing the output. If the blog sounds right, the section is working. If it drifts, identify which section the drift is coming from. Dig into whether the tone of voice section is too vague, the rules too general, the audience profile too broad, or the point of view too thin to give the AI a clear steer.
The guide to creating on-brand AI content goes deeper on what voice definition looks like in practice, including the specific inputs that determine whether AI output sounds like your brand or like everything else.
Keeping it current: when to update and what to change
A knowledge base is not a one-time setup. It is a living document - the brand's strategic layer updates when the brand updates, and the AI follows. The practical cadence is lighter than people expect.
The rules and writing guide sections rarely change once they are well-defined. Brand strategy and audience profiles shift with major positioning work, product updates, or meaningful changes in who you are targeting. Tone of voice evolves gradually. The approved sources list gets updated as you discover new trusted references or retire old ones.
What triggers a knowledge base update is usually a pattern of output quality issues. If content is consistently landing in the wrong register, the tone of voice section needs work. If the AI keeps writing about the wrong audience, the audience profile is too vague. If specific language patterns keep showing up that you want gone, add them to the rules section explicitly. And if none of those fix it, the point of view section probably needs sharper edges - that one is easy to write generically and hard to notice until the content all starts to feel the same.
Update one section at a time and test after each change. Changing multiple sections simultaneously makes it hard to diagnose which update fixed what. Treat it like a system - isolate variables, run the test, confirm the improvement, then move on.
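If the knowledge base is stored as named sections, the one-change-at-a-time discipline can be enforced before each test run. This is a minimal sketch, assuming each version is a simple mapping from section name to text. `changed_sections` and `assert_single_change` are hypothetical helpers, not part of any particular tool.

```python
def changed_sections(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List section names whose text differs between two versions, ignoring outer whitespace."""
    names = sorted(set(old) | set(new))
    return [n for n in names if old.get(n, "").strip() != new.get(n, "").strip()]


def assert_single_change(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Reject an update that touches more than one section, so each test isolates one variable."""
    changed = changed_sections(old, new)
    if len(changed) > 1:
        raise ValueError(f"Update changes {len(changed)} sections: {', '.join(changed)}")
    return changed
```

Whichever section the helper reports is the only candidate when the next batch of output improves or gets worse.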
For teams running agentic workflows across multiple channels and content types, the guide to agentic content workflow design covers how the knowledge base feeds into multi-step content pipelines and where the returns from a well-built KB show up most clearly.
Frequently asked questions
What is the difference between a brand knowledge base and a content library?
The knowledge base holds your stable, strategic brand context - tone of voice, audience profiles, positioning, writing rules, and hard constraints. The library holds your timely, proprietary raw material - transcripts, past content, campaign notes, recorded interviews. The knowledge base is always loaded as context. The library is pulled from selectively when relevant to a specific task. Keeping them separate is the structural decision that most directly affects output quality.
How long should a brand knowledge base be?
Short enough to stay in the high-attention zone of the model's context window. Practically, this means covering the ten core sections with enough specificity to be useful, but without padding or repetition. A well-structured knowledge base for a small brand typically runs 1,500 to 3,000 words across all sections. Lean and specific knowledge bases consistently produce better output.
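A quick budget check can flag a knowledge base that has grown past that range. This is a rough sketch. It counts whitespace-separated words, which are only an approximate proxy for the tokens a model actually sees. The 1,500 and 3,000 defaults are this guide's rule of thumb for a small brand, not a hard limit. `knowledge_base_budget` is a hypothetical name.

```python
def knowledge_base_budget(sections: dict[str, str], low: int = 1500,
                          high: int = 3000) -> tuple[int, str]:
    """Total the word count across all sections and compare it with the target range."""
    total = sum(len(text.split()) for text in sections.values())
    if total < low:
        status = "under: some sections may be too thin to steer the model"
    elif total > high:
        status = "over: look for padding or repetition"
    else:
        status = "within range"
    return total, status
```

An "over" result is a prompt to trim, not a failure. Repeated guidance across sections is usually the first thing to cut.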
What should go at the top of a brand knowledge base?
The rules section. Hard never/always constraints get applied most consistently when they are positioned at the top of the document, because LLMs favour content in the early portion of the context window. If your formatting rules, banned words, and language constraints are buried in the middle of a long document, they will get missed more often than you would want. Front-load the rules, then follow with strategy, audience, and tone of voice.
Can you use the same knowledge base for different content types?
Yes. One curated knowledge base can power every content type: blogs, social posts, newsletters, press releases, email sequences. It holds the strategic context and brand rules that stay fixed across all formats, while the agent follows a different playbook or workflow instruction depending on the content type.
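The split between a shared knowledge base and per-format playbooks can be sketched as a prompt builder. This is illustrative only. The `PLAYBOOKS` entries, their word counts and `build_prompt` are hypothetical stand-ins for whatever workflow instructions your system uses. The knowledge base text is identical for every format, and only the playbook and brief change.

```python
# Hypothetical per-format instructions; the knowledge base itself never changes.
PLAYBOOKS = {
    "blog": "Write a 1,200-word article with sentence-case subheadings.",
    "linkedin": "Write a post under 200 words with one clear takeaway.",
    "newsletter": "Write a short editorial intro followed by three linked items.",
}


def build_prompt(knowledge_base: str, content_type: str, brief: str) -> str:
    """Combine the shared knowledge base with the playbook for one content type."""
    try:
        playbook = PLAYBOOKS[content_type]
    except KeyError:
        raise ValueError(f"No playbook for content type {content_type!r}") from None
    return f"{knowledge_base}\n\nPLAYBOOK ({content_type})\n{playbook}\n\nBRIEF\n{brief}"
```

Keeping format instructions out of the knowledge base means a new channel needs a new playbook, not a rewrite of the brand context.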
How do you know if your brand knowledge base is working?
Run a content task and review the output against two criteria: does it sound like your brand, and would you publish it without rewriting it? If the answer to both is yes, the knowledge base is doing its job. If the output drifts in voice, makes wrong assumptions about the audience, or ignores your writing rules, identify which section the problem is coming from and make it more specific. The knowledge base is working when editing time drops and publishing confidence goes up.