For nearly two years, we have seen companies seek to use generative Artificial Intelligence (GenAI) as an engine for transforming how they create, manage, and optimize technical content – including product and how-to guides, online help systems, and courseware for workforce training and development, to name just a few.
However, the results of applying generative AI to content have often been compromised, ranging from incomplete information to outright hallucinations. The issue isn’t in the AI technology itself. Instead, it lies in a more fundamental area: the quality and organization of the content that powers these systems.
It helps to think of content as the fuel for AI. A fragmented content infrastructure and poor content quality act like contaminated fuel that severely limits the potential of any AI investments. In other words, without a robust content foundation in place, even the most advanced AI implementations will struggle to deliver their promised value.
In this article, we will examine the technology and business best practices that organizations should put in place to implement AI and turn technical content into premium fuel for maximizing performance.
The value of AI in technical content
Before diving into how to optimize content for AI, let’s look at the three ways in which AI can help technical content teams drive efficiency and effectiveness.
- Content assistance is focused on using generative AI tools to help with content authoring, for example, creating an initial draft that aligns with the company’s style guide.
- Content access has two sides: On the one hand, it is focused on how generative AI can help employees, customers, and other stakeholders find the information they need more quickly, for example, via an AI-driven chatbot or semantic search. On the other hand, it is about how proprietary content can be accessed to feed an AI model.
- Content intelligence is the practice of using AI to analyze content and address such questions as: Where are my content gaps? Are there any duplications of my content? How valuable is my content? Is it actually meeting the readability standards of my audience?
What these three applications of AI have in common is that, in most cases, the AI tool needs to run against the organization’s proprietary content and data in order to achieve its full promised value. This has implications for how AI should be implemented within the enterprise and how content is organized to support any AI-driven assistance.
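To make the content intelligence idea concrete, here is a minimal sketch of an automated readability check using the open-source Python library textstat; the sample sentence and the grade-level threshold are illustrative assumptions, not recommendations.

```python
# Minimal content-intelligence sketch: score a topic's readability with
# the open-source textstat library (pip install textstat).
import textstat

topic = (
    "Prior to initiating the diagnostic sequence, verify that the apparatus "
    "has been disconnected from its primary power source."
)

ease = textstat.flesch_reading_ease(topic)    # higher = easier to read
grade = textstat.flesch_kincaid_grade(topic)  # approximate US grade level

print(f"Reading ease: {ease:.0f}, grade level: {grade:.1f}")
if grade > 10:  # illustrative threshold for a general audience
    print("Consider simplifying this topic for your target readers.")
```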
Bringing AI in-house
At the heart of a successful AI implementation is a process called “grounding”, i.e., connecting AI models with an organization’s own current, accurate, and relevant information. Some larger companies build custom AI models from scratch. However, most enterprises take one of two approaches: They either adapt existing AI models or tools with company-specific information, or they use Retrieval-Augmented Generation (RAG) technology to connect internal databases to existing AI tools and then prompt the tool to provide contextual responses.
The popularity of commercially available AI tools cannot be overstated. For example, 80% of Fortune 500 companies have employees using ChatGPT for work, according to OpenAI. Here’s the problem: This statistic comes from registered ChatGPT consumer accounts associated with corporate email domains. In other words, there is no definitive corporate control and content governance over these users.
Therefore, organizations should consider adopting the enterprise version of a generative AI tool that enables companies to work with their own data and content sources in an environment with advanced security and privacy protections. Some of these enterprise-class tools include OpenAI’s ChatGPT Enterprise, Microsoft 365 Copilot Enterprise, and Gemini for Google Workspace. Beyond supporting security and privacy, they also enable high degrees of customization.
It is not always necessary to sign a multi-year contract for an AI tool, and there are cost-effective ways to get started. The important thing when choosing an AI tool is to read the vendor’s policies for data retention, AI model training, and the like. A case in point: Both OpenAI and Microsoft Azure provide serverless, pay-per-use access to ChatGPT models. Microsoft Azure’s version, for example, does not train on user prompts and retains no data between uses: the data from your session is destroyed once the session ends, and you only pay per use. Microsoft’s policy for privacy and data retention is stated very clearly.
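As a sketch of what such pay-per-use access looks like in practice, the snippet below makes a single stateless call through OpenAI’s Python SDK; the model name and prompt are placeholders, and each call is billed per token.

```python
# A minimal sketch of a stateless, pay-per-use API call using OpenAI's
# Python SDK (pip install openai); the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Summarize our admonition style rules."}
    ],
)
print(response.choices[0].message.content)
```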
What is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a Large Language Model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output, so it remains relevant, accurate, and useful in various contexts.
Source: aws.amazon.com
The role of RAG
Today’s generative AI tools are typically based on Large Language Model (LLM) technology. To effectively leverage an AI tool with an internal content repository and provide important guardrails, many companies will find that a RAG system is the most accessible and maintainable solution currently available. The RAG system takes a user’s prompt, searches the organization’s content repository for the most relevant content, and uses that content to augment the prompt before passing it on to the LLM – supplying highly specific context in the form of supplemental document text and even additional instructions to improve the LLM’s response. For example: “Answer this user’s prompt with this additional information, and don’t create any answer that goes outside of that context. Do not answer a question that is not answered within the specific content.”
Using a RAG solution to provide contextual awareness in this way can reduce hallucinations by providing the LLM with targeted, factual context, as well as additional instructions that may help it tighten the scope of its answers. This is extremely important in a workforce setting. You don't want the LLM to start making things up when creating a safety compliance workforce training document.
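The flow described above can be sketched in just a few lines of Python. Here, search_repository() and call_llm() are hypothetical placeholders for your own repository search and LLM client, and the guardrail wording is illustrative.

```python
# A minimal sketch of the RAG flow described above. search_repository()
# and call_llm() are hypothetical placeholders for your own systems.
def answer_with_rag(user_prompt: str) -> str:
    # 1. Retrieve the most relevant passages from the content repository.
    passages = search_repository(user_prompt, top_k=3)
    context = "\n\n".join(passages)

    # 2. Augment the prompt with that context plus guardrail instructions.
    augmented_prompt = (
        "Answer the user's question using ONLY the context below. "
        "If the context does not contain the answer, say that you "
        "cannot answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_prompt}"
    )

    # 3. Pass the augmented prompt to the LLM and return its response.
    return call_llm(augmented_prompt)
```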
Some Enterprise Content Management Systems now embed RAG technology. However, employees can also create their own RAG systems using off-the-shelf tools. One of the most popular is LangChain, an open-source framework for developing applications powered by LLMs. Creating a RAG system with one of these tools doesn’t require a dedicated data scientist. However, it does require somebody who can understand and write some Python code and who has access to both the systems where the content is stored and the LLM itself. In fact, at one company, we’ve seen a project lead with some Python programming experience create RAG systems on his own.
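To give a sense of how compact such a system can be, here is a hedged LangChain sketch. Module paths shift between LangChain versions (this follows the 0.2-era layout), and the content folder, model names, and question are assumptions for illustration.

```python
# A minimal LangChain RAG sketch (pip install langchain-community
# langchain-openai langchain-text-splitters faiss-cpu). Paths and model
# names are illustrative; module layout varies by LangChain version.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and chunk the proprietary content, then build a vector index.
docs = DirectoryLoader("./product_docs", glob="**/*.md", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Retrieve the most relevant chunks and pass them to the LLM as context.
question = "How do I restore the device to factory settings?"
retriever = index.as_retriever(search_kwargs={"k": 3})
context = "\n\n".join(d.page_content for d in retriever.invoke(question))

llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```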
Optimizing content for RAG access
The RAG system provides a critical connection between corporate content and an AI tool’s LLM, but the structure of content will ultimately determine the performance of AI-driven processes. Content should be broken up into reusable topics or components and then tagged, so that information can be readily aggregated and searched. If content has not been structured this way, technical writers should begin a phased initiative to make these updates before moving beyond the pilot phase of an AI implementation.
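What “structured and tagged” means in practice will vary by CMS, but as a simple illustration, a reusable topic might carry metadata along these lines (the field names are invented for the example, not a standard schema):

```python
# Illustrative sketch of one reusable, tagged content component; the
# field names are invented for the example, not a standard schema.
topic = {
    "id": "proc-factory-reset-001",
    "type": "procedure",
    "product": "Model X Router",
    "audience": "end-user",
    "tags": ["troubleshooting", "factory-reset"],
    "body": "To restore factory settings, hold the reset button for 10 seconds...",
}
```

Because each component is small and self-describing, a retrieval system can filter on the metadata (for example, by product or audience) before ranking by relevance.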
A RAG solution can connect to multiple content sources via Application Programming Interfaces (APIs). These might include, for example, a Content Management System (CMS) for product documentation and online help, a Learning Management System (LMS) for online and in-person courses, and a Customer Relationship Management (CRM) system with an integrated knowledge base for the support team.
A more effective approach is to connect the RAG system via an API to a content syndication platform that can aggregate existing content from different systems, as well as push out content. This approach facilitates the RAG solution’s contextual search while ensuring content consistency and accuracy.
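As a sketch of this pattern, the snippet below pulls aggregated content from a hypothetical syndication API and hands it to a RAG indexing step; the endpoint, response fields, and index_for_rag() helper are all invented for illustration and would need to be replaced with your platform’s actual API.

```python
# Hypothetical sketch: fetch aggregated content from a syndication
# platform's REST API and hand it to the RAG indexer. The endpoint,
# response fields, and index_for_rag() helper are invented for illustration.
import requests

resp = requests.get(
    "https://syndication.example.com/api/v1/content",
    params={"updatedSince": "2025-01-01"},
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["items"]:
    index_for_rag(item["id"], item["body"], item["tags"])  # hypothetical indexer
```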
APIs provide AI tools access to the content itself, but enterprises need to consider another important point of access: the semantic data layer.
Investing in semantic layer access
The semantic layer is a type of search index built by running semantic analysis on an organization’s content repository. It supports content intelligence by allowing AI tools to perform advanced functions, such as duplication analysis and content inventory mapping.
Duplication analysis answers questions such as: “How similar are these two pieces of content? Could I retire one of them?” It looks not just at whether the text is identical, but at whether it covers the same topic in the same manner – making it an important function for keeping a content repository lean.
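As an illustration, the open-source sentence-transformers library can score how semantically similar two passages are; the model choice and the “near-duplicate” interpretation of the score are assumptions.

```python
# Minimal duplication-analysis sketch using sentence-transformers
# (pip install sentence-transformers); the model choice is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

doc_a = "Hold the reset button for 10 seconds to restore factory settings."
doc_b = "To restore factory defaults, press and hold reset for ten seconds."

embeddings = model.encode([doc_a, doc_b], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()

print(f"Semantic similarity: {score:.2f}")  # scores near 1.0 suggest duplication
```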
Content inventory mapping charts an organization’s content based on semantic analysis and enables a specific piece of technical content to be compared against a specific taxonomy based on a set of classifications or tags. It can help answer questions such as: “How many pages of content are there for this product versus that product? Are there no pages for a particular product? Is there excess content, or are there old pieces of content that need to be retired?” Similarly, for learning and development content, a specific taxonomy may include skills as a parameter, helping to answer questions such as: “What skills are being effectively reinforced by our content?” and “Where are we missing workforce training opportunities?”
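One accessible way to sketch this kind of taxonomy mapping is zero-shot classification with an off-the-shelf neural model; the taxonomy labels and sample text below are illustrative.

```python
# Sketch: map a content chunk onto a taxonomy with a zero-shot classifier
# (pip install transformers); the labels are illustrative taxonomy tags.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "This course teaches technicians to diagnose hydraulic pump failures."
taxonomy = ["installation", "troubleshooting", "safety compliance", "product marketing"]

result = classifier(text, candidate_labels=taxonomy)
print(result["labels"][0], round(result["scores"][0], 2))  # best-matching tag and score
```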
Some Enterprise Content Management Systems provide semantic analysis. However, as with RAG solutions, there are off-the-shelf tools that enable anyone with moderate technical proficiency to build their own. One popular option is Algolia, an off-the-shelf search provider that includes semantic analysis out of the box. Alternatively, users who want full control can turn to resources such as Hugging Face, a widely recognized open-source community that provides access to a multitude of neural models, many of which are targeted at semantic content analysis.
Addressing the governance workflow
Beyond these technical considerations, it is important to evaluate the organization’s content development and governance workflow to identify areas where AI can be utilized and where human review is required. For example, an AI model can be built directly into the workflow so that when a technical writer finishes drafting a product guide, it will go through an AI-driven review process, where the AI model states: “It seems like this content contradicts other information in your content repository. Would you like to review the contradictory content?”
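A hedged sketch of such a review step might look like the following, where fetch_related() stands in for a semantic lookup of potentially related repository content and the model name is a placeholder.

```python
# Sketch of an AI review step in an authoring workflow: check a finished
# draft against related repository content for contradictions before it
# is routed to a human reviewer. fetch_related() is a hypothetical helper.
from openai import OpenAI

client = OpenAI()

def review_draft(draft: str) -> str:
    related = "\n\n".join(fetch_related(draft))  # hypothetical semantic lookup
    prompt = (
        "Compare the draft with the existing content. List any statements in "
        "the draft that contradict the existing content. If there are none, "
        "reply 'No contradictions found.'\n\n"
        f"Existing content:\n{related}\n\nDraft:\n{draft}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return result.choices[0].message.content  # surfaced to the human author
```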
Such an approach enables greater content quality and efficiency because the generative AI tool is being used in conjunction with a human content author. AI effectively acts as a writing partner, providing the content author with real-time feedback and quality control checks that require human confirmation. In this way, the human content author and the AI assistant provide each other with a series of checks and balances.
Governance policies should include role-based access control, traceability of content modifications by humans or AI, and proper structuring and tagging of content for downstream AI retrieval. Companies with established design and voice guidelines will have an advantage, as these guidelines can be used to steer AI tools.
Enterprises also need to think about enacting and enforcing corporate policies for how and when generative AI functionality is used. This will be a cross-functional effort that brings together members of the legal, human resources, information technology, and compliance teams in addition to leaders within the training, learning and development, and technical communications teams.
Putting a project team in place
Some corporations have large teams of data scientists and engineers implementing their AI content optimization projects. However, as discussed earlier, many of these functions can be handled by a single project lead with a moderate level of technical expertise. For most organizations, the solution will take a middle path: a project manager and an engineer with machine-learning expertise.
Importantly, organizations have to treat any application of AI to content as a product, and the project manager needs to act as a traditional product owner. This person will serve as a liaison between the business and the actual implementation, informing stakeholders and ensuring that the engineer on the project is able to answer their questions.
Conclusion
We began this article by looking at content as the fuel for AI. When AI is properly implemented – with the right technologies, content structure, governance policies, and team structure in place – a symbiotic relationship can form: Well-organized and easily searchable content enables AI tools to deliver more accurate and relevant assistance to employees, customers, partners, and other stakeholders. At the same time, AI-driven analysis applied to the organization’s content inventory provides technical writing teams with new insights for continuously enhancing these user experiences.