
Prompting, DRAP, and Plain(er) English

  • May 27
  • 7 min read

Updated: May 29

Tweaking the behavior of a token machine with a structured approach.


EDWARD VON DER SCHMIDT 27 MAY 2026 (UPDATED 29 MAY 2026)



The "Life" of a Prompt

What happens when we prompt a generative AI model? Typically, inputs such as text or media are first broken down into components or "tokens", and each token is converted or "encoded" into a set of numbers that acts as coordinates (a vector). This "context" is passed through the layers of a "transformer", a numerical field of weights that represent estimated relationships among concepts. Within those layers, the "attention mechanism" weighs how much each token in the context should influence every other. During "inference", the model uses the context and its weights to estimate which token (e.g. a piece of a word) is most "likely" to come next. That token is decoded back into its symbolic form and added to the response. The output token is then included in the input for the next token and the process repeats. The responses you see are generated piece by piece in a single pass (Vaswani et al., "Attention Is All You Need", 2017).
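The loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the probability table is invented, and it only looks at the last token, whereas a real transformer computes its probabilities from learned weights over the whole context. The names NEXT_TOKEN_PROBS and generate are hypothetical.

```python
import random

# Toy "model": a hand-written table of next-token probabilities.
# The numbers are invented for illustration. A real transformer derives
# them from learned weights and the entire context, not a lookup table.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.2, "ran": 0.8},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def generate(rng, max_tokens=10):
    """Build a response one token at a time, feeding output back in as input."""
    context = ["<start>"]
    while len(context) < max_tokens:
        probs = NEXT_TOKEN_PROBS[context[-1]]
        tokens = list(probs)
        weights = [probs[token] for token in tokens]
        # Pick the next token according to its estimated probability.
        next_token = rng.choices(tokens, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        context.append(next_token)  # the output token becomes part of the input
    return context[1:]


print(" ".join(generate(random.Random(0))))
```

Running generate with different random seeds may produce different sentences from the same starting point, which is the piece-by-piece, probabilistic behavior the paragraph describes.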


This is a very clever method. The model does not need to understand what goes in or what comes out; in fact, it does not "understand" anything at all in the way that we do. The model "only" needs to recognize and transform things into numbers, interpret and pass these numbers through a meaningful mesh of pre-trained probabilities, and return the "best fit" prediction results, one token at a time. Generative AI models sound and look convincing because they channel our own embedded knowledge, most of which was roughly captured during their "training" as massive databases of numerical relationships. This is intricate math, not conscious thought!


It is important to acknowledge that large language models (LLMs) cannot and do not "think" on their own, but this does not make them less capable. For nearly a century, we have programmed computers that cannot think in any traditional sense to accomplish incredible feats by leveraging their ability to follow and scale repeatable logic. That these models might appear intelligent speaks more to their ability to deftly process language and media, translating them into code and vice versa. While we might use "thought" as a metaphor, we do not actually need AI models like LLMs to be intelligent on their own to make them do intelligent things.



Probable Mistakes

Since transformer models can effectively interpret and translate text into processable code, we can tell these models what to do with plain language and avoid complicated rules of syntax and grammar. While this is nowhere near as precise as traditional (deterministic) programming, in practice it works despite its lack of any formality or verifiable process. The model is capable of getting the "gist" of our instructions and more or less follows a path that "simulates" them, even if this is more mimicry than procedural execution. This process is fundamentally probabilistic, meaning we cannot know for sure how the model will respond to a prompt and that responses will probably vary in some way.


One thing we do know is that these models do not generally examine their final work or understand what it means. When a response is generated, you are seeing the live rendering of a first and only round of output. There are any number of ways to use context (input) to influence the likelihood that a model generates a certain kind of response (output). This is done by manipulating conditional probabilities - the odds of something occurring based on the knowledge that something else has occurred. Incidentally, this is exactly how prompting works in general. However, preparation can only go so far. The overall process is linear: input goes in, output comes out, and that's that.
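To make "conditional probabilities" concrete, here is a minimal sketch that estimates the odds of the next word by counting what followed a given context in a tiny, made-up corpus. CORPUS and next_token_probs are hypothetical names, and a real model learns its estimates during training rather than counting at run time.

```python
from collections import Counter

# Invented three-word "documents" for illustration only.
CORPUS = [
    ("river", "bank", "water"),
    ("river", "bank", "water"),
    ("river", "bank", "mud"),
    ("money", "bank", "loan"),
    ("money", "bank", "loan"),
    ("money", "bank", "loan"),
]


def next_token_probs(corpus, context):
    """Estimate P(next token | context) by counting what followed the context."""
    context = tuple(context)
    n = len(context)
    counts = Counter()
    for seq in corpus:
        for i in range(n, len(seq)):
            if tuple(seq[i - n:i]) == context:
                counts[seq[i]] += 1
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


# With little context, several continuations remain plausible.
print(next_token_probs(CORPUS, ["bank"]))
# Adding context shifts the odds toward one kind of continuation.
print(next_token_probs(CORPUS, ["money", "bank"]))
```

In this toy, supplying "money" before "bank" removes "water" and "mud" from the running entirely. Prompting works on the same principle, only with far more context and far less certainty.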


Because generation is based on probabilities and not predetermined steps, all kinds of "mistakes" are bound to happen. The model does not review, let alone understand, its output, so there is no mechanism to correct for errors. This is a problem because transformer models are "autoregressive" - previous output is reused as input. Context is also evaluated as a whole - every token affects the interpretation of every other token. Not only will mistakes become input, they will affect the model's interpretation of other inputs, including future prompts.


With no way to "delete" errors, they will persist and cascade through the rest of the session. Even if errors are pushed out of context, their influence will remain in the tokens they affected. Hallucinations and error propagation are unavoidable aspects of generative AI's architecture. If we cannot remove these errors entirely, can we try to attenuate or lessen their influence?
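A toy illustration of this cascade, assuming a deliberately simple stand-in for a model: each new value is just the previous value plus two, so every step depends on earlier output. The function continue_sequence and its error_at parameter are hypothetical and exist only to inject a single simulated mistake.

```python
def continue_sequence(context, steps, error_at=None):
    """Toy autoregressive process: each new value is the last value plus 2."""
    context = list(context)
    for i in range(steps):
        next_value = context[-1] + 2
        if i == error_at:
            next_value += 1  # one simulated "mistake"
        context.append(next_value)  # right or wrong, it becomes input
    return context


clean = continue_sequence([0], steps=5)
flawed = continue_sequence([0], steps=5, error_at=1)
print(clean)   # [0, 2, 4, 6, 8, 10]
print(flawed)  # [0, 2, 5, 7, 9, 11]
```

Only one step went wrong, yet every later value is off, because nothing in the process revisits what has already been produced.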


Is there a way we can have these models "do it again" before they respond? How can we instruct a model to "check" its work? What will it be checking for? How will the model know when to stop? How do we know that this simulated review even takes place? One solution might be to simply tell the model how to do these things, look at the results, and try again.



Semantic Programming and DRAP

LLMs are capable of interpreting and following natural language instructions, though execution is not guaranteed to be correct (or even occur). We have to settle for simply improving the chances that a particular type of outcome will be generated. If the model is capable and we are precise, there is a greater chance that the most likely token path will be close enough to what we are asking for. There is no true "understanding" taking place, only the orchestration of odds.


Given that these models can field all kinds of input, it is not a stretch then to ask the model to consider its own responses as input. The trick is to try to have the model do this before it answers. Fortunately, transformers are designed to use output as input. We can direct the model to simulate a "review" of its own output before responding, and even tell it what to look out for and how it might "improve" previous work in the next round of output.


Instead of stopping after the first complete "draft", the model may continue working to generate what amounts to an improved draft to give us instead. In fact, we can guide the model to repeat this process in order to conform to standards and criteria of our choosing. This is what we mean by "Recursive analysis through procedural semantic programming", or telling a model in plain terms how to repeat a process.


The notion of having an LLM (probabilistically) iterate on its own output before responding is at the heart of a framework we called the Datum Research Analytical Process (DRAP), an internal methodology designed to attenuate the influence of errors and impose recursive rigor on generated output. We chose to share DRAP publicly for experimental research purposes. This framework details repeatable steps for the model to iterate on its output according to guidelines chosen to uphold a set of core principles. A "foundation" establishes overarching standards, a "focus" or "lens" provides more specific criteria to evaluate and address, and a "reflection" layer tells the model how its own work should be viewed and recomposed with these guidelines and principles in mind. A final "synthesis" ties these ideas together.
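As a rough sketch of how four such layers might be assembled into a single set of instructions, consider the following. Everything here is an assumption made for illustration: the LayeredPrompt class, the section titles, and the placeholder wording are invented and are not the published DRAP text.

```python
from dataclasses import dataclass


@dataclass
class LayeredPrompt:
    """Hypothetical container for the four layers named in the text."""
    foundation: str  # overarching standards
    focus: str       # more specific criteria (the "lens")
    reflection: str  # how previous output should be reviewed and recomposed
    synthesis: str   # how the pieces tie together in the response

    def render(self) -> str:
        sections = [
            ("FOUNDATION", self.foundation),
            ("FOCUS", self.focus),
            ("REFLECTION", self.reflection),
            ("SYNTHESIS", self.synthesis),
        ]
        return "\n\n".join(f"{title}\n{body.strip()}" for title, body in sections)


# Placeholder wording, invented for illustration only.
example = LayeredPrompt(
    foundation="Prefer accuracy over fluency. State uncertainty plainly.",
    focus="Check every claim for a stated source or a clear premise.",
    reflection="Before responding, reread the draft against the focus and rewrite it.",
    synthesis="Return only the latest draft, followed by any open questions.",
)
system_prompt = example.render()
print(system_prompt)
```

The rendered text would be supplied to the model as plain-language guidance. Whether the model actually follows it remains a matter of probability, not execution.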


The application of a "system prompt" is not new: many approaches guide or bound the output with an initial set of instructions contained in a prompt. Still, we are left with write-once-and-hope-for-the-best along with any mistakes made along the way. If we tell the model how to examine its previous work and compose a "better" draft in a subsequent round of output, we can theoretically diminish the influence of previous errors and return a "cleaner" response. If we give the model a repeatable sequence of steps to follow and the right definitions, we can guide it to continue working on its own until the job is "finished". With semantic programming, we can theoretically give these instructions to any model (or person!) - provided that they are sufficiently capable and that we view these instructions as probabilistic guidance.



Critical Caveats

Semantic programs, which here use natural language or media to direct LLM generation by influencing the probability of outputs, are very different from classical programs. Instead of executing formal, verifiable logic, we are relying on the LLM's parameters, encoding, context, pattern recognition, training, attention mechanism, and prediction model to correctly interpret our instructions in such a way that there is a higher probability that the generated tokens will more or less "match" the output we were looking for. This is very inexact and lacks any formal guarantees, but language and media can offer flexibility that formal logic cannot: ambiguity can be a feature. While we stand a decent chance of having the model reproduce our directions if we state them clearly enough, compliance is variable and far from promised.


DRAP uses "drafting" and "review" as metaphors (since the program is also for people), but the model does not "revise" or "loop" in any classical sense. Output is one continuous stream so there is still no way to remove what has been generated, which may contain errors. That said, subsequent improvements should "outweigh" those mistakes and lessen their influence. What is described as a loop might be thought of conceptually as a coil, with the attention mechanism weighing the review process and evaluative criteria as applied to previous output more heavily as the work continues after the initial output is generated. Importantly, this concept is theoretical - we are working on verifying its exact mechanics.


If you're using a Chain-of-Thought (CoT) model, only the "final" draft at the end of the output stream should be included in the visible response, which should further attenuate the importance of earlier drafts. Models without a latent or hidden workspace like a "thinking block" may output the entire stream of drafts and interim work. While the latter is not ideal for presentation, this could make for interesting research as to how a probabilistic token generator actually "simulates" a procedure in the manner of its output. The obvious cost to DRAP's approach is that it requires more tokens and time.
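For models that print every draft, the final draft could be separated from the interim work after the fact. The sketch below assumes the model was told to label each draft with a marker line such as "=== DRAFT 1 ==="; that marker format, and the function names split_drafts and final_draft, are hypothetical. A model may not comply with the labeling, so the code falls back to returning the whole stream.

```python
import re

# Assumed marker format; the model must have been instructed to emit it.
DRAFT_MARKER = re.compile(r"^=== DRAFT (\d+) ===$", re.MULTILINE)


def split_drafts(stream: str) -> list[str]:
    """Return the text of each labeled draft, in the order it was generated."""
    parts = DRAFT_MARKER.split(stream)
    # With one capture group, split yields:
    # [preamble, number, body, number, body, ...]
    return [body.strip() for body in parts[2::2]]


def final_draft(stream: str) -> str:
    """Return the last labeled draft, or the whole stream if none is labeled."""
    drafts = split_drafts(stream)
    return drafts[-1] if drafts else stream.strip()


stream = (
    "=== DRAFT 1 ===\n"
    "2 + 2 = 5\n"
    "=== DRAFT 2 ===\n"
    "2 + 2 = 4\n"
)
print(final_draft(stream))  # 2 + 2 = 4
```

Keeping the earlier drafts available through split_drafts, instead of discarding them, would also support the kind of research into "simulated" procedure mentioned above.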


Crucially, just because the model simulates the idea of looking over its own work does not mean that the final response is verified or true. There is no guarantee of either; external validation is always required. Self-directed examination may be better than no "review" at all, but the response will always be a sequence of probabilistically generated tokens and not a product of conscious thought. If the input (context, training, design, application, etc.) is garbage, the output will be too.


Words for Thought

These models may be able to convincingly mimic an intelligent process in their own useful way, but they are not smart enough to know what they are doing. The DRAP disclaimer for LLM responses is worth repeating:


"PROBABILISTICALLY GENERATED OUTPUT REQUIRES EXTERNAL VERIFICATION OF ALL ITS CONTENT, INCLUDING SOURCES, PREMISES, AND CONCLUSIONS. THIS OUTPUT IS SUBJECT TO THE FUNDAMENTAL LIMITATIONS OF GENERATIVE AI ARCHITECTURES, INCLUDING ERRORS, HALLUCINATIONS, OUTDATED KNOWLEDGE, AND IMPLICIT BIASES."


Use LLMs and other generative AI models with intention and care as tools and not as substitutes for your own thinking. The most important input is the person operating the machine. A thoughtful, collaborative process is more powerful than the product.



This essay was written by a person and evaluated by DRAP using Gemma4. This is informational commentary, not advice.
