Designing an MCP Markdown Knowledge Server: Why It Should Not Return HTML

2026-06-23

mcp, markdown, knowledge-base, claude-code, nextjs, architecture

When you keep a knowledge base in Markdown, you may want the same articles to serve more than one reader: a website, an AI coding tool, a local search tool, or an editor extension.

That creates a design question for the MCP server: should it return HTML, or should it return the original Markdown?

For a knowledge-base use case, the better default is simple: return raw Markdown from the MCP server, and let each client transform it for its own UI. MCP (Model Context Protocol) is an open protocol for connecting AI applications to external data and tools, so one server may have several kinds of clients. If the server also becomes the presentation layer, it starts to carry assumptions that belong to only one of them.

This article explains the design, based on a working local MCP server that reads Markdown articles and exposes them through MCP tools.

What was verified

I verified the pattern with a local Markdown knowledge base and a stdio MCP server. From an MCP SDK client, the server exposed these tools:

list_articles
get_article
search_knowledge

Calling get_article returned article metadata and the full body as JSON. The body was Markdown, not HTML. In the verification run, the returned body did not contain rendered tags such as <h1>, <h2>, or <p>.
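A check like the one described above can be sketched in a few lines. This is a minimal heuristic, not the script used in the verification run: the helper name looksLikeRenderedHtml and the list of tags are assumptions made for illustration.

```typescript
// Hypothetical helper: reports whether a body contains common rendered block-level tags.
// The tag list is an assumption; extend it to match what your renderer emits.
const RENDERED_TAG = /<\/?(h[1-6]|p|ul|ol|li|div)(\s[^>]*)?>/i;

function looksLikeRenderedHtml(body: string): boolean {
  return RENDERED_TAG.test(body);
}

const markdownBody = "# Title\n\nA paragraph with `inline code`.\n";
const htmlBody = "<h1>Title</h1>\n<p>A paragraph.</p>\n";

console.log(looksLikeRenderedHtml(markdownBody)); // false
console.log(looksLikeRenderedHtml(htmlBody)); // true
```

Markdown can legitimately contain inline HTML, and an article may mention tags in its prose, so a positive result is a prompt to look closer rather than proof that the server rendered anything.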

So the basic architecture works: one Markdown file can be stored in content/, then exposed to AI clients through an MCP server without converting it to HTML first.

Recommended architecture

The design is:

content/*.md
  -> MCP server: returns raw Markdown
  -> Next.js site: converts Markdown to HTML
  -> AI coding tool: reads Markdown directly

The MCP server reads Markdown files, parses frontmatter, and returns metadata plus the raw body. The website transforms Markdown into HTML using a Markdown processor such as remark. AI coding tools and editor integrations can consume the Markdown directly.

The key idea is to keep the MCP server as a knowledge delivery layer, not a rendering engine.

Why not return HTML from MCP?

An MCP server can return HTML. If a website is the only client, that may look convenient.

But an MCP server is rarely just a website backend. Claude Code and other AI coding tools can read Markdown more naturally than HTML. A mobile app or desktop app may want to render the same article as native components rather than as browser HTML.

If the MCP server returns HTML, it starts to inherit web presentation concerns:

Heading and paragraph rendering rules affect the MCP response.
CSS-oriented markup gets passed to clients that do not need CSS.
Non-web clients may need to reverse or reinterpret the HTML.
HTML usually costs more tokens than the original Markdown.
Every new presentation format pushes more logic into the server.

If the source of truth is Markdown, the MCP server should expose that source of truth.
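To make the size argument concrete, the sketch below compares a short Markdown fragment with a hand-written HTML equivalent. Both strings are invented for illustration, and character count is only a rough proxy for tokens; the real difference depends on the tokenizer and on what markup the renderer emits.

```typescript
// Illustrative content only: a small Markdown fragment and a hand-written HTML equivalent.
const markdown = [
  "## Install",
  "",
  "Run the **setup** script, then open the [docs](https://example.com/docs).",
  "",
  "- step one",
  "- step two",
].join("\n");

const html = [
  "<h2>Install</h2>",
  '<p>Run the <strong>setup</strong> script, then open the <a href="https://example.com/docs">docs</a>.</p>',
  "<ul>",
  "<li>step one</li>",
  "<li>step two</li>",
  "</ul>",
].join("\n");

// Characters are a rough stand-in for tokens, not an exact measure.
const overhead = html.length - markdown.length;
console.log(`markdown: ${markdown.length} chars, html: ${html.length} chars`);
console.log(`extra characters carried by the HTML version: ${overhead}`);
```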

A practical tool shape

A small Markdown knowledge server only needs a few tools:

list_articles()
get_article(slug, lang)
search_knowledge(query)

list_articles() returns metadata only. It is useful for article lists, search candidates, and navigation. It should not return every full body by default.

get_article(slug, lang) returns one article with metadata and the raw Markdown body.

search_knowledge(query) searches titles, descriptions, and body text, then returns matching metadata. The client can call get_article only for the article it actually needs.
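The metadata-only behavior of list_articles and search_knowledge can be sketched as follows. This is a minimal illustration over a hypothetical in-memory array; the names StoredArticle, toMeta, listArticles, and searchKnowledge are assumptions, and a real server would load its articles from content/ and might rank results rather than just filter them.

```typescript
type ArticleMeta = {
  slug: string;
  title: string;
  description: string;
  tags: string[];
  lang: "en" | "ja";
};

type StoredArticle = ArticleMeta & { body: string };

// Hypothetical in-memory store standing in for parsed Markdown files.
const articles: StoredArticle[] = [
  { slug: "mcp-markdown", title: "MCP and Markdown", description: "Why raw Markdown.", tags: ["mcp"], lang: "en", body: "Return the source document." },
  { slug: "static-export", title: "Static export", description: "Build-time rendering.", tags: ["nextjs"], lang: "en", body: "No server at serving time." },
];

// Copy only the metadata fields so list and search responses never carry the body.
function toMeta(a: StoredArticle): ArticleMeta {
  return { slug: a.slug, title: a.title, description: a.description, tags: a.tags, lang: a.lang };
}

function listArticles(): ArticleMeta[] {
  return articles.map(toMeta);
}

// Case-insensitive substring match over title, description, and body.
function searchKnowledge(query: string): ArticleMeta[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return articles
    .filter((a) => [a.title, a.description, a.body].some((field) => field.toLowerCase().includes(needle)))
    .map(toMeta);
}

console.log(searchKnowledge("serving").map((m) => m.slug)); // [ 'static-export' ]
```

The body is searched but never returned, which keeps search responses small and leaves the client to fetch the one article it needs.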

The core article shape can stay small:

type Article = {
  slug: string;
  title: string;
  date: string;
  description: string;
  tags: string[];
  lang: "en" | "ja";
  body: string; // raw Markdown
};

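// findMarkdownFileBySlug and parseFrontmatter are project helpers not shown here.
// parseFrontmatter is assumed to return { data, content } for one file.
function getArticle(slug: string, lang: "en" | "ja"): Article | null {
  const file = findMarkdownFileBySlug(slug, lang);
  if (!file) return null;

  const { data, content } = parseFrontmatter(file);

  // Keep unpublished drafts out of MCP responses, matching the website.
  if (data.draft === true) return null;

  return {
    slug: data.slug,
    title: data.title,
    date: data.date,
    description: data.description,
    tags: data.tags ?? [],
    lang: data.lang ?? "en",
    body: content, // raw Markdown, never converted to HTML here
  };
}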
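The article returned by getArticle still has to be wrapped in an MCP tool result. The server's actual handler is not shown here, so the following is only a sketch of one plausible shape: the article serialized as JSON inside a single text content item, with isError set when nothing is found. The name toToolResult is hypothetical.

```typescript
type Article = {
  slug: string;
  title: string;
  date: string;
  description: string;
  tags: string[];
  lang: "en" | "ja";
  body: string; // raw Markdown
};

// Reduced form of an MCP tool result: a list of content items plus an optional error flag.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Hypothetical wrapper: serializes the article as JSON text and leaves the body untouched.
function toToolResult(article: Article | null): ToolResult {
  if (article === null) {
    return { content: [{ type: "text", text: "Article not found" }], isError: true };
  }
  return { content: [{ type: "text", text: JSON.stringify(article) }] };
}

const sample: Article = {
  slug: "article-slug",
  title: "Article title",
  date: "2026-06-23",
  description: "Short description for lists and metadata.",
  tags: ["mcp", "markdown"],
  lang: "en",
  body: "# Article title\n\nArticle body...\n",
};

const result = toToolResult(sample);
console.log(JSON.parse(result.content[0].text).body === sample.body); // true
```

Because the body travels as a JSON string, the client gets back exactly the Markdown that was stored, including headings and line breaks.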

The website can handle HTML rendering separately:

import { remark } from "remark";
import html from "remark-html";

export async function renderArticleHtml(markdown: string): Promise<string> {
  const processed = await remark().use(html).process(markdown);
  return processed.toString();
}

With this split, the MCP server stays focused on reading, filtering, and returning knowledge. Each client owns its own presentation.

Treat frontmatter as the shared contract

For a Markdown knowledge base, frontmatter should be the contract shared by every consumer.

---
title: "Article title"
date: "2026-06-23"
slug: "article-slug"
description: "Short description for lists and metadata."
tags: ["mcp", "markdown"]
lang: "en"
---

Article body...

The website can use title for the page title and H1. The MCP server can use slug as the lookup key. description and tags can power lists and search results.

The important part is consistency. If the website hides draft: true articles but the MCP server returns them, you may leak unpublished content to an AI client. If the website and MCP server disagree on slug or lang, links and article lookups will drift.

For a shared Markdown source, keep these rules aligned:

Required fields: title, date, slug, description
draft: true is excluded from every public surface
lang identifies the article language
slug is used both for URLs and MCP lookup
tags use lowercase kebab-case strings
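These rules are easier to keep aligned when both the website and the MCP server run the same check. A minimal sketch follows, assuming frontmatter has already been parsed into a plain object; the names validateFrontmatter and isPublic are hypothetical.

```typescript
type Frontmatter = Record<string, unknown>;

const REQUIRED_FIELDS = ["title", "date", "slug", "description"] as const;
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the frontmatter satisfies the rules.
function validateFrontmatter(data: Frontmatter): string[] {
  const problems: string[] = [];

  for (const field of REQUIRED_FIELDS) {
    const value = data[field];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing required field: ${field}`);
    }
  }

  // Tags are optional, but when present each one must be a lowercase kebab-case string.
  const tags = data["tags"] ?? [];
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string" && KEBAB_CASE.test(t))) {
    problems.push("tags must be lowercase kebab-case strings");
  }

  return problems;
}

// An article is public only if it is valid and not marked as a draft.
function isPublic(data: Frontmatter): boolean {
  return data["draft"] !== true && validateFrontmatter(data).length === 0;
}

const example: Frontmatter = {
  title: "Article title",
  date: "2026-06-23",
  slug: "article-slug",
  description: "Short description for lists and metadata.",
  tags: ["mcp", "markdown"],
  lang: "en",
};

console.log(validateFrontmatter(example)); // []
console.log(isPublic({ ...example, draft: true })); // false
```

Calling isPublic from both the site build and the MCP tools gives every public surface the same answer for a given article.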

Why this works well with static sites

This design pairs well with a static site, including a Next.js app using output: export.

The website can convert Markdown to HTML at build time and deploy static files. The MCP server can read the same Markdown files locally or inside a developer tool. The public site does not need a database or runtime API just to serve articles.
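On the website side, a static build has to know every article route in advance. The sketch below assumes the App Router, where a dynamic route supplies its parameters through generateStaticParams; the source does not say which router the site uses, and the names Entry and buildStaticParams are hypothetical.

```typescript
// Hypothetical index entry derived from frontmatter.
type Entry = { slug: string; draft?: boolean };
type StaticParam = { slug: string };

// Builds the parameter list a generateStaticParams export could return,
// skipping drafts so they never become pages.
function buildStaticParams(entries: Entry[]): StaticParam[] {
  return entries
    .filter((entry) => entry.draft !== true)
    .map((entry) => ({ slug: entry.slug }));
}

const entries: Entry[] = [
  { slug: "mcp-markdown" },
  { slug: "unfinished-idea", draft: true },
];

console.log(buildStaticParams(entries)); // [ { slug: 'mcp-markdown' } ]
```

Applying the same draft filter here and in the MCP server keeps the two surfaces from disagreeing about what is published.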

With Next.js static export, there is no Node.js server at serving time. Article pages, RSS, sitemap, and robots output all need to be generated at build time. I cover those static-export issues in "5 Pitfalls of Next.js output: export and How to Avoid Them".

If next build works but next dev fails on CSS, that is a separate development-environment problem, not an MCP or Markdown architecture problem. For that failure mode, see "Why only next dev breaks CSS: it was NODE_ENV=production".

Security and publishing scope

An MCP server is a bridge from your local or private data into an AI client. That is useful, but it means the content boundary matters.

If Markdown articles contain secrets, internal URLs, personal data, API keys, customer names, or unreleased business details, the MCP server can pass those details into the model context.

At minimum:

Do not store secrets in content/.
Exclude draft: true in the MCP server too.
Make the content directory explicit, for example with CONTENT_DIR.
Resolve articles by slug rather than accepting arbitrary file paths.
Write logs to stderr, not stdout, because stdout is the MCP JSON-RPC channel.
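Slug-based resolution and stderr logging from the list above can be sketched together. This is an illustration under assumptions: the index is built once from frontmatter found under the content directory, and the names IndexEntry, buildSlugIndex, and resolveArticleFile are hypothetical.

```typescript
type Lang = "en" | "ja";
type IndexEntry = { slug: string; lang: Lang; file: string; draft?: boolean };

// Build the lookup table once, from files the server itself discovered.
// Drafts are left out, so they cannot be resolved at all.
function buildSlugIndex(entries: IndexEntry[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const entry of entries) {
    if (entry.draft === true) continue;
    index.set(`${entry.lang}:${entry.slug}`, entry.file);
  }
  return index;
}

// Client input is only ever used as a map key, never joined into a file path.
function resolveArticleFile(index: Map<string, string>, slug: string, lang: Lang): string | null {
  const file = index.get(`${lang}:${slug}`);
  if (file === undefined) {
    // console.error writes to stderr, keeping stdout free for JSON-RPC messages.
    console.error(`[mcp] unknown article: ${JSON.stringify(`${lang}:${slug}`)}`);
    return null;
  }
  return file;
}

const index = buildSlugIndex([
  { slug: "article-slug", lang: "en", file: "content/article-slug.md" },
  { slug: "secret-plan", lang: "en", file: "content/secret-plan.md", draft: true },
]);

console.log(resolveArticleFile(index, "article-slug", "en")); // content/article-slug.md
console.log(resolveArticleFile(index, "../../etc/passwd", "en")); // null
```

With this shape a path-traversal attempt is just another unknown key, and the set of readable files is fixed by what the server indexed at startup.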

The point is not to make MCP scary. The point is to treat "what can the AI read?" as part of the server design.

Summary

For a Markdown knowledge base, an MCP server should usually return raw Markdown and metadata, not rendered HTML.

The website can convert Markdown to HTML. AI coding tools can read Markdown directly. Other clients can choose their own rendering strategy. One Markdown source can serve all of them without forcing the MCP server to become a presentation layer.

The pattern was verified with a local stdio MCP server: list_articles, get_article, and search_knowledge were visible through an MCP client, and get_article returned the Markdown body rather than rendered HTML.

Keep the server thin. Return the source document safely. Let clients render it.