Adventures in Symbolic Algebra with Model Context Protocol

I spent last weekend playing with this new MCP protocol all the kids are talking about. And it's fun, but a bit early and rough around the edges.

MCP, if you're not familiar, is Anthropic's answer to the question: "How do we get AI to actually DO things instead of just TALK about doing them?" It's a protocol that allows language models to call external tools, much like how your friend who claims to know everything actually calls their more knowledgeable friend behind your back. The beauty of MCP lies in its standardization: instead of writing custom connectors for each AI model and each tool, you implement the protocol once on each side. It's the USB-C of AI tooling, if USB-C were still in its awkward adolescent phase.
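For the protocol nerds: underneath it's JSON-RPC 2.0 over stdio or HTTP. A client asks a server what it can do with a tools/list request and gets back tool names plus JSON schemas for their arguments. Roughly, and with a purely hypothetical add tool standing in for anything real:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}

And the reply:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "add",
        "description": "Add two numbers",
        "inputSchema": {
          "type": "object",
          "properties": { "a": { "type": "number" }, "b": { "type": "number" } },
          "required": ["a", "b"]
        }
      }
    ]
  }
}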

Notably, the MCP server runs locally on your machine, letting the language model invoke arbitrary code and commands: you use one of the LLM desktop clients to call out to the local server. This is probably as dangerous and reckless as you might think, so there's a definite security concern here; be warned. But let's not let that get in the way of a fun experiment.

My particular itch stemmed from watching Claude (and its cousins like o4-mini-high and DeepSeek-R1) really struggle with tensor calculus. If you've ever asked an LLM to perform complex symbolic manipulation, you know the drill: confident answers, beautiful LaTeX formatting, and results that would make your math professor weep bitter tears. These models, despite their impressive linguistic capabilities, are absolute disasters when it comes to keeping track of indices in tensor expressions or manipulating complicated algebraic forms. The expressions involved in even moderate general relativity problems are HUGE, with hundreds of terms and complex index gymnastics.

But we already have specialized tools that excel at this! Computer algebra systems like Mathematica, Sympy, Cadabra, and EinsteinPy were built specifically for this purpose. So the obvious solution presented itself: let's expose these tools to the LLM through MCP and let each system do what it does best. The LLM handles the natural language understanding and planning, while the symbolic algebra system performs the actual mathematical manipulations with perfect precision.
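To make the division of labor concrete, here's the kind of exact manipulation a CAS does trivially and an LLM reliably botches. A two-line SymPy sketch:

from sympy import symbols, expand

x, y = symbols("x y")

# Every coefficient below is exact; there is no token-level guessing involved.
print(expand((x + y)**8))
# x**8 + 8*x**7*y + 28*x**6*y**2 + 56*x**5*y**3 + 70*x**4*y**4
#      + 56*x**3*y**5 + 28*x**2*y**6 + 8*x*y**7 + y**8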

Working with the MCP ecosystem is like visiting a frontier town in the Wild West. The documentation exists in the form of scattered campfire stories, the implementations have a distinctly "I wrote this at 3 AM hackathon" vibe, and everything is strangely Node-heavy. This Node fixation likely stems from most MCP tools being designed to call REST services for cloud applications. Then there's the peculiar ecosystem of suspiciously self-referential products from companies that just happen to sell AI coding assistants. There's definitely a faint whiff of opportunism in the air.

Debugging an MCP server is a crazy exercise. You're essentially working with a stochastic black box that communicates through a complex web of JSON schemas attached to docstring annotations. When something goes wrong, good luck figuring out if it's your server, the client, the LLM's interpretation, or just the model having a laugh at your expense. The non-deterministic nature of the whole setup means that something can work perfectly five times in a row and then spontaneously fail on the sixth attempt for reasons that remain shrouded in mystery.

The basic implementation, however, is refreshingly straightforward, similar to FastAPI if FastAPI were designing its endpoints for a language model. Here's a simple example that highlights why this approach matters: Ask any LLM to factor a large integer, and watch it confidently fabricate entirely wrong answers. By design, transformers can't perform the arbitrary computation required for integer factorization. They've merely memorized some factorizations from the internet (and even those, poorly).

from mcp.server.fastmcp import FastMCP
import subprocess

mcp = FastMCP("Demo", instructions="You factor integers.")

def factor_number(number: int) -> str:
    # Shell out to GNU coreutils' factor, which does the actual computation.
    result = subprocess.run(
        ["factor", str(number)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

@mcp.tool()
def factor(a: int) -> str:
    """Factor an integer"""
    return factor_number(a)

if __name__ == "__main__":
    mcp.run()

With this simple tool, we can now expose the actual computation to the LLM using the GNU factor shell command. Instead of making up factors, the model recognizes when to delegate the task to a specialized tool.
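For the curious, the wire format for that delegation is a JSON-RPC tools/call request; give or take framing details, the exchange looks something like this:

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "factor",
    "arguments": { "a": 1234567 }
  }
}

And the server hands back the tool's output as text content:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [{ "type": "text", "text": "1234567: 127 9721" }]
  }
}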

When everything aligns, when the JSON is properly formatted, when the LLM correctly identifies which tool to use, when the server processes the request without hiccups, the result is genuinely magical. It's early days for this technology, but the potential is intoxicating. Imagine the synthesis of LLMs with theorem provers like Lean, or computer algebra systems becoming accessible through natural language. This is pretty cool, albeit very much an open problem.

Let me show you how this looks in practice with a classic example from physics: solving the damped harmonic oscillator with a forcing term. This is the bread and butter of undergraduate physics, described by:

$$
m\frac{d^2x}{dt^2} + c\frac{dx}{dt} + kx = F(t)
$$

where m is the mass, c is the damping coefficient, k is the spring constant, and F(t) is an external driving force.
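As a sanity check on whatever a solver hands back, recall the textbook analysis: the homogeneous solutions are exponentials whose rates are the roots of the characteristic equation,

$$
m\lambda^{2} + c\lambda + k = 0 \quad\Longrightarrow\quad \lambda_{\pm} = \frac{-c \pm \sqrt{c^{2} - 4km}}{2m}
$$

and the sign of c^2 - 4km separates the overdamped, critically damped, and underdamped regimes. Keep an eye out for those roots in the exponents below.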

Here's how the workflow unfolds with our sympy-mcp server:

Assistant: (Internal tool call) intro_many to introduce variables with appropriate assumptions:

{
  "variables": [
    {"var_name": "t", "pos_assumptions": ["real"], "neg_assumptions": []},
    {"var_name": "m", "pos_assumptions": ["real", "positive"], "neg_assumptions": []},
    {"var_name": "c", "pos_assumptions": ["real", "positive"], "neg_assumptions": []},
    {"var_name": "k", "pos_assumptions": ["real", "positive"], "neg_assumptions": []}
  ]
}
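Those assumptions aren't decorative: SymPy consults them when deciding which simplifications are legal, so declaring m, c, and k positive up front changes what the solver can do. A quick local illustration:

from sympy import symbols, sqrt

m_plain = symbols("m")
m_pos = symbols("m", positive=True)

print(sqrt(m_plain**2))  # stays sqrt(m**2): m could be negative or complex
print(sqrt(m_pos**2))    # simplifies to m: positivity makes it safe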

Assistant: (Internal tool call) introduce_function to define x(t) as a function:

{
  "func_name": "x"
}

Assistant: (Internal tool call) introduce_function to define the forcing function F(t):

{
  "func_name": "F"
}

Assistant: (Internal tool call) introduce_expression to create the differential equation:

{
  "expr_str": "Eq(m*Derivative(x(t), t, t) + c*Derivative(x(t), t) + k*x(t), F(t))"
}

Assistant: (Internal tool call) dsolve_ode to solve the ODE:

{
  "expr_key": "expr_12",
  "func_name": "x"
}

And voilĂ ! Instead of hallucinating a solution, the LLM delegates to SymPy which returns the correct solution:

$$
x{\left(t \right)} = C_{1} e^{\frac{t \left(- c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} + C_{2} e^{- \frac{t \left(c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} + \frac{e^{\frac{t \left(- c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} \int F{\left(t \right)} e^{\frac{c t}{2 m}} e^{- \frac{t \sqrt{c^{2} - 4 k m}}{2 m}} \, dt}{\sqrt{c^{2} - 4 k m}} - \frac{e^{- \frac{t \left(c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} \int F{\left(t \right)} e^{\frac{c t}{2 m}} e^{\frac{t \sqrt{c^{2} - 4 k m}}{2 m}} \, dt}{\sqrt{c^{2} - 4 k m}}
$$

No hallucinated terms, no mysterious constants appearing out of nowhere, just a precise solution. The LLM handles the natural language interaction and orchestration, while SymPy does what it does best ... exact symbolic manipulation.
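If you're wondering what the server actually does behind those tool calls, it boils down to a few lines of vanilla SymPy. This is a rough reconstruction rather than the server's literal code:

from sympy import Derivative, Eq, Function, dsolve, symbols

t = symbols("t", real=True)
m, c, k = symbols("m c k", positive=True)
x, F = Function("x"), Function("F")

# The damped, driven harmonic oscillator as a symbolic equation.
ode = Eq(m * Derivative(x(t), t, t) + c * Derivative(x(t), t) + k * x(t), F(t))

# dsolve handles the general forcing term via variation of parameters.
print(dsolve(ode, x(t)))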

Anyways, all the code is up on GitHub here, so maybe someone else will find it useful. If you have Cursor or Claude Desktop installed, add the following to ~/.cursor/mcp.json or ~/Library/Application Support/Claude/claude_desktop_config.json to install the MCP server.

{
  "mcpServers": {
    "sympy-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "https://github.com/sdiehl/sympy-mcp/releases/download/0.1/sympy_mcp-0.1.0-py3-none-any.whl",
        "python",
        "server.py"
      ]
    }
  }
}

Or maybe slightly better, run it from a Docker image.

{
  "mcpServers": {
    "sympy-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "-p",
        "8081:8081",
        "--rm",
        "ghcr.io/sdiehl/sympy-mcp:latest"
      ]
    }
  }
}

And maybe read the source code of what you're installing here, because I feel like a lot of people are going to install these MCP servers without knowing what they're doing and end up installing a lot of malware and exploits. There's basically no security model here, and that could be a big problem. Tread carefully.