Tom Johnell

MCP from the User’s Lens

I’ve been reading a lot about MCP, including the full specification, to really get down to the “bare metal.” From across the house, I hear my wife on a Zoom call listening to another person, presumably someone technical, trying to explain what MCP is to her. She’s very sharp, by the way, but the language we use is a bit rough, and I think, collectively, folks are confused.

Also, Model Context Protocol isn’t the most intuitive or compelling name. No offense to Anthropic, but there’s definitely an opportunity here for stronger branding.

First, one thing to be clear on: the only parts of this protocol that are really specific to LLMs are features called “prompts” and “sampling,” which I may talk about another time. Otherwise, the protocol is very much agnostic to LLMs, models, etc.

Here goes my analogy. In thinking through a product experience everyone uses, one that brings together many disparate platforms without users even having to think about it, I landed on one we all likely use every day: the “Share” button on your phone.

It’s actually quite magical, no? You click Share on a photo, a list of popular apps shows up, you click an app, and instantly you’re able to use that photo in the context of that application. Have you ever stopped to wonder how that works? No, because it’s so seamless; there’s no configuration. After you select “Share,” you, the user, are led down a series of steps defined by each app. The apps are in control, but the phone provides the context.

Now, sharing is a very rigid operation. It typically only allows for one type of action—transferring content from one app to another—without any customization or contextual nuance. It really only means one thing. That’s why, through a simple interface, nearly every human is able to navigate the options.

What if, instead, the applications could define what actions you could take on a photo, and those options appeared natively within your phone? Your UX designer’s head might explode. If every application can alter the UX, the client will be LITTERED with esoteric options. Related note: How many people actually hold down an app tile on their home screen to access the context menu? Nobody. Out of sight, out of mind.

Until today, most products haven’t been able to let applications define the UX. It would be unmanageable design-wise, and it would confuse the hell out of your customers. There are simply too many options.

Here’s where things get interesting. I recall a podcast I listened to several years ago about the power of voice as an interface - I believe this was in relation to Alexa. For example, with most user interfaces, the ability to filter based on specific criteria is relatively constrained. Suddenly, with the introduction of voice, folks started making requests like “Play the top bangers from 2007.” It’s a perfectly valid request, and it’s not too difficult to program, but providing a visual interface that enables that level of filtering would be quite difficult. Of course, voice has its drawbacks - any time you want to search or list something, you have Alexa listing off every single item until you quickly say “SHUT UP! I’ll just use my phone!”

There’s a paradigm shift about to occur, I believe. What if you could have both? What if every application could influence the user interface, but in such a way that applications aren’t muscling for the top position; instead, there’s a referee that has the full context of the customer’s needs and displays the best options for them? I believe that’s the future of AI-driven UX.

That’s just the surface, though. That referee isn’t just deciding what to show; it knows how to use the capabilities of all of the applications (or tools) involved and can achieve outcomes that require complex interactions across the ecosystem of applications.

Let’s ground this in a concrete example, and I’m going to use Datadog, given that’s where my wife, Kay, works. Let’s say my company has decided to migrate from the Grafana stack to Datadog (generally it’s the opposite, but hey, I’m rootin’ for the big dog). The first thing anyone is going to do is Google “Migrate Grafana to Datadog.” I just googled it, and of course there are tools, blogs, etc. That’s not the future though - it’s too much work. I don’t want to search anymore.

Instead, we’ll operate in the future world of tool-driven design via MCP. Imagine I’m able to define my context to the browser (or the website, Grafana), e.g. “I am working on migrating Grafana dashboards to Datadog.” Instead of a “Share” button, there’s an “MCP” button - maybe it will be the ✨ icon that’s trendy right now. Let’s suppose that button is displayed at the top of a visualization. Upon clicking it, several options appear: “Copy to Datadog,” “Explore in Datadog.” The options, or tools, that appear are contextual to my needs as a user.
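
To make that a bit more concrete, here’s a rough sketch of what a hypothetical Datadog MCP server might advertise behind those two buttons. The tool names, parameters, and descriptions are invented for illustration; the shape of each entry (a name, a description, and a JSON Schema describing the inputs) follows what an MCP server returns from a tools/list request.

```typescript
// Hypothetical tools a Datadog MCP server might expose. Names and
// parameters are invented for illustration; the overall shape mirrors
// an MCP tools/list result: each tool has a name, a description, and
// a JSON Schema describing its inputs.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

const datadogTools: ToolDefinition[] = [
  {
    name: "copy_dashboard_to_datadog",
    description: "Recreate a Grafana visualization as a Datadog dashboard widget.",
    inputSchema: {
      type: "object",
      properties: {
        grafana_panel_url: { type: "string", description: "Link to the Grafana panel to copy" },
        datadog_account: { type: "string", description: "Which Datadog org/account to copy into" },
      },
      required: ["grafana_panel_url", "datadog_account"],
    },
  },
  {
    name: "explore_in_datadog",
    description: "Open the equivalent metrics query in the Datadog Metrics Explorer.",
    inputSchema: {
      type: "object",
      properties: {
        query: { type: "string", description: "The metrics query to explore" },
      },
      required: ["query"],
    },
  },
];
```

A client that knows my stated context (“I am working on migrating Grafana dashboards to Datadog”) can match that context against these descriptions and surface the two tools as buttons instead of burying them in a menu.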

Upon clicking an option, maybe there’s more information needed to complete the action. Similar to how applications take control after you share a photo on your phone and guide you through the next steps, tool-driven design will need some open standard for gathering information from you that is consistent and seamless. Maybe a browser modal opens up asking which Datadog account to use, given you have multiple configured. The point is, no one has defined that UX. Instead, the referee managing all of these applications understands the needs of each tool, because each tool exposes its capabilities through MCP as a standardized schema. That schema tells the system exactly what the tool can do, what information it needs from you, and what to expect in return. From there, the AI can guide you through the right flow to achieve your goal.
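
Here’s a minimal sketch of how that referee could use the schema to drive the follow-up question about which account to use. Everything here is hypothetical: promptUser and callTool stand in for whatever UX and transport the host application eventually provides. The point is that the schema’s required fields are enough to know what to ask for before invoking the tool.

```typescript
// Minimal, hypothetical sketch: the client inspects a tool's input schema,
// asks the user for any required fields it can't fill from context, then
// invokes the tool. `promptUser` and `callTool` are placeholders for the
// host application's UX (e.g. a browser modal) and its MCP transport.
type JsonSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
};

async function runTool(
  tool: { name: string; inputSchema: JsonSchema },
  knownContext: Record<string, unknown>,
  promptUser: (field: string, description?: string) => Promise<string>,
  callTool: (name: string, args: Record<string, unknown>) => Promise<unknown>,
): Promise<unknown> {
  const args: Record<string, unknown> = { ...knownContext };

  // Ask only for required inputs the client couldn't infer on its own,
  // e.g. which Datadog account to use when several are configured.
  for (const field of tool.inputSchema.required ?? []) {
    if (args[field] === undefined) {
      const description = tool.inputSchema.properties[field]?.description;
      args[field] = await promptUser(field, description);
    }
  }

  return callTool(tool.name, args);
}
```

The consistency comes from the schema, not from each vendor hand-rolling its own wizard: any tool that describes its inputs this way gets the same “fill in the blanks” treatment from the client.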

Something like the above is already possible in a chat-based user interface, but I don’t believe the entire internet will convert to voice and chat as an interface overnight. We are visual creatures.

I think I’m beginning to ramble, so I’m going to wrap this up. MCP is a protocol. That’s meaningless to most folks, so you’ve already lost them. Reframe it in the context of a user with needs. The user provides their context and is guided through the steps and tools necessary to accomplish their goals. In visual interfaces, the user is presented with options contextual to their specific needs. My example with the MCP button isn’t a very bold one. This can be taken to an even more extreme level, where the entire user interface is determined and laid out by your referee, the AI, based on your needs. I imagine we’ll start seeing something of that nature in the next year or so.

#ai #mcp #product