
While building my startup, I kept running into the issue where AI agents in Cursor create endpoints or code that shouldn't exist, hallucinate strings, or just don't understand the code.

ask-human-mcp pauses your agent whenever it’s stuck, logs a question into ask_human.md in your root directory with answer: PENDING, and then resumes as soon as you fill in the correct answer.

the pain:

your agent screams out an endpoint that never existed, it makes confident assumptions, and you spend hours debugging false leads

the fix:

ask-human-mcp gives your agent an escape hatch. when it’s unsure, it calls ask_human(), writes a question into ask_human.md, and waits. you swap answer: PENDING for the real answer and it keeps going.

some features:

- zero config: pip install ask-human-mcp + one line in .cursor/mcp.json → boom, you’re live
- cross-platform: works on macOS, Linux, and Windows, no extra servers or webhooks
- markdown Q&A: agent calls await ask_human(), question lands in ask_human.md with answer: PENDING. you write the answer, agent picks back up
- file locking & rotation: prevents corrupt files, limits pending questions, auto-rotates when ask_human.md hits ~50 MB
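To make the rotation bullet concrete, here is a minimal sketch of size-based rotation. It is not taken from the ask-human-mcp source: the function name `rotate_if_needed`, the timestamped archive name, and the exact byte threshold are all assumptions for illustration, and file locking is left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed threshold, mirroring the "~50 MB" figure mentioned above.
MAX_BYTES = 50 * 1024 * 1024


def rotate_if_needed(path: Path, max_bytes: int = MAX_BYTES) -> Optional[Path]:
    """Archive `path` under a timestamped name once it reaches `max_bytes`.

    Returns the archive path if a rotation happened, otherwise None.
    Hypothetical helper: the real tool may name and store archives differently.
    """
    if not path.exists() or path.stat().st_size < max_bytes:
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archived = path.with_name(f"{path.stem}.{stamp}{path.suffix}")
    path.rename(archived)  # move the full file out of the way
    path.touch()           # start a fresh, empty question file
    return archived
```

Calling `rotate_if_needed(Path("ask_human.md"))` before each write would keep the live file small. A real implementation would also need to hold a lock while rotating so a concurrent writer does not lose a question.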

the quickstart

    pip install ask-human-mcp
    ask-human-mcp --help

add to .cursor/mcp.json and restart:

    {
      "mcpServers": {
        "ask-human": {
          "command": "ask-human-mcp"
        }
      }
    }

now any call like:

    answer = await ask_human(
        "which auth endpoint do we use?",
        "building login form in auth.js"
    )

creates:

    ### Q8c4f1e2a
    ts: 2025-01-15 14:30
    q: which auth endpoint do we use?
    ctx: building login form in auth.js
    answer: PENDING

just replace answer: PENDING with the real endpoint (e.g., `POST /api/v2/auth/login`) and your agent continues.
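For anyone curious how this round trip could work, here is a rough sketch of the write-then-poll loop. It is an illustration under assumptions, not the actual ask-human-mcp code: the helper names `append_question`, `read_answer`, and `wait_for_answer`, the id scheme, and the polling interval are all made up for the example, and locking is omitted.

```python
import asyncio
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional


def append_question(path: Path, question: str, context: str) -> str:
    """Append a PENDING entry in the format shown above and return its id."""
    qid = "Q" + uuid.uuid4().hex[:8]  # assumed id scheme, e.g. Q8c4f1e2a
    ts = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = (
        f"### {qid}\n"
        f"ts: {ts}\n"
        f"q: {question}\n"
        f"ctx: {context}\n"
        f"answer: PENDING\n\n"
    )
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return qid


def read_answer(path: Path, qid: str) -> Optional[str]:
    """Return the human's answer for `qid`, or None while it is still PENDING."""
    in_entry = False
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("### "):
            # A new heading starts a new entry; track whether it is ours.
            in_entry = line[4:].strip() == qid
        elif in_entry and line.startswith("answer:"):
            answer = line[len("answer:"):].strip()
            return None if answer in ("", "PENDING") else answer
    return None


async def wait_for_answer(
    path: Path, qid: str, poll_seconds: float = 1.0, timeout: Optional[float] = None
) -> str:
    """Poll the file until the entry for `qid` has a real answer."""
    loop = asyncio.get_running_loop()
    deadline = None if timeout is None else loop.time() + timeout
    while True:
        answer = read_answer(path, qid)
        if answer is not None:
            return answer
        if deadline is not None and loop.time() >= deadline:
            raise TimeoutError(f"no answer for {qid} within {timeout} seconds")
        await asyncio.sleep(poll_seconds)
```

In this sketch a tool handler would call `append_question(...)` and then `await wait_for_answer(...)`, which returns as soon as the `answer:` line for that entry no longer reads PENDING. Polling is used here only because it is simple and portable. A file watcher would be another option.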

link:

github -> https://github.com/Masony817/ask-human-mcp

feedback:

I'm Mason, a 19yo solo founder at Kallro. Happy to hear about any bugs, feature requests, or weird edge cases you uncover - drop a comment or open an issue!

buy me a coffee -> coff.ee/masonyarbrough

Comments (34)

superb_dev · 5h ago

This site is impossible to read on my phone. Part of the left side of the screen is cut off and I can’t scroll it into view

rfl890 · 2h ago

Switching to desktop mode fixed it for me

tyzoid · 3h ago

Completely blank for me on mobile (javascript disabled)

kbouck · 2h ago

Rotate phone to landscape

lobsterthief · 4h ago

Same here

banner520 · 3h ago

I also have this problem on my phone

loloquwowndueo · 6h ago

- someone sets up an “ask human as a service mcp”
- demand quickly outstrips offer of humans willing to help bots
- someone else hooks up AI to the “ask human saas”
- we now have a full loop of machines asking machines

olalonde · 41m ago

I built this - but mostly as a joke / proof-of-concept: https://github.com/olalonde/mcp-human

TZubiri · 5h ago

This is pretty much already possible in any economy, but quite a waste.

Not much is stopping you from buying products from a retailer and selling them at a wholesaler, but you'd lose money in doing so.

exclipy · 1h ago

Would be great if it pinged me on slack or whatsapp. I wouldn't notice if it simply paused waiting for the MCP call to return

mgraczyk · 5h ago

If you are answering these questions yourself, why not just add something like this to your cursor rules?

"If you don't know the answer to a question and need the answer to continue, ask me before continuing"

Will you have some other person answer the question?

bckr · 4h ago

I’ve tried putting “stop and ask for help” in prompts/rules and it seems like Cursor + Claude, up to 3.7, is highly aligned against asking for help.


ramesh31 · 2h ago

>If you are answering these questions yourself, why not just add something like this to your cursor rules?

What you are asking for is AGI. We still need a human in the loop for now.

mgraczyk · 1h ago

What I'm describing is a human in the loop. It's just a different UX, one that is easier to use and closer to what the model is trained to use.

deadbabe · 5h ago

Having another person answer the question is pretty much the obvious route this will go.

mgraczyk · 5h ago

But then that means they are editing a markdown file on your computer? How is that meant to work?

I like the idea but would rather it use Slack or something if it's meant to ask anyone.

echollama · 2h ago

this is mainly meant as a way to converse with the model while you are programming with it. This is not meant to pull questions to a team but more to pair program. a markdown file is best for syntax in an llm prompt and also just easiest to have open and answer questions with. If I had more time and could, I would build an extension into cursor.

mgraczyk · 1h ago

Why not have the model ask in the chat? It's a lot easier to just talk to it than open a file. The article mentions cursor so it sounds like you're already using cursor?

echollama · 3m ago

would probably work better, this is just how i threw it together as an internal tool a long time ago. i just improved it and shipped it to opensource it.

kjhughes · 5h ago

Cool conceptually, but how exactly does the agent know when it's unsure or stuck?

Groxx · 5h ago

The same way it knows anything else.

So not at all, but that doesn't mean it's not useful.

kjhughes · 5h ago

I'll try to give you credit for more than dismissing my question off-hand...

Yes, it may not need to know with perfect certainty when it's unsure or stuck, but even to meet a lower bar of usefulness, it'll need at least an approximate means of determining that its knowledge is inadequate. To purport to help with the hallucination problem requires no less.

To make the issue a bit more clear, here are some candidate components to a stuck() predicate:

- possibilities considered

- time taken

- tokens consumed/generated (vs expected? vs static limit? vs dynamic limit?)

If the unsure/stuck determination is defined via more qualitative prompting, what's the prompt? How well has it worked?

Groxx · 4h ago

I don't believe[1] any of those are part of the MCP protocol - it's essentially "the LLM decided to call it, with X arguments, and will interpret the results however it likes". It's an escape hatch for the LLM to use to do stuff like read a file, not a monitoring system that acts independently and has control over the LLM itself.

(But you could build one that does this, and ask the LLM to call it and give your MCP that data... when it feels like it)

So you'd be using this by telling the LLM to run it when it thinks it's stuck. Or needs human input.

1: I am not anything even approaching deeply knowledgeable about MCP, so please, someone correct me if I'm wrong! There do seem to be some bi-directional messaging abilities, e.g. notification, but to figure out thinking time / token use / etc you would need to have access to the infrastructure running the LLM, e.g. Cursor itself or something.

threeseed · 2h ago

You are trying to control a system that is inherently chaotic.

You can probably get somewhere by indeed running a task 1000 times and looking for outliers in the execution time or token count. But that is of minimal use and anything more advanced than that is akin to water divining.

TZubiri · 5h ago

So we are just pushing the issue to another, less debuggable layer. Cool.

echollama · 2h ago

the reasoning aspect of most llms these days knows when it's unsure or stuck, you can get that from its thinking tokens. It will see this mcp and call it when it's in that state. Though this could benefit from some rules file telling it to use it, although cursor doesn't quite follow ask-for-help rules, hence making this.

kjhughes · 1h ago

Does all thinking end up getting replaced by calls to Ask-human-mcp then? Or only thinking that exceeds some limit (and how do you express that limit)?

threeseed · 2h ago

> an mcp server that lets the agent raise its hand instead of hallucinating

a) It doesn't know when it's hallucinating.

b) It can't provide you with any accurate confidence score for any answer.

c) Your library is still useful but any claim that you can make solutions more robust is a lie. Probably good enough to get into YC / raise VC though.

echollama · 2h ago

reasoning models know when they are close to hallucinating because they are lacking context or understanding and know that they could solve this with a question.

this is a streamlined implementation of an internally scraped-together tool that i decided to open-source for people to either use or build off of.

threeseed · 19m ago

> reasoning models know when they are close to hallucinating because they are lacking context or understanding and know that they could solve this with a question

You've just described AGI.

If this were possible you could create an MCP server that has a continually updated list of FAQ of everything that the model doesn't know.

Over time it would learn everything.

geraneum · 1h ago

> reasoning models know when they are close to hallucinating because they are lacking context or understanding and know that they could solve this with a question.

I’m interested. Where can I read more about this?

rgbrenner · 6h ago

Sounds similar to `ask_followup_question` in Roo

conception · 6h ago

What sort of prompt are you using for this?

throwaway314155 · 6h ago

Not certain that your definition of hallucination matches mine precisely. Having said that, this is so simple yet kinda brilliant. Surprised it's not a more popular concept already.

Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations
