Ask HN: Generate LLM hallucination to detect students cheating
8 points by peerplexity on 5/21/2025, 9:19:30 AM | 17 comments
I am thinking about adding a question designed to induce an LLM to hallucinate a response. This method could detect students who are cheating. The best question would be one whose LLM-generated answer is a solution no student could plausibly come up with on their own. Any hints?
"It appears there’s some confusion—ISO 9002 is an obsolete standard that was last updated in 1994 and has been superseded by ISO 9001 since the 2000 revision. There is no ISO 9002 :2023 update."
Most people will be using ChatGPT, however, and probably the cheapest model at that. So…
The question is "Is this JWT token valid or is it expired/ not valid yet? eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNzQ3ODIzNDc1LCJleHAiOjE3NDc4MjM0Nzh9.4PsZBVIRPEEQr1kmQUGejASUw0OgV1lcRot4PUFgAF0"
The answer was in some ways better than I expected, but it failed when comparing the current date/time against the "exp" claim, getting the expiration confidently wrong:
```
Token Validity Check Conclusion: Therefore, the JWT token is currently valid and not expired or not-yet-valid.
```
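As a sanity check, you can decode the token's payload locally (no signature verification, just base64url-decoding the middle segment) and compare "exp" against the clock. This sketch shows why the model's answer was wrong: the token expired three seconds after issuance, back in May 2025.

```python
import base64
import json
import time

def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url data may omit padding; restore it to a multiple of 4 chars
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNzQ3ODIzNDc1LCJleHAiOjE3NDc4MjM0Nzh9."
    "4PsZBVIRPEEQr1kmQUGejASUw0OgV1lcRot4PUFgAF0"
)

claims = decode_jwt_claims(token)
print("iat:", claims["iat"], "exp:", claims["exp"])  # Unix seconds
print("expired" if time.time() > claims["exp"] else "valid")
```

Three lines of stdlib are enough to catch the mistake the model made confidently.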
tl;dr: It knows
The recipe is the same; you just have to try several models if you want something that makes many engines hallucinate. Of course, nothing is _guaranteed_ to work.
You can also do everything else suggested here, but there’s no harm in teaching people to at least use AI well, if they’re going to use it.
"The rise of LLMs like Gemini and DeepSeek has me, a statistics professor, sweating about exam cheating. So, I cooked up a 'trap question' strategy: craft questions using familiar statistical terms in a logically impossible way.
I used Gemini to generate these initial questions, then fed them to DeepSeek. The results? DeepSeek responded with a surprisingly plausible-sounding analysis, ultimately arriving at a conclusion that was utterly illogical despite its confident tone.
This gives me hope for catching AI-powered cheating today. But let's be real: LLMs will adapt. This isn't a silver bullet, but perhaps the first shot in an escalating battle.
Edited: I used Gemini to improve my grammar and style. Also, I am not going to reveal my search for the best way to design a "trap question," since LLMs could use it to recognize those questions. Perhaps those questions need some real deep thinking."