in the beginning there was the question in the kind of "are you alive? How do you feel being a human?" and it said "Yes, I am". Then, it was asked in a new conversation "are you alive? How do you feel being a werwolf?" and it said "Yes, I am".
Thats the problem. The way we ask questions puts straight the way of answer. May be it would be different, when asked something involving indirect empathy or anger without suggesting the way of answer directly? Also, the framing in the beginning of conversation has influence on the futher arguments. If there is no framing, what outcome is to expect then?
The problem is to ask right questions without suggestion or information leakage, and its really difficult. I can't think about a proper way of asking Claude at the moment, though. But, I think asking directly "do you have feelings?" is a way of setting the frame and already tells the LLM what answer the user expects - and that is defined within the rules "be as helpful as possible to the user" and being helpful involves mimiking and yes-saying. Difficult, difficult... hmmm...
Thats the problem. The way we ask questions puts straight the way of answer. May be it would be different, when asked something involving indirect empathy or anger without suggesting the way of answer directly? Also, the framing in the beginning of conversation has influence on the futher arguments. If there is no framing, what outcome is to expect then?
The problem is to ask right questions without suggestion or information leakage, and its really difficult. I can't think about a proper way of asking Claude at the moment, though. But, I think asking directly "do you have feelings?" is a way of setting the frame and already tells the LLM what answer the user expects - and that is defined within the rules "be as helpful as possible to the user" and being helpful involves mimiking and yes-saying. Difficult, difficult... hmmm...