Morphology of a Marvel Movie

3 higuidebot 3 5/19/2025, 2:58:30 PM github.com ↗

Comments (3)

PaulHoule · 3h ago
Cosine similarity works for this, but the right way to think about it is as a classical ML classification problem, using all the tools from

https://scikit-learn.org/stable/supervised_learning.html

For instance, you will probably get better results with an SVM, a not-so-deep perceptron, or maybe a random forest than you will with cosine similarity. You can also calibrate the probabilities of such a model

https://scikit-learn.org/stable/modules/calibration.html

which is quite useful.
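
A minimal sketch of the suggestion above, assuming the script segments have already been turned into fixed-length feature vectors. The data here is synthetic (generated with `make_classification`) and only stands in for real embeddings; the feature count, class count, and the choice of `LinearSVC` with sigmoid calibration are illustrative assumptions, not something the thread specifies.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for embedded script segments (hypothetical data).
X, y = make_classification(
    n_samples=600, n_features=32, n_informative=8,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# LinearSVC has no predict_proba of its own; wrapping it in
# CalibratedClassifierCV fits a sigmoid calibrator on held-out folds
# so the wrapped model can return class probabilities.
model = CalibratedClassifierCV(LinearSVC(max_iter=5000), method="sigmoid", cv=5)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)    # shape: (n_test_samples, n_classes)
accuracy = model.score(X_test, y_test)  # fraction of correct predictions

print(f"accuracy: {accuracy:.3f}")
print("class probabilities for the first test segment:", proba[0].round(3))
```

Swapping `LinearSVC` for `MLPClassifier` or `RandomForestClassifier` leaves the rest of the sketch unchanged, which makes it easy to compare the models the comment mentions on the same split.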

higuidebot · 2h ago
What do you think a "better" result would be here? Better by what metric?

PaulHoule · 2h ago
Accuracy.

If you got N people (say N=10) to classify different segments of the script, you'd find that they mostly agree about how to classify them, but not perfectly. You can get closer to a "gold truth" if you sit people together to discuss the difficult cases.
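
One rough way to find the "difficult cases" is to take a majority vote per segment and flag segments where agreement is low. The sketch below uses only the standard library; the label names, the five annotators, and the 0.6 cutoff are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical labels: each row is one script segment,
# each column is one annotator's classification of it.
annotations = [
    ["setup", "setup", "setup", "setup", "setup"],
    ["conflict", "conflict", "setup", "conflict", "conflict"],
    ["climax", "conflict", "climax", "resolution", "setup"],
]


def majority_label(labels):
    """Return the most common label and the fraction of annotators who chose it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Assumed cutoff: segments below this agreement go to a group discussion.
DISCUSS_BELOW = 0.6

for index, labels in enumerate(annotations):
    label, agreement = majority_label(labels)
    status = "discuss" if agreement < DISCUSS_BELOW else "ok"
    print(f"segment {index}: {label} (agreement {agreement:.0%}, {status})")
```

The same agreement fraction gives a sense of the ceiling for a classifier: on segments where people split, no model can match everyone.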

Any given classifier is going to be like one individual: if it is any good, it will mostly agree with the gold truth, but sometimes it won't. It's also true that some classifications will be ambiguous, since a segment of the script may have some characteristics of one class and some of another, or might just not fit rationally into the schema.

This toolbox

https://scikit-learn.org/stable/model_selection.html

is helpful for testing a number of different models over a range of parameters and deciding what works best. A classifier that is calibrated (returns a probability of class membership) can skip cases where it knows it doesn't know what it is talking about. In the financial world, a calibrated model plus a Kelly bettor can make money trading, while an uncalibrated model will almost always lose money.
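
To make the last two points concrete, here is a small standard-library sketch. `predict_or_abstain` and its 0.7 threshold are hypothetical names and values, and `kelly_fraction` uses the textbook Kelly formula for a simple win/lose bet, f = p - (1 - p) / b, where p is the win probability and b the net odds. Both only make sense if the probabilities fed in are calibrated.

```python
def predict_or_abstain(probabilities, threshold=0.7):
    """Return the most probable class, or None if the model is not confident enough.

    `probabilities` maps class name -> calibrated probability.
    `threshold` is an assumed cutoff, to be tuned on held-out data.
    """
    label = max(probabilities, key=probabilities.get)
    return label if probabilities[label] >= threshold else None


def kelly_fraction(p_win, net_odds):
    """Fraction of bankroll to stake on a win/lose bet under the Kelly criterion.

    `net_odds` is the profit per unit staked on a win (1.0 for even money).
    Returns 0.0 when the bet has no positive edge, i.e. skip it.
    """
    edge = p_win * net_odds - (1.0 - p_win)
    return max(0.0, edge / net_odds)


# A confident prediction is returned; a near coin flip is skipped.
print(predict_or_abstain({"setup": 0.9, "conflict": 0.1}))
print(predict_or_abstain({"setup": 0.55, "conflict": 0.45}))

# The stake scales with the claimed probability, so an overconfident
# (uncalibrated) p_win leads directly to oversized bets.
print(kelly_fraction(p_win=0.6, net_odds=1.0))
print(kelly_fraction(p_win=0.5, net_odds=1.0))
```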