"abstract_inverted_index": { "To": [ 0 ], "determine": [ 1 ], "whether": [ 2, 154 ], "certain": [ 3 ], "computed": [ 4, 44 ], ... }
At first I thought this was for compression or something similar, but it turns out it is for legal reasons: to avoid infringing copyright by distributing the abstracts as plain text. Of course, it only takes a few lines of code to reconstruct the plain text from this inverted index, so I was surprised that this would hold up in court.
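For what it's worth, here is a minimal sketch of that reconstruction in Python. It assumes the field maps each word to the list of positions where it occurs, as in the snippet above. The function name and the toy sample are my own illustration, not part of any OpenAlex client or real record:

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a word -> positions mapping."""
    # Invert the mapping: each position points back to the word found there.
    position_to_word = {
        position: word
        for word, positions in inverted_index.items()
        for position in positions
    }
    # Read the words off in position order and join them with spaces.
    return " ".join(position_to_word[i] for i in sorted(position_to_word))


# Hypothetical toy input, shaped like the field above (not a real abstract).
sample = {
    "To": [0],
    "determine": [1],
    "whether": [2, 5],
    "certain": [3],
    "computed": [4],
}
print(reconstruct_abstract(sample))
```

Note that this only restores the token order. Any whitespace or line breaks in the original abstract are not recoverable from the positions alone.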
I couldn’t find a definitive answer, but I believe the closest one is this comment on StackExchange: [2]
> It's possible that they could include the plain text abstracts legally, but some publication disagrees and they don't want to fight it in court. It's also possible that the publisher believes that the inverted index is indeed infringing their copyright, but they're not sufficiently confident that they would prevail in court to actually bring a suit. The ultimate answer to this question is that it's a close call, and until a court rules on it, nobody knows for sure whether it's illegal in any given jurisdiction. I don't know whether a court has ruled on it.
[1] https://en.wikipedia.org/wiki/OpenAlex
[2] https://law.stackexchange.com/questions/110313/how-does-inve...
⇒ I don’t see that defense holding up in any reasonable court. It does make the infringement harder to find, though.