This is a difficult project. The blog post seems to hint at reasonable feasibility, this stuff is hard! We build a less ambitious tool in the university lab: "ASTANA: Practical String Deobfuscation for Android Applications Using Program Slicing" [0].
Would advise to first read the reverse engineering related work. Genetic programming is just a technique best used when everything else has failed :-)
Thanks for the hints. Of course, it's very very difficult. But one thing I think you missed, is that I'm proposing a "byte equivalent decompilation". And after that, we should go into reading the code readable and understandable.
If we could create a program doing all this, automatically or semi-automatically, it will be great-great because then not releasing the kernel code doesn't matter. I believe if enough effort and time is put into it, there is a good chance we could see such a thing in like 5-7 years.
After that, we might be able to target the binary blobs, the propriety firmwares. Those might have some legal issue, of course. But as long as it is used only to write a FOSS alternative, that probably won't be an issue, I think.
temp0826 · 4h ago
Not exactly the same goal but in the ballpark, reminded me of the linux-libre patches (removes binary blobs and non-gpl code from the kernel)-
> Theoretically, we could also go for finding the semantically equivalent C code. However, last time I researched, checking semantic equivalency is a very complex problem. I think it was NP hard.
Already deciding whether two finite automata decide the same (regular) language is PSPACE-complete; it's undecidable for anything that can decide arbitrary context-free languages (which C programs can clearly do).
farooqkz · 1h ago
Well I knew that checking if two binary circuits are equivalent is NP hard. Checking semantic equivalency of C code, of course, should be harder.
ACCount37 · 3h ago
I thought this would be an actual process, not a blog post with idle musings.
farooqkz · 1h ago
I wanted to append [idea] but it became too long. I think there should be balance "musings" and actual work. I learned very valuable lessons in my wakegp research. Even though I haven't completed it. If the one doing this research would be me, I would go for no actual work as long as I can. And when I start to do the work, I should do as little as I can and move very slowly. Like the saying "When you move slowly, you can see the path ahead clear". After all, we should know if the next step we go forward, the ground under us is solid.
ACCount37 · 53m ago
That sounds like a load of pseudo-profound bullshit to me.
And this little idea you've described? It's exactly the kind of thing where both the fundamental issues and the specific implementation details would conspire to fuck you over hard at every single turn.
If you actually tried to implement that idea of yours, you'd fail a lot, and maybe you'd learn something, and then it would be worth writing a post about that. About what you tried, what didn't work, what worked a little bit but not really.
As is? You picked a domain where ideas are easy but implementation is hilariously hard, and then you had an idea but implemented nothing. You never tried anything and you never learned anything. The value is nil.
ginko · 4h ago
I'm having a really hard time parsing this title.
rnhmjoj · 3h ago
A dash between GPL and violated would probably help.
farooqkz · 1h ago
Thanks for the feedback. I don't think I can edit the HN entry, but I'm updating me blog post.
Would advise to first read the reverse engineering related work. Genetic programming is just a technique best used when everything else has failed :-)
[0] https://arxiv.org/pdf/2104.02612
If we could create a program doing all this, automatically or semi-automatically, it will be great-great because then not releasing the kernel code doesn't matter. I believe if enough effort and time is put into it, there is a good chance we could see such a thing in like 5-7 years.
After that, we might be able to target the binary blobs, the propriety firmwares. Those might have some legal issue, of course. But as long as it is used only to write a FOSS alternative, that probably won't be an issue, I think.
https://www.fsfla.org/ikiwiki/selibre/linux-libre/
Already deciding whether two finite automata decide the same (regular) language is PSPACE-complete; it's undecidable for anything that can decide arbitrary context-free languages (which C programs can clearly do).
And this little idea you've described? It's exactly the kind of thing where both the fundamental issues and the specific implementation details would conspire to fuck you over hard at every single turn.
If you actually tried to implement that idea of yours, you'd fail a lot, and maybe you'd learn something, and then it would be worth writing a post about that. About what you tried, what didn't work, what worked a little bit but not really.
As is? You picked a domain where ideas are easy but implementation is hilariously hard, and then you had an idea but implemented nothing. You never tried anything and you never learned anything. The value is nil.