Ask HN: Data integrity in a decentralized trustless system
I am working on a voting system for a local community. People will vote with their smartphone, tablet or computer. Beforehand, they receive a link to the voting form with an access key in the URL.
Once someone has voted, they cannot see their vote (as in a real-world polling station), which partly avoids the "cash for vote" problem. "Partly" because someone can still force someone else to vote the way they want. To address this, my idea was to allow voting multiple times, but with idempotency: only the first or the last vote is recorded in the database. "You can give me $5 to vote the way you want; once back home I vote for the one I want."
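To make the repeated-voting rule concrete, here is a minimal sketch of the "last vote wins" variant, which is the one the $5 scenario needs. The `VoteStore` class and the `key-123` access key are hypothetical names for illustration, and an in-memory dict stands in for the real database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VoteStore:
    """Hypothetical in-memory stand-in for the database, keyed by access key."""
    votes: Dict[str, str] = field(default_factory=dict)

    def record(self, access_key: str, choice: str) -> None:
        # Last write wins: a later vote silently replaces the earlier one,
        # so each access key counts exactly once in the tally.
        self.votes[access_key] = choice

    def tally(self) -> Dict[str, int]:
        counts: Dict[str, int] = {}
        for choice in self.votes.values():
            counts[choice] = counts.get(choice, 0) + 1
        return counts


store = VoteStore()
store.record("key-123", "X")  # vote cast under pressure
store.record("key-123", "Y")  # vote cast again later, at home
result = store.tally()
```

Only the second vote is counted. A "first vote wins" variant would instead skip the write when the key already exists, but that would not help someone who was coerced before voting freely.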
But since voters cannot check whether their vote has been properly recorded, they must trust the system.
A blockchain involving several parties (an NGO, a university, etc.) could be an idea, but nothing prevents the API that receives the votes and inserts records into the blockchain from inserting fake data. A code audit (plus CI/CD and DNS records audits) can help.
My idea: each party (NGO 1, NGO 2, university 1, university 2, etc.) deploys the API and its own database. The source code is obviously open.
Then a proxy receives the HTTP requests and forwards them to each API. The goal is to reach eventual consistency.
Outages may occur: if the API deployed at university 1 is unreachable, some data will be missing from its database. A retry policy at the proxy level can help, but only for short network failures, not for outages lasting several minutes or hours.
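A rough sketch of what the proxy's fan-out with a bounded retry policy could look like. Everything here is an assumption for illustration: the `fan_out` function, the `send` callback (which would wrap the real HTTP POST) and the retry parameters are hypothetical, not part of any existing proxy.

```python
import time
from typing import Callable, Dict, List


def fan_out(
    payload: bytes,
    endpoints: List[str],
    send: Callable[[str, bytes], bool],
    retries: int = 3,
    delay: float = 0.5,
) -> Dict[str, bool]:
    """Forward one request to every party's API; report which ones accepted it."""
    results: Dict[str, bool] = {}
    for url in endpoints:
        ok = False
        for attempt in range(retries):
            try:
                ok = send(url, payload)
            except OSError:
                # Network-level failure: treat it like a rejected request.
                ok = False
            if ok:
                break
            if attempt < retries - 1:
                # Exponential backoff between attempts.
                time.sleep(delay * (2 ** attempt))
        results[url] = ok
    return results
```

Any endpoint still marked `False` after the last attempt has a gap in its database, which is exactly the case the retries cannot cover and that has to be repaired afterwards.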
So differences between the databases at the end of the election will probably occur and should be corrected. If consensus is reached on chunks of data (for example, 2/3 of the databases have the same data for each 1-hour period), then we can reach eventual consistency.
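The chunk-level consensus could be checked roughly as follows. This is only a sketch under assumptions: the record layout `(access_key, choice, timestamp)`, the function names and the party names are hypothetical, and it assumes every database stores the same timestamp for a given request (for instance, one stamped by the proxy).

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, int]  # (access_key, choice, unix timestamp)


def chunk_digest(records: List[Record], hour: int) -> str:
    """Hash all records of one 1-hour period in a canonical (sorted) order."""
    in_chunk = sorted(r for r in records if r[2] // 3600 == hour)
    h = hashlib.sha256()
    for key, choice, ts in in_chunk:
        h.update(f"{key}|{choice}|{ts}\n".encode())
    return h.hexdigest()


def agreed_digest(databases: Dict[str, List[Record]], hour: int) -> Optional[str]:
    """Return the digest shared by at least 2/3 of the databases, else None."""
    if not databases:
        return None
    digests = Counter(chunk_digest(recs, hour) for recs in databases.values())
    digest, count = digests.most_common(1)[0]
    return digest if 3 * count >= 2 * len(databases) else None


full = [("key-1", "X", 10), ("key-2", "Y", 20)]
databases = {
    "ngo1": list(full),
    "ngo2": list(reversed(full)),   # same data, different insertion order
    "university1": full[:1],        # missed one request during an outage
}
winner = agreed_digest(databases, hour=0)
```

Here two of the three databases agree, so `winner` is their digest, and university 1 would copy that chunk from one of the agreeing parties. If no digest reaches 2/3, the function returns `None` and the chunk needs manual investigation.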
The proxy becomes the weak link. Each party must have access to its configuration for audit purposes, and must also have access to the DNS records.
In the end, the voters don't have to understand all these details; they trust the system because they trust the parties who participate.
What do you think?
Thanks!
As to your main question re the proxy: Why is there a proxy in the first place? The client could just make the same requests the proxy would, directly to the multiple parties, obviating the need for it, no?
Without the proxy, the client would send its request to a specific instance of the API (university 1, university 2, NGO 1, NGO 2), which would then be responsible for forwarding the data to the others. What if that instance's code is changed so that it forwards false data? For example:
User A votes for candidate X (HTTP POST request received by the API deployed at university 1). The API deployed at university 1 is compromised (by the university itself or not), and the information persisted in the DB is "user A votes for candidate Y". This information is then forwarded to the other APIs.
If a proxy like NGINX is responsible for forwarding the requests, the problem is solved (assuming that all parties trust NGINX and its "mirror" module). https://nginx.org/en/docs/http/ngx_http_mirror_module.html