A secure, decentralized search engine for journalists
An EPFL laboratory has developed Datashare Network, a decentralized search engine paired with a secure messaging system that allows investigative journalists to exchange information securely and anonymously.
The International Consortium of Investigative Journalists (ICIJ), which has over 200 members in 70 countries, has broken a number of important stories, particularly ones that expose medical fraud and tax evasion. One of its most famous investigations was the Panama Papers, a trove of millions of documents that revealed the existence of several hundred thousand shell companies whose owners included cultural figures, politicians, businesspeople and sports personalities.
Completing an investigation of this size is possible only through international cooperation between journalists. When such sensitive files are shared, however, a leak can jeopardize not only the story’s publication but also the safety of the journalists and sources involved.
At the ICIJ’s behest, EPFL’s Security and Privacy Engineering Lab (SPRING) recently developed Datashare Network, a fully anonymous, decentralized system for searching and exchanging information. A paper describing it was presented at the USENIX Security Symposium 2020, a worldwide reference for specialists.
Anonymity at every stage
Anonymity is the backbone of the system. Users can search and exchange information without revealing their identity, or the content of their queries, either to colleagues or to the ICIJ. The Consortium ensures that the system is running properly but remains unaware of any information exchange. It issues virtual secure tokens that journalists can attach to their messages and documents to prove to others that they are Consortium members.
A centralized file management system would be too conspicuous a target for hackers; since the ICIJ does not have servers in various jurisdictions, documents are typically stored on its members’ servers or computers. Users provide only the elements that enable others to link to their investigation.
Users searching for information enter keywords in the search engine. If the search produces hits, they can then contact colleagues – whose identity remains protected – who are in possession of potentially relevant documents. Search queries are sent encrypted to all users; if there is a match, the querier gets an alert and can decide whether to get in touch and share information. “Given the fact that users work in different time zones, some with only a few hours of internet access per day, it was critical that searches and responses could take place asynchronously,” notes Carmela Troncoso, who runs the SPRING Lab at EPFL’s School of Computer and Communication Sciences. Another messaging system, also secure and anonymous, is subsequently used for two-way exchanges.
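The asynchronous flow described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the BulletinBoard class and its methods are hypothetical names, not part of Datashare Network, and the byte strings stand in for ciphertexts whose encryption is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class BulletinBoard:
    """Hypothetical relay that holds opaque blobs until users come online."""
    queries: list = field(default_factory=list)   # broadcast to every user
    replies: dict = field(default_factory=dict)   # query id -> list of blobs

    def post_query(self, blob: bytes) -> int:
        """Store an (already encrypted) query and return its id."""
        self.queries.append(blob)
        return len(self.queries) - 1

    def fetch_queries(self, since: int = 0) -> list:
        """Return (id, blob) pairs for queries posted from id `since` onward."""
        return list(enumerate(self.queries))[since:]

    def post_reply(self, query_id: int, blob: bytes) -> None:
        """Attach an (already encrypted) reply to a stored query."""
        self.replies.setdefault(query_id, []).append(blob)

    def fetch_replies(self, query_id: int) -> list:
        """Return every reply stored so far for the given query."""
        return list(self.replies.get(query_id, []))


board = BulletinBoard()

# The querier posts a query, then goes offline.
query_id = board.post_query(b"<encrypted query>")

# Hours later, a document owner comes online, reads pending queries and answers.
for qid, blob in board.fetch_queries():
    board.post_reply(qid, b"<encrypted match alert>")

# Later still, the querier reconnects and collects whatever has arrived.
replies = board.fetch_replies(query_id)
```

Neither party has to be online at the same time as the other, because the relay only stores and forwards blobs it cannot read.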
Two completely new secure applications
“This system, which addresses real-world needs, has enabled SPRING to tackle some interesting challenges,” notes Troncoso. The research team drew on existing authentication mechanisms and anonymous communication primitives, which they then optimized. They also developed two completely new secure building blocks – an asynchronous search engine and a messaging system.
A new protocol, known as “multi-set private set intersection” (MS-PSI), ensures the security of the search engine, allowing users to easily search a large number of databases without increasing the risk of leaks. The messaging system relies on a large number of single-use virtual mailboxes and is based on the well-known “pigeonhole” system, which chooses one option at random, in this case one of the mailboxes. Currently, the system does not allow users to exchange documents. “At this stage in the process, journalists are using other secure messaging systems,” Troncoso says.
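To give a feel for how a private set intersection lets two parties find shared keywords without revealing the rest, here is a minimal Python sketch of a classic two-party, Diffie-Hellman-style intersection. It is not the MS-PSI protocol itself, whose details are not given here. The modulus, the function names (hash_to_group, new_secret, blind, reblind, matching_keywords) and the sample keywords are all illustrative assumptions, and the parameters are far too weak for real use.

```python
import hashlib
import secrets
from math import gcd

# Toy modulus (the Mersenne prime 2**127 - 1). An assumption for illustration
# only: real deployments use much larger, carefully chosen groups.
P = 2**127 - 1


def hash_to_group(keyword: str) -> int:
    """Map a keyword to an integer in [1, P - 1]."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret() -> int:
    """Pick a random exponent coprime to P - 1, so blinding is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k


def blind(keywords, secret):
    """Hash each keyword and raise it to the caller's secret exponent."""
    return [pow(hash_to_group(w), secret, P) for w in keywords]


def reblind(values, secret):
    """Raise already blinded values to a second secret exponent."""
    return [pow(v, secret, P) for v in values]


def matching_keywords(query, querier_secret, double_blinded_query, blinded_corpus):
    """Querier side: compare doubly blinded values to find common keywords."""
    corpus_tags = set(reblind(blinded_corpus, querier_secret))
    return [w for w, tag in zip(query, double_blinded_query) if tag in corpus_tags]


querier_secret = new_secret()
owner_secret = new_secret()

query = ["offshore", "shell company", "invoice"]
corpus = ["shell company", "trust", "offshore", "yacht"]

sent_query = blind(query, querier_secret)        # querier -> owner
returned = reblind(sent_query, owner_secret)     # owner -> querier
owner_tags = blind(corpus, owner_secret)         # owner -> querier

hits = matching_keywords(query, querier_secret, returned, owner_tags)
```

Because exponentiation commutes, a keyword held by both sides ends up with the same doubly blinded value regardless of who blinded it first, while keywords held by only one side stay hidden behind the other party's secret. In this sketch the querier learns only which of its own keywords matched.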
Working with the Consortium has allowed SPRING to frame new requirements that are rarely examined in the scientific literature. Datashare can be scaled to thousands of users and millions of documents while encrypting all communications. “The hurdles we encountered during the development process, however, have paved the way to a new area of research with significant potential for other fields,” Troncoso concludes.