Although the Internet Archive’s Wayback Machine is a great research tool, its utility is hampered by a lack of basic search mechanisms. One can search by URL and archived links, but basic Google-style boolean searching isn’t available. The Archive once offered a beta boolean search tool, but it never worked and was later withdrawn.
However, a new application may significantly expand our ability to data-mine archived webdata. Reports give a sneak peek at Zoetrope, an application being developed by researchers at Adobe and the University of Washington. As put by the researchers:
The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historical Web (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familar [sic] pages. Zoetrope is based on a set of operators for manipulating content streams. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.
The demo video shows how historical webdata could be manipulated and compared in a variety of what the authors call “novel” ways. Even more significantly, researcher Eytan Adar “hopes to eventually incorporate information from the Internet Archive’s nearly 14 years of records.” Such a combination would massively increase the utility of web archives, but it would also, as discussed in a paper I’m writing, exacerbate concerns over informational autonomy.
The research paper can be found here.