Category Archives: Wayback Machine

Obama’s Change.gov promise to protect whistleblowers? Scrubbed from the Web

Well, this pissed me off. Long-time readers of this site may recall my interest in the Internet Archive’s Wayback Machine, which aims to preserve the historical web. I’ve previously written to criticize the Bush administration for its lengthy robots.txt exclusion file (thousands of lines long), which could be viewed as an attempt to prevent the […]

Major expansion of Wayback Machine’s archive of the historical internet

The Next Web reports that the Internet Archive has vastly increased its historical database of the web: The Internet Archive has updated its Wayback Machine with a significant bump in coverage: the service has gone from 150,000,000,000 URLs to having 240,000,000,000 URLs, a total of about 5 petabytes of data. More specifically, the Wayback Machine […]

Social networking word-of-the-day: “thinvisibility”

A new word for Facebookers and social networkers who cavalierly post embarrassing information about themselves to the web: thinvisibility. Here’s a starting definition: Thinvisibility: n. Being neither completely visible nor completely invisible. Being a tiny, shiny needle in a haystack of information overload. Being invisible to everyone except data aggregators and digital preservationists such as Google, […]

NARA hosting “lite” Bush website archive

There are plenty of good changes in the new whitehouse.gov site, such as a better copyright policy that more clearly permits copying and remixing, and a much shorter robots.txt file, which makes it easier for search engines and archivists to index and archive the site. (Compare the current 4-line Obama robots file to a 2300+ version […]
