LYCOS RETRIEVER Beta
Archive: Internet Archive
built 254 days ago
WASHINGTON, Aug. 23 /PRNewswire-USNewswire/ -- NASA and Internet Archive of San Francisco are partnering to scan, archive and manage the agency's vast collection of photographs, historic film and video. The imagery will be available through the Internet and free to the public, historians, scholars, students and researchers.
The Internet Archive's headquarters are in the Presidio, a former US military base in San Francisco. The Archive respects the Robots Exclusion Standard, a voluntary protocol under which a site's robots.txt file tells bots which pages the site's creator has marked as off-limits for indexing. As a result, the Internet Archive has removed a number of websites, which are now inaccessible through the Wayback Machine. This is sometimes because a new domain owner has placed a robots.txt file that disallows indexing of the site. The administrators claim to be working on a system that will allow access to the earlier material while excluding material created after the domain changed hands.
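To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs, the `is_allowed` helper and the `otherbot` agent are hypothetical and chosen for illustration. `ia_archiver` is assumed here to be the user-agent name the Archive's crawler answers to. This shows how a well-behaved bot might read the rules, not how the Internet Archive actually implements them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the archive crawler from the whole site
# and blocks every other bot from /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "otherbot"):
        for path in ("/index.html", "/private/notes.html"):
            url = "http://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

Because the protocol is voluntary, nothing in the file enforces these rules. A crawler has to run a check like this itself and honor the answer.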
Under the terms of this five-year agreement, Internet Archive will digitize, host and manage still, moving and computer-generated imagery produced by NASA. In the first year, Internet Archive will consolidate NASA's major imagery collections. In the second year, digital imagery will be added to the archive. In the third year, NASA and Internet Archive will identify analog imagery to be digitized and added to this online collection.
Currently, the Internet Archive applies robots.txt rules retroactively: if a site blocks the Internet Archive, as Healthcare Advocates did, any previously archived pages from the domain are ... removed. For blocked sites, only the robots.txt file is archived. This practice appears to be detrimental to researchers looking for information that was available in the past.
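The retroactive behavior described above can be sketched as a filter that checks every stored capture against the site's current robots.txt, whatever the capture date. Everything in this sketch is hypothetical: the `Snapshot` type, the `visible_snapshots` function and the `ownership_change` parameter are illustrative names, and `ia_archiver` is assumed to be the crawler's user-agent name. The optional cutoff is one possible reading of the system the administrators say they are working on, not a description of it.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of one archived capture."""
    url: str
    captured: date


def visible_snapshots(
    snapshots: List[Snapshot],
    current_robots_txt: str,
    user_agent: str = "ia_archiver",  # assumed crawler name
    ownership_change: Optional[date] = None,
) -> List[Snapshot]:
    """Filter captures by the site's *current* robots.txt.

    With ownership_change=None the rules apply retroactively to every
    capture. With a date, captures made before that date stay visible
    even if the current rules would block them.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())

    visible = []
    for snap in snapshots:
        blocked = not parser.can_fetch(user_agent, snap.url)
        if blocked and ownership_change is not None:
            # Captures from before the handover are exempt from the new rules.
            blocked = snap.captured >= ownership_change
        if not blocked:
            visible.append(snap)
    return visible
```

Called without a cutoff, a single `Disallow: /` rule hides every capture of the domain, which is the loss to researchers noted above. Passing a handover date keeps the earlier captures available.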