September 01, 2009



The problem, as always, is in the original metadata. Though people like to blame it on Google, a lot of it has to do with retrocon projects as libraries moved from card catalogs to OPACs, especially with undated works.

Mr Punch

Other issues are associated with series of monographs, reprints, etc., which were problematic in card catalogs.

-- Jeremy York, Project Librarian, HathiTrust

Copyright determination is a challenging issue generally, and no less so in HathiTrust. John Wilkin addresses our strategy for opening access to orphan works that are actually in the public domain in a comment to Geoff's blog post. The task of opening access to works that we know are in the public domain can be equally arduous at times (requiring manual review) because of the number of volumes being ingested into the repository (approximately 400,000 and 300,000 in the last two months respectively) and the quality of original metadata.

For instance, the volume listed above, From Dawn to Dark in Italy, was originally cataloged with no date [n.d] in the published date field. When volumes are ingested into HathiTrust, an automatic process uses existing metadata to make preliminary copyright determinations on those volumes. To protect against copyright infringement, this process is necessarily conservative: if the copyright status of a volume cannot be definitively ascertained (as in the example above), it is given a "search only" status in HathiTrust. It will remain unavailable to most users except for searching purposes until it turns up in our manual review process or users let us know (which we encourage! You can let us know by clicking the feedback link on any item). We are reviewing this volume now.
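The conservative rule described above can be illustrated with a minimal sketch. This is not HathiTrust's actual process: the cutoff year, the "full view" label, and the function names are all assumptions made for illustration. The only behavior taken from the comment is that a volume whose date cannot be definitively read falls back to "search only".

```python
import re
from typing import Optional

# Hypothetical cutoff year, chosen only for illustration; the real rules
# for deciding public domain status are not given in the comment above.
PUBLIC_DOMAIN_BEFORE = 1923


def parse_exact_year(date_field: str) -> Optional[int]:
    """Return a year only when the field holds one unambiguous 4-digit year."""
    match = re.fullmatch(r"\[?(\d{4})\]?\.?", date_field.strip())
    return int(match.group(1)) if match else None


def preliminary_status(date_field: str) -> str:
    """Assign a preliminary access status, defaulting to the restrictive one."""
    year = parse_exact_year(date_field)
    if year is not None and year < PUBLIC_DOMAIN_BEFORE:
        return "full view"  # hypothetical label for openly readable volumes
    # Anything undated, partial, or recent stays restricted until reviewed.
    return "search only"
```

Under these assumptions, `preliminary_status("[n.d]")` and `preliminary_status("[18--]")` both return "search only", which is why an undated nineteenth-century volume can stay closed until manual review or user feedback corrects it.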

It should be noted that while search functionality in "search only" books is limited, it does provide a general level of access to these (mostly in copyright or orphaned) works. As John mentioned in his post as well, the fact that these volumes are being preserved in HathiTrust allows us to make them available fully to users with print disabilities, and for purposes of computational research as well.

Regarding the incorrect dating of Tales of the Persecuted, the publication date is listed as [18--] in the catalog record. That it is being represented as 1800 in the search results is an issue of date normalization for search and faceting purposes that we are looking into (thanks!). Again, we welcome feedback through links on the site for any information that appears confusing or incorrect.
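To show how a partial date such as [18--] can end up displayed as 1800, here is a small sketch of one possible normalization. It is an assumption about the general technique, not a description of the HathiTrust code: the function name and the choice to return a year range are hypothetical. Replacing each unknown digit with 0 gives the earliest possible year (1800), and replacing it with 9 gives the latest (1899). A system that keeps only the first value would show 1800.

```python
import re
from typing import Optional, Tuple


def normalize_date(date_field: str) -> Optional[Tuple[int, int]]:
    """Map a catalog date string to an inclusive (earliest, latest) year range.

    Hyphens stand for unknown trailing digits, as in "[18--]".
    Returns None when no year can be recovered, as in "[n.d]".
    """
    cleaned = date_field.strip().strip("[]").rstrip("?.")
    # Require exactly four characters: leading digits, then optional hyphens.
    if len(cleaned) != 4 or not re.fullmatch(r"\d{1,4}-{0,3}", cleaned):
        return None
    earliest = int(cleaned.replace("-", "0"))
    latest = int(cleaned.replace("-", "9"))
    return earliest, latest
```

Keeping both ends of the range lets a search facet place the volume in the nineteenth century without implying that it was published in 1800 exactly.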

As for search, we are targeting early October for the release of full-text search functionality over all volumes (public domain and in copyright) in HathiTrust. We expect to be approaching 5 million volumes at that time. When you do a search in our bibliographic catalog, at the bottom of the faceting list there is an option to restrict results by volumes from a contributing institution.

Thank you for the compliments on our interface, and stay tuned: we are working on a new release that will make it easier to read and browse books.
