Today, Google Books announced that it has posted a list of 500 Greek and Latin critical editions of ancient texts, with links to high-resolution scans of each book. All the editions are out of copyright (and thus somewhat dated, as these things go), but I noticed several that are still the standard editions of their texts (e.g. Keil’s seven-volume edition of the Grammatici Latini). The list itself has been available for at least a few weeks (a colleague posted the link to a listserv), but the announcement makes clear that Google is aware of the problems with OCRing classical (polytonic) Greek. This has been a sore point with Google Books for some time: scholarly books lose much of their value when the OCR is as poor as it often is. In fact, the underlying text can be absolute gibberish, which is no help at all. All in all, this seems like a very salutary development. One change I would make is to list the books alphabetically by author or editor rather than by title, since titles can vary widely among editions of the same ancient work.
Strangely, clicking the link to a work automatically downloads a .zip archive full of .png and .html files. Clicking “(read in your browser)” instead takes you to the normal Google Books page for the book, from which you can download a .pdf file or search the work through the web interface, depending on your needs. On the one hand, the list is quite helpful, since 1) the .png files are presumably provided to help developers of OCR algorithms test their code, and 2) it is not always easy to find older editions of ancient texts if you don’t know the specific (often unwieldy) title of the edition. On the other hand, how difficult would it be to make the default download link the .pdf file, or at least the normal Google Books page? PDFs are what most people use on their own machines, and surely non-technical users are going to be baffled by a .zip file full of images and .html files?