thewayne: (Default)
While straightening books today, I came across a very interesting title that I wanted to share: Coming of Age in Second Life: An Anthropologist Explores the Virtually Human, by Tom Boellstorff. This guy is an actual anthropology prof, and he spent two years studying the people who populate Second Life, embedding himself there by creating the avatar Tom Bukowski. The back cover has this bio: "Tom Bukowski was born on June 3, 2004, and has been conducting anthropological research in Second Life since that time. His home, Ethnographia, is located in the Dowden region of Second Life. He is a fan of the game Tringo and enjoys floating across Second Life landscape in his hot air balloon." The book looks like an interesting read. It's published by Princeton University Press. I particularly wanted to share this since I know there are some current and former SL players among my friends here.

Last week I came across Perestroika by Mikhail Gorbachev, the former leader of the Soviet Union. Sadly, it was in the form of an inter-library loan request, and the sad part was that it was requested by a prison: I only have a hardback in my stacks, and most prisons can only accept paperbacks because prisoners can turn the cardboard into shivs. It's copyright 1987, so it was written while the Soviet Union was still standing, before the Berlin Wall fell. I'll get to it... someday?

I Buried Billy. I don't have the author's name offhand; he's a Mexican dude. He was a friend of Billy the Kid, the notorious outlaw of southern New Mexico, knew him in his final years, and was one of the first to get the news that Billy had been killed. He went out, bought a suit and a shirt, claimed the body, laid him out, and buried him. The book is a memoir of his last days with Billy. The author went on to become one of New Mexico's first state legislators. This book is one of the only (perhaps THE only) written eyewitness accounts of Billy the Kid! Myself, I've never liked the glorification of BtK: from everything I've read about him, I've seen him as a hood and nothing to be respected. I want to read this book to see if there's another angle that I'm not aware of. I'm really looking forward to this book coming back so I can check it out.

I don't remember if I've mentioned these before. The Collected Speeches of Malcolm X. This book collects six or seven of the later speeches of Malcolm, one from before he left the Nation of Islam and the rest from after. Another book for my 'to get to eventually' list. I still haven't watched that movie.

Back to the Western genre: we have a book called The Earps Talk, which collects the courtroom testimonies of Wyatt Earp, his brothers, and Doc Holliday after the Gunfight at the O.K. Corral!

I'm currently reading a book called Misquoting Jesus, by Bart Ehrman, a scholar who started out as a devout Christian and learned Greek, Latin, and Hebrew in order to gain access to old documents and study them directly. He argues that it's impossible to know exactly what the Bible originally said, because not only do we not have the original source documents, we don't even have the first copies of them. And the copies of the copies of the copies have so many errors, and disagree with each other in so many places, that it becomes this giant mish-mash. The inconsistencies pile higher and higher. I discovered this one while cruising our catalog to look something up.
thewayne: (Default)
This is really cool! Hernando Colón, Christopher Columbus's son, was a huge collector. He bought all sorts of books, not just Plato and high-brow stuff, so this catalog will give quite an insight into the books of the time. Obviously a lot of those books no longer exist; it will be interesting to see how many still do.

But the cool part is that it's HUGE: over 2,000 pages! And Hernando hired people to read all the books and write summaries, so it's a descriptive catalog! And the photos of the catalog are gorgeous!

The book is in the process of being translated, and when done, it will be released publicly so anyone can read it. Should be quite interesting.

https://www.npr.org/2019/04/24/716600905/christopher-columbus-son-had-an-enormous-library-its-catalog-was-just-found
thewayne: (Default)
and that comes to an end next week. And I wanted to bake a batch of my oatmeal/chocolate chip/macadamia nut cookies to take in on Sunday, but that didn't happen. I get up Monday, start getting ready, and find an email with a subject like 'water line break, we might have to close the campus'. The break isn't on campus; it's in the main that supplies the campus and presumably part of the surrounding area.

They closed the campus. But it did give me time to bake the cookies Monday afternoon.

And Tuesday they closed the campus. Apparently they fixed the break, and when they went to test the fix, it leaked in a different spot.

Today they closed the campus for the rest of the week. Apparently it's a pretty serious problem.

Now, one batch of cookies makes 4+ sheets of 20 cookies. Russet messed up the work schedule, and she's working six nights straight, which is largely unprecedented: no one on her telescope usually works more than three nights in a row. She went in Monday and Tuesday to help with a visiting class (those are always a mess, because now you have an extra dozen students in the control room). Then, for tonight and tomorrow, she somehow scheduled the new guy, a grad student working on his PhD, during engineering! And then she's scheduled for three nights! The telescope scheduling is done quarterly in two parts: the science schedule, which is a major PITA, and then the work schedule, which is fairly easy. She just failed to notice that she'd put the new guy on an engineering slot. He'll have to learn it sooner or later, but not while he's TAing and working to finish up and defend his thesis.

So I guess she'll be taking most of the cookies to work tonight, and I'll make a fresh batch Sunday or Monday. And I'll work extra days either next week or the week after. Later I'll load my hours into a spreadsheet and see where I stand on the hours I'm supposed to put in for my internship. More importantly, though, I've got a feature I need to add to the catalog database that I'm writing for them.

Wheeeee.

Oh. And on top of all of this, I checked out a four-disc DVD set of Midsomer Murders, so that'll be four items a week overdue. I sent an email to the woman in charge of the library saying 'This is just a clever ploy to get two days of late fees for my box set, isn't it?' and she replied 'Devious, aren't we?'

I expect they'll be waiving late fees since the campus will be closed all this week.
thewayne: (Default)
More things learned today:

Quality of OCR in Acrobat Pro decreases (fragment rate increases) if the:
-- Font is Bold
-- Page isn't flat
-- Page is skewed

I knew about the second and third; the first one surprised me. Its effect may have been compounded by imperfect flatness and skew.

I was also surprised by what I found when I did a little digging beneath the shortcut that launches the program: the executable was dated 2014! They bought the system in '15 with the software pre-installed, so nothing has been updated since it was first set up. The head librarian is going to look for the salesman's contact info to see what can be done about an update. The software neither checks for updates nor has a menu option for checking. I went to the manufacturer's web site, and the actual software that we use isn't listed! I think they've moved on to something with a different name, so I don't know if we'll be able to update it.

I scanned one of the hardbounds from back to front, and the software's habit of building the PDF backwards was consistent: since I'd scanned it backwards, the PDF came out in the correct sequence.

I tested how long it takes to fix fragments by repairing the first ten pages of one of the 40+ page PDFs, and it came to 3-4 minutes per page, meaning 2+ hours for each of the large PDFs. When I talked it over with the head librarian, she didn't think that was a good use of my limited time right now, so we're not going to do it. I think it's a good call: we can focus on the core job of getting all of the reports scanned, which is the main goal of my internship. The OCR is good enough that anyone who wants to run searches on these documents will get reasonable hit rates. Still, spending half an hour fixing those 10 pages was a good use of my time, since it gave us that 3-4 minutes per page figure.
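For the curious, here's the back-of-the-envelope math behind that "2+ hours" as a tiny Python sketch. The function name is made up, and 40 pages is just the low end of "40+"; the 3 and 4 minute rates are the ones from the ten-page trial.

```python
def fix_time_hours(pages: int, minutes_per_page: float) -> float:
    """Estimate hours needed to hand-fix OCR fragments in a document."""
    return pages * minutes_per_page / 60


# 40 pages is the low end of "40+"; 3-4 min/page came from the ten-page trial.
for rate in (3, 4):
    print(f"40 pages at {rate} min/page: {fix_time_hours(40, rate):.1f} hours")
```

That prints 2.0 and 2.7 hours, so "two-plus hours per big report" holds even at the fast end.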

I noticed that bold text seemed to increase the fragment rate while I was doing the edit. The second page of these reports is a table of contents, with an index entry on the left and a row of periods filling across to the page number on the right, and the entire page is bold. On many lines the program flagged those repeating periods as fragments, and I'd have to tell it that they weren't words. If the page had been laid out in an actual publishing program with proper kerning and such, maybe it would have scanned better and the OCR would have performed better; I don't know. What's certain is that bolding the entire page made it harder to read.
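Outside of Acrobat, those dot leaders are easy to recognize mechanically if you're post-processing the extracted text. This is a minimal Python sketch, assuming the table-of-contents text has already been pulled out as plain lines; the function name and the "three or more periods" rule are my own assumptions, not how Acrobat decides anything.

```python
import re
from typing import Optional, Tuple

# Entry text, then a dot leader (3+ periods, spaces allowed between them),
# then the page number at the end of the line.
TOC_LINE = re.compile(r"^(.*?)\s*(?:\.\s*){3,}(\d+)\s*$")


def split_toc_line(line: str) -> Optional[Tuple[str, str]]:
    """Split 'Entry ........ 12' into ('Entry', '12'), or return None."""
    m = TOC_LINE.match(line)
    if m is None:
        return None  # doesn't look like a TOC line; check it by hand
    return m.group(1).strip(), m.group(2)


print(split_toc_line("Library Services ........ 12"))
print(split_toc_line("Introduction"))
```

On a line like `Library Services ........ 12` it returns the entry and the page number, so the leader never gets treated as a word; lines that don't fit the pattern come back as None.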

The edit also exposed a weakness in Acrobat Pro's interface. The fragment tool has an option to highlight all fragments in the document, but if you skip over one, you have no way of going back to it. Bad interface design! But this is also Acrobat v11, which is long past its end-of-support date, so I guess I shouldn't be surprised. It's possible that current Acrobat Pro versions have improved this; I wouldn't know, since the version on my Mac is 10, a generation older!
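The fix for the skipped-fragment problem doesn't need to be fancy: skipped items just have to go somewhere you can get back to. Here's a minimal sketch of that idea in Python, assuming fragments come in as a plain list of strings; the class and method names are hypothetical and have nothing to do with how Acrobat is actually built.

```python
from collections import deque


class FragmentReview:
    """Step through flagged fragments; skipped ones come back around.

    Hypothetical design, not Acrobat's: skipping rotates the current
    fragment to the back of the queue instead of losing it.
    """

    def __init__(self, fragments):
        self.pending = deque(fragments)
        self.resolved = []  # (fragment, correction) pairs, in the order fixed

    def current(self):
        """The fragment under review, or None when everything is done."""
        return self.pending[0] if self.pending else None

    def skip(self):
        """Defer the current fragment; it comes back after the others."""
        self.pending.rotate(-1)

    def resolve(self, correction):
        """Record a fix ('' meaning 'not a word') and move to the next one."""
        self.resolved.append((self.pending.popleft(), correction))


review = FragmentReview(["S4n", "l967", "......"])
review.skip()            # not sure about "S4n" yet
review.resolve("1967")
review.resolve("")       # dot leader, not a word
print(review.current())  # the skipped "S4n" is back
```

Rotating the skipped fragment to the back of the queue means it comes around again after everything else, instead of silently dropping out of the review.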
thewayne: (Default)
I'm doing an internship in our local university library through April, and my main task is scanning their annual 'Reports To The President', a précis of college activity sent up to main campus and bound in a book, usually hardbound. The oldest book is 1965-66; the newest I've seen thus far is '98-99. I believe newer ones are already online in PDF format on the local network. Apparently by scanning them and then coding an RDA record for each file, we can get them hosted for free by the state academic library organization, or somebody.

So that's cool.

I'm using a fairly spiffy Fujitsu specialized document scanner that can scan two pages of a bound book in one pass, but I don't think their software is as good as they claim it is. It can handle a pretty significant amount of curvature in the books: for example, I was scanning three pages starting at page 385 of 427, so LOTS of curve when you're that far in. I was holding up the left side to get the right page reasonably flat, then holding down each side with a finger.

And yes, the fingers were captured by the scan.

After you've done your scan, you get into the next phase, where you drag a wireframe so that one line runs down the spine between the two pages, then align its four corners to the outside corners of the pages. The program does a good job of detecting the edges and snapping to them, but sometimes you have to do some dragging to improve the alignment. Once you've aligned all the scanned pages correctly, you click an Apply button, and it re-cuts the scans into individual pages and flattens them, programmatically removing the curve. It does a very good job, though not a perfect one.

THEN you have to go back through every page and remove the fingertips! There's a special tool just for this; it works a lot like Photoshop's patch tool, except that it auto-selects the fingertip. Click Apply, and the fingertip vanishes.

Once you've removed the fingertips, you can save it to PDF. Theoretically the program performs OCR (optical character recognition), but I can't see that it has any effect. I end up loading the PDF into Acrobat Pro and running OCR there.

And this is where I learned something tonight. You can't do a spell-check on a scanned document, because you're dealing with a scanned image, not words, but there's something similar: a fragment check. Fragments are bits of text that Adobe Acrobat flags as 'I think this is a word or something, but I'm not sure, therefore I didn't map it into the OCR side of the document. Fix it.' Acrobat can't provide a dictionary of suggestions like Word does, so when it sees something that it thinks is a word but couldn't map, you have to type the correction. Same for a page number, or a budget figure. Or you tell it to ignore it.
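Conceptually, a fragment check boils down to "show me every word the OCR engine wasn't confident about." A sketch of that idea in Python follows; the per-word confidence scores and the 0.8 cutoff are assumptions for illustration, since Acrobat doesn't say how it actually decides what counts as a fragment.

```python
from typing import List, NamedTuple


class OcrWord(NamedTuple):
    text: str
    confidence: float  # 0.0-1.0, as a hypothetical OCR engine might report it


def find_fragments(words: List[OcrWord], threshold: float = 0.8) -> List[OcrWord]:
    """Return the words recognized with confidence below the threshold."""
    return [w for w in words if w.confidence < threshold]


page = [OcrWord("S4n", 0.41), OcrWord("Juan", 0.97), OcrWord("$12,4O0", 0.55)]
for w in find_fragments(page):
    print(w.text)
```

In that sample, "S4n" and "$12,4O0" get flagged for a human to fix, while "Juan" passes.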

It took me a good half an hour to fix a three page document. I don't know how many times I typed the San of San Juan. Just the San, apparently Juan was recognizable.

And that was a three-page document from '67-68. The latest document, from '98-99? That's 40-some pages. I'm going to run a fragment check on it tomorrow afternoon, and we shall see how long it takes to fix.

One very odd thing about scanning two pages at once from bound books: the page sequence comes out reversed! This is easily fixed in Acrobat Pro when you're dealing with a handful or two of pages; you just slide the page thumbnails around. But dealing with 30 or 40 pages? Next week I'll try scanning a book starting with the last pages and working my way forward, and see how that works.
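Sliding thumbnails around by hand doesn't scale past a couple dozen pages. If you can get the printed page number off each PDF page (say, from the OCR text), a script could at least detect a back-to-front PDF and flip it. A minimal Python sketch; the function names are hypothetical, and it assumes you already have a list of the page numbers OCR found, with None wherever it couldn't read one.

```python
from typing import List, Optional, Sequence


def needs_reversal(printed_numbers: Sequence[Optional[int]]) -> bool:
    """Guess whether a scanned PDF came out back to front.

    printed_numbers[i] is the page number OCR read off PDF page i,
    or None where it couldn't find one.
    """
    known = [n for n in printed_numbers if n is not None]
    pairs = list(zip(known, known[1:]))
    descending = sum(1 for a, b in pairs if b < a)
    ascending = sum(1 for a, b in pairs if b > a)
    return descending > ascending


def reading_order(pages: Sequence, printed_numbers: Sequence[Optional[int]]) -> List:
    """Return the pages flipped if they look reversed, otherwise unchanged."""
    pages = list(pages)
    return pages[::-1] if needs_reversal(printed_numbers) else pages


print(reading_order(["p388", "p387", "p386", "p385"], [388, 387, None, 385]))
```

Counting ascending versus descending steps, rather than insisting every step goes down, lets it tolerate a few misread or missing page numbers.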

So, an important tip when creating PDFs from scanned docs for public consumption: running OCR is only half the job. If you need the document to be searchable, you MUST spend the time to run a fragment check on it and fix all of the problems! Otherwise you're going to frustrate anyone who needs to do anything serious with the document.

One thing that makes me really wish I had a working Mac laptop: I'd like to take an unfixed doc and run it through text to speech and see how it works. Then run the fixed doc through TTS. Might be interesting.
thewayne: (Default)
It's one of my last two library classes to complete my degree. It's only an Associate's in Library Science, which, when coupled with $4.10, will get me a kid's meal at Burger King.

One of the community colleges associated with NMSU offered the Library Science series online. I came across a free ebook titled So You Want To Be A Librarian, and it really appealed to me on a philosophical and ethical level. And since my wife works for the uni, I get six credit hours a semester for free, so the only out-of-pocket costs have been lab fees and books. But I'll only end up with an Associate's: the school offers nothing higher, and after next year it won't offer anything at all, because the program has been cancelled. But I just need this class and my capstone, which I'm also taking. That, plus a communications class over the summer, and I'm done.

I'm working at my local uni, and rather than doing general library stuff (which I'm hoping to do some of), I'm doing a scanning project. They bought a high-end scanner that does de-curling, OCR, and all sorts of other stuff to scan their archives, starting with the college's annual reports to the university president. Fairly straightforward task. The tricky bit comes in with coding the records in RDA format! So I'm coding a database to hold the records, and now I just have to teach myself RDA. I've never done anything except descriptive cataloging; while I'm familiar with MARC, I've never actually worked with it. At least I don't have to unlearn anything or break any bad habits.
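For anyone wondering what "a database to hold the records" could look like, here's a bare-bones sketch in Python with SQLite. This isn't the actual database; the table and column names are hypothetical, the sample record is made up, and it only covers a few RDA core elements (title proper, statement of responsibility, date of publication, extent, plus the content, media, and carrier types that replaced AACR2's general material designation).

```python
import sqlite3

# Hypothetical minimal schema covering a few RDA core elements.
SCHEMA = """
CREATE TABLE IF NOT EXISTS report (
    id                          INTEGER PRIMARY KEY,
    title_proper                TEXT NOT NULL,
    statement_of_responsibility TEXT,
    date_of_publication         TEXT,
    extent                      TEXT,
    content_type                TEXT NOT NULL DEFAULT 'text',
    media_type                  TEXT NOT NULL DEFAULT 'computer',
    carrier_type                TEXT NOT NULL DEFAULT 'online resource',
    pdf_path                    TEXT NOT NULL UNIQUE
);
"""


def add_report(conn, title, responsibility, year, extent, pdf_path):
    """Insert one scanned report; content/media/carrier use the defaults."""
    conn.execute(
        "INSERT INTO report (title_proper, statement_of_responsibility,"
        " date_of_publication, extent, pdf_path) VALUES (?, ?, ?, ?, ?)",
        (title, responsibility, year, extent, pdf_path),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up example record for illustration only.
add_report(conn, "Report to the President, 1967-68", None, "1968",
           "1 online resource (3 pages)", "reports/1967-68.pdf")
print(conn.execute("SELECT title_proper, carrier_type FROM report").fetchall())
```

Defaulting content, media, and carrier type to "text", "computer", and "online resource" fits a collection where every item is a scanned PDF hosted online; a mixed collection would need those set per record.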

I'm not going to pursue an MLS; I just don't think it's worth the effort and expense at my age. I took this coursework in the hope that at some point it might help leverage my IT skills into a job at a library. Time will tell. At least I learned a lot of interesting stuff, and I do like learning stuff.
thewayne: (Default)
There's an interesting little exception in American copyright law: if a book published between 1923 and 1941 is out of print, a physical copy can be scanned and released into the commons. And libraries across the country are actively taking advantage of it!

Hence, the Sonny Bono Memorial Collection!

This is a fascinating project by the Internet Archive. There are currently 709 books in the collection; I have no idea at what rate new items are added. The one book that I looked at was viewable online and downloadable as PDF, EPUB, Kindle, ABBYY, and some other formats.
thewayne: (Cyranose)
Very good piece by Neil Gaiman; two things in particular stood out. First, he attended a talk about the American private prison industry and learned that they project their future growth, and how many cells they'll need, based on the literacy rates of 10- and 11-year-olds. Wow. Second, English town councils have been closing libraries to save money, and Gaiman talks about how penny-wise and pound-foolish that is. He says England is the only country in which the oldest generation is more literate and numerate than the youngest, which bodes ill for the future.

http://www.theguardian.com/books/2013/oct/15/neil-gaiman-future-libraries-reading-daydreaming
thewayne: (Cyranose)
silveradept posted about this recently on LiveJournal. In Hudson Falls there was a library aide who had worked there for 28 years. Enter a nine-year-old kid who, for the fifth year running, won the summer reading program. The library director thought this discouraged other kids from reading, and decided that the winner would instead be drawn from a hat of all the kids who read at least ten books during the summer. After all, the kid had run away with such fabulous prizes as a t-shirt, a sports bottle, and a bunch of certificates.

The library aide went to the local media to get the kid some recognition, and got sacked. No explanation, just a call from the library board of trustees saying she no longer had a job. But at least she wasn't alone: the library director was also sacked.

There's a petition circulating on Change.org to try to convince the library's board of trustees to give her job back.

http://www.change.org/petitions/hudson-falls-free-library-board-of-trustees-apologize-to-lita-casey-and-offer-to-reinstate-her?share_id=XrloaDauyF&utm_campaign=signature_receipt&utm_medium=email&utm_source=share_petition


Info about her getting fired: http://www.theatlanticwire.com/entertainment/2013/09/librarian-fired-actually-getting-kid-read-contest-lita-casey/69606/
thewayne: (Cyranose)
No physical books. Period. 10,000 ebooks. LOTS of computers, and you can check the ebooks out for use with your ebook reader.

I'd like to see this place if I ever get down around San Antonio.

http://www.npr.org/blogs/thetwo-way/2013/09/14/222442870/bookless-public-library-opens-in-texas
thewayne: (Cyranose)
I always sing "It's The Most Wonderful Time Of The Year" when Halloween approaches, and for the last few years I've really begun enjoying poking sticks at people who are unjustifiably self-righteous, so Banned Books Week is one of those events that I may start singing this tune to as well. I especially love it because Alamogordo holds a special place in the pantheon of banned books: in 2002 (before I moved to Cloudcroft) a local preacher organized a Harry Potter book burning. The irony is that he was later seen leaving the movie theater with two young nieces or nephews or I don't know what, and the movie they'd been seeing? Harry Potter.

The top ten most challenged books of 2012 were:

  1. Captain Underpants (series), by Dav Pilkey. Reasons: Offensive language, unsuited for age group

  2. The Absolutely True Diary of a Part-Time Indian, by Sherman Alexie. Reasons: Offensive language, racism, sexually explicit, unsuited for age group

  3. Thirteen Reasons Why, by Jay Asher. Reasons: Drugs/alcohol/smoking, sexually explicit, suicide, unsuited for age group

  4. Fifty Shades of Grey, by E. L. James. Reasons: Offensive language, sexually explicit

  5. And Tango Makes Three, by Peter Parnell and Justin Richardson. Reasons: Homosexuality, unsuited for age group

  6. The Kite Runner, by Khaled Hosseini. Reasons: Homosexuality, offensive language, religious viewpoint, sexually explicit

  7. Looking for Alaska, by John Green. Reasons: Offensive language, sexually explicit, unsuited for age group

  8. Scary Stories (series), by Alvin Schwartz. Reasons: Unsuited for age group, violence

  9. The Glass Castle, by Jeannette Walls. Reasons: Offensive language, sexually explicit

  10. Beloved, by Toni Morrison. Reasons: Sexually explicit, religious viewpoint, violence


http://www.bannedbooksweek.org/
