Mar. 5th, 2019

thewayne: (Default)
Packt has this excellent deal for computer professionals: they give away a new book every day! I've been taking advantage of this for the last 2½ years and have accumulated a library of 594 books, and probably 85%+ of those come with a Zip file containing code examples or templates. I have also bought books and courseware from them. I'd say their catalog is 90%+ open source, but they frequently have titles on proprietary platforms that are in common use.

Well, they changed their game up with this new year.

Two things happened. I think it was last year that they released their own ebook reader, as that seems to be a trendy thing to do. Then this year, after their year-end "$5 for any ebook" promo ended (it runs from mid-December to mid-January), they resumed their free ebook of the day, but with a twist.

My normal procedure was to download, on the first Saturday of each month, all of the ebooks I'd added to my account. I didn't bother on the first Saturday in February because the $5 promo runs well into January, and I've added so many books that I only "get" 40% or so of the freebies offered. So last Saturday I signed on to my account to download books, and you can't download them anymore. Now when you add a book from their free book of the day, it goes into your library, but you can only access it through their ebook reader app.

On the plus side, they're now giving away video content, and you can still download the Zip companion. On the minus side, you need at least a WiFi connection to read your books, which could be a problem at times. But it is definitely a smart move on their part.

If you are, or know, a serious IT programmer or open source techie, point them to this; it's an excellent resource. The free book changes at midnight or 1am UK time. I've sometimes had problems with their web site, and lately deleting my local cookies for Packt.com has always solved them. You will have to register on their site to get these books, but you won't have to give them any address information or even a credit card unless you buy something, and even then you don't have to save your card permanently.

Among the books I've gotten from them is one on how to write web scrapers in Python. I wonder if their TOS has been updated... (not that I'd do such a thing, except as an intellectual/programming skill exercise: I wouldn't want to get banned!)

https://www.packtpub.com/packt/offers/free-learning

More things learned today:

OCR quality in Acrobat Pro decreases (the fragment rate increases) if the:
-- Font is bold
-- Page isn't flat
-- Page is skewed

I knew about the second and third; the first one surprised me. The effect may have been made worse by imperfect flatness and skew.

Something else surprised me when I did a little digging behind the shortcut that launches the program: the executable was dated 2014! They bought the system in '15 with the software pre-installed, so nothing has been updated since it was first set up. The head librarian is going to look for the salesman's contact info to see what can be done about an update. The software neither checks for updates on its own nor has a menu option to check for them. I went to the manufacturer's web site, and the software we use isn't even listed! I think they've replaced it with something under a different name, so I don't know if we'll be able to update it.

I scanned one of the hardbound volumes from back to front, and the software was consistent in building the PDF in reverse order: because I scanned the book backwards, the PDF came out in the correct sequence.

I tested how long it took to fix fragments by repairing the first ten pages of one of the 40+ page PDFs, and it came to 3-4 minutes per page, meaning 2+ hours for each of the large PDFs. When I spoke with the head librarian, she didn't think that was a good use of my limited time right now, so we're not going to do it. I think it's a good call; we can focus on the core job of getting all of the reports scanned, which is the main goal of my internship. The OCR is good enough that anyone who wants to do Finds on these documents will get reasonable hit rates. Still, spending half an hour fixing 10 pages was worthwhile, because it gave us that 3-4 minutes per page figure.

I noticed that bold text seemed to increase the fragment rate while I was doing the edit. The second page of these reports is a table of contents, with an entry on the left and a row of periods leading to the page number on the right, and the entire page is bold. On many lines the program flagged these runs of periods as fragments, and I had to tell it that they were not words. If the page had been laid out in a real publishing program with proper kerning and such, maybe it would have scanned better and the OCR would have performed better; I don't know. What is definite is that bolding the entire page made it harder to read.

And this highlights a design weakness in Acrobat Pro. The fragment interface has an option to highlight all fragments in the document, but if you skip over one, there is no way to go back to it. Bad interface design! But this is also Acrobat v11, which is long past its end-of-support date, so I guess I shouldn't be surprised. Current Acrobat Pro versions may have improved this, but I wouldn't know: the version on my Mac is 10, a generation older!
