Interesting times!
The potential suit would contend that OpenAI did not have permission to scrape the NYT's article database to train ChatGPT, and in doing so violated the NYT's terms of service.
From the Ars article (an Arsicle?): "Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content."
And: "This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books."
But here's the biggie: "NPR reported that OpenAI risks a federal judge ordering ChatGPT's entire data set to be completely rebuilt—if the Times successfully proves the company copied its content illegally and the court restricts OpenAI training models to only include explicitly authorized data. OpenAI could face huge fines for each piece of infringing content, dealing OpenAI a massive financial blow just months after The Washington Post reported that ChatGPT has begun shedding users, "shaking faith in AI revolution." Beyond that, a legal victory could trigger an avalanche of similar claims from other rights holders.
Unlike authors who appear most concerned about retaining the option to remove their books from OpenAI's training models, the Times has other concerns about AI tools like ChatGPT. NPR reported that a "top concern" is that ChatGPT could use The Times' content to become a "competitor" by "creating text that answers questions based on the original reporting and writing of the paper's staff."
Fair use is quite an issue. I quote news sites all the time, just like the excerpts above. I make no claim that it is my content; what is quoted from the article is clearly delineated from my commentary or additional content. And I am in no way making any money from this. Things are a little different when you have AI/LLM systems hoovering up all the content that they can find for training. Those system makers want to spend the least amount of money possible to train their systems because their energy costs are absolutely huge! I posted an article a month or so ago about a new supercomputer that will be running an AI system and will consume as much power as either 3,000 or 30,000 houses; I saw both numbers. If these guys can get training data for free, they'll go for it. But authors are pushing back: if people have to buy their books to read them (excluding libraries, where people can borrow for free), then why should AI companies get a free read?
If an art-generating AI wants to use my photos, I would like to be compensated! If you want to use one of my photos for a desktop wallpaper or screen saver, I'm honored. If you sell my photos for profit, then we have an issue! I've spent over four decades developing my craft and I'm pretty decent at it. I'd like some acknowledgement and compensation for it, not to have it stolen for an AI system's use, as they've been doing.
https://arstechnica.com/tech-policy/2023/08/report-potential-nyt-lawsuit-could-force-openai-to-wipe-chatgpt-and-start-over/
Date: 2023-08-24 11:41 pm (UTC)
When I was running the Derm section of the new undergraduate medical teaching, I was greeted with howls of outrage by my colleagues when I told them they were legally required to get permission to use images etc. in lectures.
Date: 2023-08-25 05:40 am (UTC)
Oh, I'll bet there were! I'm kind of glad that I work at a junior college and don't have to get that specific on stuff like that. I'm sure they're a lot stricter about it at the main campus. But there are also more sources for open-licensed images and such these days, so it may not be as difficult now. And for education, there are different rules depending on the material. For example, I think it's okay to copy images out of a textbook that you're using for a presentation in class. But finding one online? Definitely iffy territory. Everyone who works for the uni has to take a copyright awareness training module every year.
Date: 2023-08-25 12:36 pm (UTC)
What we were told was that images, whether online or from a textbook, could be used in an informal talk without limit. But if giving a formal class, we were supposed to get permission. This came down from the legal office, but was still greeted with howls of outrage. I don't think copyright law recognizes "educational" as a category: it's either personal use or commercial use. Since students pay tuition, it counts as commercial. Since professors regard universities as godly and special (as do the universities themselves), they are appalled at being classified with crass, commercial entities.
Date: 2023-08-25 09:37 pm (UTC)
One example that I recall from our training module was a professor who wants to use a personality assessment quiz that he found online (what personality type are you?) for his class. He can't, because it's copyrighted material, regardless of where he found it. Considering how large some university endowments are, and how poorly they pay their minions, they'd better consider themselves crass commercial entities!