How will the fediverse respond to AI orgs scraping Lemmy/the fediverse for training data?

sachasage@lemmy.world · 1 year ago

key@lemmy.keychat.org · 1 year ago

By not spitting into the wind. It’s infeasible to block all web scraping from every possible IP, which is what you would need to do. Reddit just used the media attention on the topic as a justification; they’re not doing anything real.

sachasage@lemmy.world · 1 year ago

Fair, but then there’s a line between scraping through ordinary traffic and using API access to gather large data sets.

key@lemmy.keychat.org · 1 year ago

Is there? The effect is the same. Use machine learning to parse HTML generically and throw hardware and a pool of IPs at it. That’s a lot more efficient than coding an API client for every service out there. It’s the same approach search engines use.

I don’t see anything being done effectively without legal protections.
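
To make the "parse HTML generically" point concrete, here is a minimal sketch. It is an assumption-laden illustration, not what any scraper actually runs: it swaps the machine learning step for a plain rule (keep visible text, skip script and style), uses only Python's standard `html.parser`, and the names `TextExtractor`, `extract_text`, and `sample` are made up for the example. It does no network access at all.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text from arbitrary HTML."""

    # Tags whose contents are never shown to a reader.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text fragments of an HTML string, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up markup; nothing here depends on a particular site's layout.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<div class='comment'><p>Hello fediverse</p>"
    "<script>var x=1;</script><p>Second post</p></div>"
    "</body></html>"
)
print(extract_text(sample))
```

The same function works on any page that renders its text in the HTML, which is why no per-service API client is needed.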