Wanting to profit from AI companies' hunt for training data (over and above the community that created that data) is a big part of what created the context for the recent migration away from Reddit. How will the fediverse approach this problem?
By not spitting into the wind. It’s infeasible to try to prevent all web scraping from every possible IP, which is what you would need to do. Reddit just used the media attention on the topic as a justification; they’re not doing anything real.
Fair, but then there’s a line between scraping through ordinary traffic and using API access to gather large data sets.
Is there? The effect is the same. Use machine learning to parse HTML generically and throw hardware and a pool of IPs at it. That's a lot more efficient than coding an API client for every service out there. It’s the same approach search engines use.
I don’t see anything being done effectively without legal protections.