Wanting to profit from AI companies' hunt for training data (and putting that profit above the community that created the data) is a big part of what set the stage for the recent migration away from Reddit. How will the fediverse approach this problem?

  • sachasage@lemmy.world (OP)
    1 year ago

    Fair, but then there’s a line between scraping through ordinary traffic and using API access to gather large data sets.

    • key@lemmy.keychat.org
      1 year ago

      Is there? The effect is the same. Use machine learning to parse HTML generically, then throw hardware and a pool of IPs at it. That's a lot more efficient than coding an API client for every service out there. It’s the same approach search engines use.

      I don’t see anything being done effectively without legal protections.
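
To make the "parse HTML generically" point concrete, here is a minimal sketch of site-agnostic text extraction using only Python's standard library. It swaps the machine-learning step mentioned above for a much simpler heuristic (keep visible text, drop `script`/`style`/`noscript`), and the names `TextExtractor` and `extract_text` are hypothetical, chosen for illustration. It does no fetching at all; it only shows why no per-site API client is needed once you have the page markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from any HTML page, with no site-specific rules."""

    # Tags whose contents are never shown to a reader (assumed list).
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []      # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><div class='post'><p>Hello <b>world</b></p></div></body></html>"
    )
    print(extract_text(page))
```

The same function works on markup from any service, which is the crux of the argument: restricting an API does little on its own if the public pages carry the same content.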