There’s been a lot of talk about Threads, Meta’s new Twitter clone, because it will federate with other ActivityPub apps. I’ve seen several posts suggesting Meta could use the app to embrace, extend, and extinguish ActivityPub.
A more immediate concern for me is that Meta will now be able to harvest data from users of other ActivityPub social networks like Lemmy and Mastodon. For example, if Alice on Threads follows Bob on Mastodon, Bob’s Mastodon instance will send information about all of Bob’s posts and everyone who interacts with them to Meta so that Alice can see it.
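To make the Alice/Bob example concrete, here is a rough sketch of what that delivery looks like. It is not Mastodon’s actual code: the function names and the `.example` domains are made up for illustration, and real servers add things like HTTP signatures and shared inboxes. It only shows the general shape of an ActivityStreams `Create` activity and how the list of receiving servers falls out of the follower list.

```python
from urllib.parse import urlparse

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def build_create_activity(actor, note_id, content, followers_url):
    """Build a minimal 'Create' activity wrapping a public Note.

    Hypothetical helper: real servers include more fields
    (published date, signatures, attachments, and so on).
    """
    note = {
        "id": note_id,
        "type": "Note",
        "attributedTo": actor,
        "content": content,
        "to": [PUBLIC],
        "cc": [followers_url],
    }
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": note_id + "/activity",
        "type": "Create",
        "actor": actor,
        "to": [PUBLIC],
        "cc": [followers_url],
        "object": note,
    }


def delivery_hosts(follower_actor_urls):
    """Return the distinct servers that would receive a copy of the post."""
    return sorted({urlparse(url).hostname for url in follower_actor_urls})


# Bob's followers (made-up accounts on made-up servers).
followers = [
    "https://threads.example/users/alice",
    "https://lemmy.example/u/carol",
    "https://threads.example/users/dave",
]

activity = build_create_activity(
    actor="https://mastodon.example/users/bob",
    note_id="https://mastodon.example/users/bob/statuses/1",
    content="Hello, Fediverse",
    followers_url="https://mastodon.example/users/bob/followers",
)

# One follower on a server is enough for that server to get the post.
print(delivery_hosts(followers))  # ['lemmy.example', 'threads.example']
```

The point of the sketch is that delivery is per server, not per person: a single follower on an instance is enough for that instance to receive the post.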
This is a concern specifically with Meta and other big tech companies running ActivityPub-enabled servers, because their primary motive is to harvest user data to use for advertising. The scariest part to me is that users on networks like Mastodon specifically migrated to Mastodon to get away from big tech, and Meta is still able to harvest their data with Threads.
Anyone can set up a Lemmy instance and write a small script/bot to find and follow every community on every instance in the Fediverse, then store all that data. It’s not even hard, maybe a day of work for a proof of concept if you start from zero. (Then you have to figure out how to scale it properly, how to detect that you’re getting defederated, and how to change domains to restart without the defederations. Maybe a week’s worth of effort.)
Threads would be way overkill to achieve this goal. You don’t need any users. You don’t want any users. Just your one account that follows everything.
Edit: or you can just set up a web crawler, like the one Google Search uses, to find and store all the data you’re looking for. You don’t necessarily need to be federated or use ActivityPub at all.