Creating a new instance only gets you access to content that users of your instance have subscribed to, and then mostly only content that comes in after subscription (I believe Lemmy primes the pump a bit on community subs, pulling in a handful of posts at the time of discovery, but discovery is done by users). So, there’s a limit on what you can scrape with your own private instance, and you’re taking a bit of a bet on which communities will yield what you’re looking for in the future.
It’d be easier and more reliable to just crawl the network and scrape it the old fashion way.
Creating a new instance only gets you access to content that users of your instance have subscribed to, and then mostly only content that comes in after subscription (I believe Lemmy primes the pump a bit on community subs, pulling in a handful of posts at the time of discovery, but discovery is done by users). So, there’s a limit on what you can scrape with your own private instance, and you’re taking a bit of a bet on which communities will yield what you’re looking for in the future.
It’d be easier and more reliable to just crawl the network and scrape it the old fashion way.