Fundamentals of sharing data, selfishness and the advancement of science
We live in a world full of data that is constantly being gathered and processed. Even so, there is still a gap between how much data exists and how accessible it is, sadly even when the data is held by public institutions. I present here a small thought experiment that tries not to persuade but to demonstrate, from two opposite perspectives, the importance of sharing data.
Data can be expensive to gather and process, and some of it is very difficult to acquire. This leaves us with two questions:
1. Do I keep this data to myself, because only I know the hardship I went through to get it? Or do I share it, because after I went through so much for it, it would make no sense for someone else to do it all again?
The second question comes from the opposite perspective:
2. I need to collect a dataset, but it is incredibly expensive and hard to get, and above all the sampling is so difficult that there is a big risk of failing along the way. Is there a chance someone has already done something similar, so that I can at least explore their data and find out whether what I have in mind is feasible or, even better, think of a new approach that will yield better results?
After these two questions it is easy to see that sharing data is good from both perspectives, although it feels a little better when we are the ones in need. And for the sake of science, we should always be in that need.
For me it's clear how great it is that other people (organizations, institutions, individuals) share their data, mostly because of how much of it I have used throughout my undergraduate studies. It is easy to see how course projects can be much more interesting when they use data from big projects or organizations, e.g. spatial data from Copernicus, Landsat or Sentinel. Yet even smaller and more specific datasets are useful, as long as they are well documented.
This experience makes it clear to me that whatever publication I produce or collaborate on should be published alongside its data. Not only does this make the data reusable, it also lets others check that the findings are correct, and by correct I mean that the data was well sampled and managed and that the statistical analyses were the right ones.
If you're not sure about it or have never tried, I'd say you should go ahead and get some free data shared on the Internet and use it for whatever you need, maybe to demonstrate an analysis, try out an R package or practice a method. You'll probably be amazed by the perks of data sharing.