#SciData19: Parallels of Evolution Between Research and its Data


As a scientist in training, I have always been intrigued by conversations about scientific integrity and ethics. No matter how many conversations I have or how many ethics training courses I take, there is always more to learn. The ever-evolving frontiers of science should be matched by an evolving conversation about ethics and better data.

In 1876 the American Chemical Society was established in New York. In 1937 it received a national charter from the 75th Congress with the aim of "the improvement of the qualifications and usefulness of chemists through high standards of professional ethics, education, and attainments." Later, in 1965, the Chemist's Creed was approved. It was replaced in 1994 by the Chemist's Code of Conduct, which I encountered in my fall 2012 general chemistry course at Whitworth University. It states:

“Chemical professionals should strive to remain current …, share ideas…, keep accurate and complete laboratory records, maintain integrity in all conduct and publications, and give due credit to the contributions of others. Conflicts of interest and scientific misconduct…are incompatible with this Code.”

Of course, I did not remember the exact quote, but to a naive freshman the message was simple: one must share one's findings with colleagues. Little did I know, it is not so simple. "Ideas and information" are not easy to share when ownership and authorship, or more importantly, funding and careers, are involved. In addition, information in today's world can include private information over which the study subjects have no say. As David Stillwell pointed out in his talk at #SciData19, which I found the most interesting, the data collected by big companies are unavailable both to researchers and to the very subjects the data describe.

Another thing that stood out during this talk was that, even though I have been trained as a scientist, I hold the 'public' or 'outsider' view when it comes to other fields of science. Computer science is one example. Many of us probably recognize the mistrust coming from the world outside academia, especially in a highly consumerist society that surrounds us with marketing campaigns. During this conference I became increasingly aware that we can feel that way even within the realm of science: I sensed a wave of concern about online data ownership and privacy in the questions coming from the audience.

Are companies required to share data with researchers?

How can we tell fake news apart from fake projects and fake researchers?

What should people be most concerned about?

Another audience member mentioned Edward Snowden. Although the questions and comments were relevant and necessary, I felt that the main message of David's talk veered off course. That message, as I understood it, was:

Who is collecting the data, what are they doing with it, and how can we change this system?

David posed a scenario about Facebook data usage to his class. Most students were okay with their data being used to personalize their healthcare but uncomfortable with the idea that their data could be used to decide whether they received a mortgage. This scenario could have some legs to it. In general, the data on social media are provided by the user: you and me. We "click that box" to agree to let our data be seen or used. What is less clear is where the data go and how they are used. A company owns that data. The company decides what the user is able to share and, in turn, what a scientist is able to do with that data. An ideal scenario would follow those trendy razor commercials and "cut the middleman out", giving users more power to share and data scientists more access to study.

Stillwell did a wonderful job explaining this process and following up with some examples of positive directions currently underway in the EU. In May 2018, the General Data Protection Regulation (GDPR) came into effect, enforcing rules about data sharing. One notable aspect is the right to data portability, under which you can request your data, free of charge, and, in theory, take that data to another entity for analysis.

Applymagicsauce.com, from the Psychometrics Centre at the University of Cambridge, was a useful tip for putting the right to data portability into practice. The demo allows you to upload your social media data, extract the relevant parts, and predict a psychodemographic profile. Based on your digital footprint, the prediction shows how others will see you online and what a company could predict from, or profit from, this information.

In parallel, these privacy concerns bring to mind the HeLa cell line, which was collected, propagated, and immortalized without the consent of Henrietta Lacks or her family. I think David's argument about the use of data could be troublesome in this case. Even though HeLa cells have contributed to numerous scientific advances and helped improve healthcare tremendously, there is still a potential downside regarding privacy: the family's genomic information becoming 'publicly' known. In this case, the family has not been very accepting of the process, even though one might argue that having their genome sequenced could help them gain access to personalized healthcare. The privacy risk remains.

For a conference centered on the study of data, how it is used, and how to make it accessible to scientists, it was fitting that I was able to attend remotely. All the talks, including the lightning talks, remained accessible even after the conference took place. This has not been the case at many conferences I have attended. Scientists often feel forced to present only parts of their data until they have a paper ready for publication. The fear of being scooped is constantly looming: the worry that someone somewhere with more resources will prove the hypothesis first.

The conference renewed the importance of science in my life, in a different way than I had in mind when I began my journey as a biochemist. Our world is evolving faster than ever in the digital age, and as the gray area between different fields of study grows, data is becoming more impactful and needs to be more accessible. Many more conversations are still needed.

Who should have access to the data?

Can we ever predict all of what can go wrong with data sharing?

How can we be more proactive and less reactive to our changing environment of research?

How can we make data reproducible within and across disciplines? What quality controls do we need?

In an ever-evolving world with ever-evolving fields of study, we need to constantly reassess how we handle what goes into research, how it is done, and what products come from it. The Chemist's Code of Conduct might need a refresh, or even incorporation into a global code that is more accessible to interdisciplinary research.

Mi Nguyen is a Graduate Research Assistant at the University of Illinois at Chicago. She is a winner of the Better Science Through Better Data writing competition. Read Mi Nguyen's winning entry here.
