Duck full-length transcriptome data for genome annotation and evolution analysis

Dr Hou’s group at China Agricultural University has been studying duck genomics and genetics for more than ten years. The team includes more than 10 people with different professional backgrounds, specializing in genomic selection, fat metabolism, breast muscle development and the analysis of eggshell quality. To solve the genetic puzzles behind important duck traits, we have accumulated a large amount of sequencing data, including resequencing data from around the world, multi-stage and multi-tissue transcriptome data, and proteome data from duck eggshells.

When analyzing these data, we found that the low quality of the duck reference genome reduced both the usability of the sequencing data and the accuracy of the analysis results. We realized that assembling a high-quality reference genome was a problem we had to solve. In June 2018, we began preparing for the assembly of the duck reference genome. Over the following year and a half, most team members devoted a great deal of effort to the project, and we strove to do our best at every step. In this work, we generated large-scale basic data, including high-coverage PacBio, Bionano and Hi-C data and multi-tissue full-length transcriptome data. The genome assembly is essentially complete and will be made public in the near future for everyone to use. We also hope that researchers interested in related studies will take part in our work and exchange ideas with us.

Published articles on genome assembly generally focus on describing the assembly and the analysis results, with few details about the raw data used. We believe that a detailed description of the raw data helps other researchers make better use of it and makes it more valuable. The data we describe are full-length transcriptome sequencing data from multiple duck tissues, which play a crucial role in the prediction of gene models. The value of the data set is demonstrated by the fact that we have already been able to use it in several further studies. The data set contains full-length transcriptome sequencing information from 8 tissues: pectoral muscle, hypothalamus, pituitary, testis, ovary, heart, uterus and 13-day-old embryos. We identified 199,993 unique transcripts, of which 93.57% were functionally annotated. We also identified 23,755 lncRNAs based on the coding ability of the sequences, and 35,031 alternative splicing events in 3,346 genes. These data not only play an important role in improving genome annotation, but are also a valuable genetic resource for evolutionary analysis, alternative splicing analysis and comparative transcriptome analysis among birds.

In recent years, journals such as Scientific Data have provided an opportunity for more and more data to be shared. I have followed this development with enthusiasm and believe it is an essential step towards open cooperation in scientific research. Our team still has large-scale basic data that have not been fully utilized. Although we have already spent a lot of time on data analysis, we are willing to continue organizing valuable data sets and sharing them. My co-authors and I hope that our data release will be helpful and will encourage more researchers to do the same.

You can find and use our dataset here. We hope researchers in related fields find this data set helpful, and we welcome any feedback.
