Chinese Journal of Oceanology and Limnology   2018, Vol. 36 issue(4): 1329-1341     PDF       
http://dx.doi.org/10.1007/s00343-018-7090-0
Institute of Oceanology, Chinese Academy of Sciences
0

Article Information

YANG Wei(杨尉), CHEN Huapu(陈华谱), CUI Xuefan(崔雪凡), ZHANG Kewei(张克伟), JIANG Dongneng(江东能), DENG Siping(邓思平), ZHU Chunhua(朱春华), LI Guangli(李广丽)
Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus(Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs
Chinese Journal of Oceanology and Limnology, 36(4): 1329-1341
http://dx.doi.org/10.1007/s00343-018-7090-0

Article History

Received Mar. 24, 2017
accepted in principle May. 3, 2017
accepted for publication Jun. 19, 2017
Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus(Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs
YANG Wei(杨尉)1,3, CHEN Huapu(陈华谱)1,2, CUI Xuefan(崔雪凡)1, ZHANG Kewei(张克伟)1, JIANG Dongneng(江东能)1, DENG Siping(邓思平)1, ZHU Chunhua(朱春华)1, LI Guangli(李广丽)1     
1 Zhanjiang City Key Laboratory of Marine Ecology and Environment, Fisheries College, Guangdong Ocean University, Zhanjiang 524088, China;
2 Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266071, China;
3 Food and Environmental Engineering Department, Yangjiang Vocational and Technical College, Yangjiang 529566, China
Abstract: Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker resources for future research into genetics, marker-assisted selection (MAS) and conservation biology.
Keywords: Scatophagus argus     illumina RNA-seq     reproduction     simple sequence repeat (SSR)    
1 INTRODUCTION

Spotted scat (Scatophagus argus), which has the notable ability to tolerate a wide range of environmental salinities, is widely distributed around the Indo-Pacific region, including Southeast China (Barry and Fast, 1992; Gupta, 2016). Spotted scat with such features as high nutritional value, easy cultivation, low feeding cost, etc. is increasingly becoming an important farmed fish with considerable economic value in East and Southeast Asia, and the commercial demand for seedlings is rising sharply. However, it is difficult to obtain spotted scat offspring via artificial breeding because of the different gonad maturation time between the sexes due to protandry (Chen et al., 2015a). Artificial reproduction remains a barrier for the aquaculture industry at present and juveniles still have to be captured from the wild. In the last few years, the natural population of spotted scat has been clearly declining (Gupta, 2016), so the quantity of available wild seedlings is severely limited.

A perfect artificial reproduction technology is the primary prerequisite for large-scale commercial fish breeding and culture (Xie et al., 2014). So far, a few studies have been carried out on induced breeding (Barry et al., 1991, 1993; Cai et al., 2010), reproductive biology (Cai et al., 2010; Zhang et al., 2013; Gandhi et al., 2014) and regulatory genes (Deng et al., 2014; Chen et al., 2015a, b ; Li et al., 2015; Liu et al., 2015; Chen et al., 2016) in spotted scat, but most molecular biology studies have been based on a single-gene strategy. The mechanisms of reproductive development and regulation have been little investigated and are poorly understood. Therefore, it is necessary to elucidate the molecular mechanism of the reproductive process and identify more functional genes related to sex differentiation, gonad maturation and gametogenesis. Unfortunately, compared with other fishes with economic value, the genetic and genomic information available for spotted scat is rather limited. To date, just 304 DNA and RNA sequences and one SRA have been deposited in GenBank, including 24 ESTs and 13 mitochondrial genes (analyzed on February 15th, 2017). For these reasons, there is an urgent need for more genomic background knowledge and breeding programs to maintain the sustainable development of spotted scat fishery.

With the aid of next-generation high-throughput sequencing technologies such as the Illumina/Solexa platform, transcriptome sequencing has been in widespread use to obtain large-scale transcript sequences and gene expression data rapidly and costeffectively for non-model species without reference genomic information (Garber et al., 2011; Finseth and Harrison, 2014). Transcriptome analysis has been widely employed in genetics, molecular biology, ecology, and physiology, and plays significant roles in gene discovery, gene expression and regulation analysis, and molecular marker development (Grabherr et al., 2011). Previous studies on Scophthalmus maximus (Ma et al., 2016), Trachinotus ovatus (Xie et al., 2014) Poecilia reticulata (Fraser et al., 2011) and Macrobrachium nipponense (Ma et al., 2012) suggest that, transcriptome analysis is an extremely efficient approach to identify large numbers of functional genes and simple sequence repeat (SSR) markers. By transcriptome sequencing with the 454 pyrosequencing technology, Li et al. (2014) identified and validated 33 novel single nucleotide polymorphisms (SNPs) for population and conservation genetics of spotted scat, but did not conduct functional gene discovery and expression analysis.

Using the Illumina paired-end sequencing technology, high-through-put RNA sequencing, de novo assembly, functional annotation and classification were performed in the present study. This work is the first comprehensive transcriptome analysis on spotted scat based on a multiple tissue sample mixing strategy. This study was designed i) to enrich the available genetic and genomic information for a deeper understanding of gene expression, functional gene mining and SSR marker development, and ii) to identify putative genes related to reproductive processes such as the GnRH-GtH axis, sex differentiation, gonad maturation and gametogenesis for future research into the molecular mechanism of reproduction. These data are also an important resource for reproduction-related studies in other teleost fishes.

2 MATERIAL AND METHOD 2.1 Animal materials and sample preparation

One-year-old adult spotted scat (females: n=10; males: n=10) were collected from the Zhuhai Yucheng Fry Cultivation Base (Zhuhai, Guangdong, China), and sacrificed by decapitation following MS222 anesthetization. Tissue samples including brain, pituitary, heart, muscle, head kidney, kidney, liver, gut, spleen, gill, testis and ovary were excised as soon as possible, frozen in liquid nitrogen immediately, and stored at -80℃ until RNA extraction.

2.2 RNA extraction

Total RNA was extracted from each tissue of the females and males using Trizol Reagent (Life Technologies, Carlsbad, CA, USA) following the manufacturer's instructions. The concentration of total RNA was determined by the absorbance at 260 nm using UV-spectrophotometry (Nanodrop, 2000c, Thermo Scientific, Wilmington, DE, USA) and the quality was tested using OD260/280 (1.8–2.0). Then, ethidium bromide staining of the 28S and 18S ribosomal bands on a 0.8% agarose gel was used to assess the RNA integrity.

2.3 Construction of cDNA libraries and sequencing

Two total RNA mixtures (approximately 5 μg) from the male and female tissues respectively were delivered to Gene Denovo Biotechnology Co. Ltd. (Guangzhou, China) for the construction of male and female cDNA libraries. In brief, the mRNAs were extracted from total RNA using an Oligo-dT Beads Kit (Qiagen, Hilden, Germany) and fragmented with fragmentation buffer. Then, double-stranded cDNA was synthesized using 200–700 nt fragments as templates, random primers, and a SuperScript doublestranded cDNA synthesis kit (Invitrogen, California, USA). Ligated fragments were then generated by a series of reaction processes including purification of PCR products, end repair, dA-tailing, and ligation of Illumina adapters. After agarose gel electrophoresis, suitable fragments were isolated and purified for PCR amplification and the cDNA libraries of the two sex groups were sequenced on an Illumina HiSeq™ 2000 platform (Illumina, Inc., San Diego, USA).

2.4 Trimming and de novo assembly

Quality control was performed by the SOAPnuke v1.5.0 software with the filter parameters "-l 10 -q 0.5 -n 0.05 -p 1 -i". The raw reads were filtered to generate high quality data via processes including removal of adapter sequences, reads with ambiguous sequences (N) more than 10% and low-quality sequences with percentage of low quality bases of quality value_5 was > 50% in one read (base quality score < Q20). De novo assembly of the clean reads was carried out using Trinity software (version: r20140717, http://trinityrnaseq.sourceforge.net/) with default parameters. Trinity combines three components: Inchworm, Chrysalis and Butterfly. Firstly, Inchworm assembles reads using a greedy k-mer based approach, resulting in a collection of linear contigs. Next, in Chrysalis, the abundant contigs produced by the repeated Inchworm processes are built into de Bruijn graphs through k-1 overlaps. In the third phase, Butterfly, the fragmented de Bruijn graphs are trimmed, compacted and reconciled to final linear transcripts (Grabherr et al., 2011). Finally, redundant sequences are eliminated, and the longest transcripts are defined as unigenes.

2.5 Functional annotation and classification

Functional annotation was performed by sequence comparison against public databases using BLASTx v2.2.23 (Camacho et al., 2008). All unigenes were searched against the Nr, Swiss-Prot, KOG and KEGG databases with an E-value cut-off of 1E-5. Further analysis of the annotation results was conducted using the Blast2GO software (Conesa et al., 2005) to obtain GO functional information. The WEGO statistical software was used to assay the GO functional results for GO term classification and visualization (Ye et al., 2006). In addition, the KEGG annotations were analyzed by using KOBAS v2.0 software for KEGG pathway categories (Kanehisa et al., 2004; Xie et al., 2011).

2.6 SSR loci detection

The MIcroSAtellite software package (MISA) was employed to identify SSR loci in the assembled unigene sequences (Thiel et al., 2003). The searches were carried out with a minimum of six contiguous repeat units for di-nucleotide motifs and five for all other motifs. Mono-nucleotide repeats (MNRs) were excluded because of difficulties in distinguishing genuine MNRs and a few MNRs generated by sequencing errors. Primer pairs flanking each SSR locus were designed using the Primer 5.0 software (Lalitha, 2000).

3 RESULT 3.1 Illumina sequencing and assembly

RNA-seq produced 118 510 486 raw reads with an average read size of 125 bp, encompassing 14.81 Gb sequencing data (Table 1). Quality control of the raw data resulted in 116 588 098 clean reads, which is equal to 14.57 Gb sequencing data. The Q20 and GC percentage were 98.65% and 48.45%, respectively. The high-quality clean reads were de novo assembled subsequently and generated 99 888 unigenes with a mean length of 905.75 bp and an N50 length of 1 879 bp. The unigene lengths ranged from 201 to 16 393 bp. In detail, 42 439 (42.49%) unigenes were > 500 bp in length, 23 868 (23.89%) were > 1 000 bp and 12 289 (12.30%) unigenes were > 2 000 bp in length (Fig. 1).

Table 1 Statistics for the sequencing and assembly of the spotted scat transcriptome
Figure 1 Sequence size distribution of all assembled unigenes derived from the spotted scat transcriptome
3.2 Functional annotation and classification

Among all the assembled unigenes, 44 523 (31.41%), 35 153 (27.00%), 28 100 (28.13%) and 27 183 (27.21%) unigenes showed significant similarity to proteins in the Nr, Swiss-Prot, KOG and KEGG databases, respectively (Fig. 2a). A total of 20 628 unigenes were annotated in all four databases, but 54 873 (54.93%) unigenes could not be annotated in any of them. The E-value distribution showed that 64.07% of unigenes exhibited significant homology (below 1E-50) in the Nr database and 21.71% of homologous sequences ranged from 1E-20 to 1E-50 (Fig. 2b). Moreover, 84.03% of the annotated unigenes displayed more than 70.00% similarity (Fig. 2c). These results confirmed that the transcriptome data were successfully annotated. The homologous genes came from 370 species, and the top five species showing BLASTx hits were Larimichthys crocea (19 945, 44.80%), Stegastes partitus (6 576, 14.77%), Oreochromis niloticus (2 154, 4.84%), Notothenia coriiceps (2 135, 4.80%) and Takifugu rubripes (1 295, 2.91%) (Fig. 2d). This result suggested that spotted scat is closely related to L. crocea, whose genome has been completely sequenced and published (Xiao et al., 2015).

Figure 2 Statistical summary of unigene annotation a. venn diagram of unigene annotation results against the Nr, Swiss-Prot, KOG and KEGG databases, the number in each colored block indicates the number of unigenes annotated by single or multiple databases; b. E-value distribution of BLAST hits for the unigenes in the Nr database; c. similarity (identity) distribution of BLAST hits for the unigenes in the Nr database; d. species distribution of the top ten BLAST hits for the assembled unigenes.

After Blast2GO analysis, a total of 23 783 annotated unigenes in the GO database were assigned to 56 functional groups (Fig. 3a). Of which, 60 885 (46.94%) transcripts comprised the largest category of 'biological process', followed by 'cellular component' (41 245; 31.80%) and 'molecular function" (27 570; 21.26%). The GO terms 'cellular process' (11 840; 19.45%), 'cell' (7 965; 19.31%) and 'binding' (12 598; 45.69%) were the largest groups in the three respective main categories. According to KEGG pathway analysis, 27 183 (27.21%) unigenes were assigned to five main categories, which consisted of 228 different pathways (Fig. 3b). Among the five main categories, 'metabolism' pathways were the most abundant group with 6 274 (26.15%) genes, followed by 'environmental information processing' (6 090; 25.38%), 'cellular processes' (5 127; 21.37%), 'organismal systems' (3 866; 16.11%) and 'genetic information processing' (2 639; 11.00%).

Figure 3 Gene Ontology (GO) and KEGG pathways functional classification of annotated unigenes a. GO functional classification, a total of 23 783 unigenes with significant similarity in the GO database were assigned to three main categories: cellular component, molecular function and biological progress; b. KEGG pathway assignment based on five main categories: cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems.
3.3 Identification of reproduction related genes

GO terms and KEGG pathways associated with reproduction were selected to identify candidate sequences with important reproductive functions. As a result, 604 and 1 870 transcripts annotated with reproduction-related GO terms and KEGG pathways were obtained, respectively (Table 2). GO terms under 'reproduction', 'reproductive process', 'hormone activity', 'receptor binding' and 'receptor activity' were selected. The reproduction-related pathways found in KEGG analysis were 'GnRH signaling pathway', 'progesterone mediated oocyte maturation', 'estrogen signaling pathway', 'oocyte meiosis', 'steroid biosynthesis', 'steroid hormone biosynthesis', 'retinol metabolism', 'neuroactive ligand receptor interaction' and 'neurotrophin signaling'. Details of these candidate reproduction related genes are given in Supplementary Table 1.

Table 2 Candidate transcripts related to reproduction identified from GO terms and KEGG pathways

Then, transcripts showing homology to genes associated with the gonadotropin-releasing hormone (GnRH)-gonadotropic hormone (GtH) axis, gonadal differentiation and maturation, steroid hormone biosynthesis, gametogenesis and gametic function were identified (Table 3). The most represented gene products included GnRH receptor (GNRHR), GtH, follicle-stimulating hormone (FSH), Lutropinchoriogonadotropic hormone receptor (LHCGR), doublesex and mab-3 related transcription factor (DMRT), anti-Müellerian hormone (AMH), protein fem-1 homolog (FEM1), forkhead box l2 (FOXL2), wingless-type MMTV integration site family member-4 (WNT4), cytochrome P450 family members (CYPs), StAR-related lipid transfer protein (STARD), vitellogenin (VTG) and spermatogenesis associated protein (SPATA). These gene sequences could provide valuable candidates for further functional analysis and future studies on the reproduction mechanism and its regulatory networks.

Table 3 Reproduction related genes found in the spotted scat transcriptome
3.4 Discovery and characterization of SSR loci

Using the MISA program, a total of 15 700 (15.72%) unigenes containing 22 279 SSRs were preliminarily identified, with a distribution density of one SSR per 4.06 kb (Table 4). Among these unigenes, 4 281 contained more than one SSR. The top five most abundant motifs were di-nucleotide (13 229; 59.38%), tri-nucleotide (6 022; 27.03%), tetranucleotide (2 330; 10.46%), penta-nucleotide (509; 2.28%) and hexa-nucleotide repeats (189; 0.85%). A frequency distribution analysis of the SSRs based on the motif sequence types was conducted (Fig. 4). The spotted scat transcriptome was rich in AC/GT (9 563; 42.92%), AG/CT (2 548; 11.44%) and AGG/CCT (1 852; 8.31%) repeats. The SSR primer pairs are given in Supplementary Table 2.

Table 4 Summary of the motif distribution of SSR loci identified in the spotted scat transcriptome
Figure 4 Frequency distribution of classified repeat types of SSR loci identified in the spotted scat transcriptome
4 DISCUSSION 4.1 Sequencing, assembly and annotation

To obtain a comprehensive transcriptome, RNA samples were isolated from 11 different tissues of male and female spotted scat. The total RNA samples were pooled in equal quantities to construct two cDNA libraries, and this strategy has been used in previous reports (Li et al., 2012, 2013; Lv et al., 2014; Ma et al., 2016). The mean unigene length of was similar to those obtained in other teleost fishes such as T. ovatus (1 179 bp) (Xie et al., 2014), Pelteobagrus fulvidraco (1 611 bp) (Lu et al., 2014) and S. maximus (892 bp) (Ma et al., 2016) using the Illumina platform, but significantly longer than those obtained in Salmo trutta m. trutta (668) (Malachowicz et al., 2017), Oncorhynchus mykiss (662 bp) (Salem et al., 2010) and S. maximus (671 bp) (Pereiro et al., 2012) using the 454-pyrosequencing platform. We also noted that 43% of unigenes were greater than 500 bp in length (Fig. 1). These results strongly indicate that our transcriptome data were effectively assembled. The high quality assembly may be attributed to the longer paired-end reads and the assembly program Trinity, which provides a unified solution for transcriptome reconstruction without a reference genome (Grabherr et al., 2011). However, 54.93% of the unigenes could not be annotated in any database. A high percentage of unannotated sequences is commonly caused by untranslated mRNA regions, chimeric sequences with assembly errors, non-conserved protein regions or novel genes (Ma et al., 2016). According to some previous studies, a low annotation ratio is common in non-model organisms without published genome information, especially for aquatic livestock (Jung et al., 2011; Robledo et al., 2014; Ma et al., 2012, 2016).

The BLASTx top-hit species distribution showed that approximate half of the homologous genes came from L. crocea (Fig. 2c). As one of the most economically important farmed fish species, the whole genome of L. crocea has been completely sequenced and published (Xiao et al., 2015). Therefore, it is not surprising that S. argus had a significantly higher number of similar sequences to L. crocea, in comparison with the other matched species without genomic information. Notably, most of the unigenes were matched to L. crocea, rather than other teleost fishes with whole-genome sequence data such as Cynoglossus semilaevis (Pleuronectiformes), Danio rerio (Cypriniformes), and Ctenopharyngodon idellus (Cypriniformes). These results suggest the gene content is phylogenetically conserved between S. argus (Perciformes: Scatophagidae) and L. crocea (Perciformes: Sciaenidae), and that S. argus is more closely related to L. crocea, which also belongs to the order Perciformes, than to the other sequenced teleost fishes.

4.2 Reproduction-related genes

Fish reproductive activities are basically manipulated by the GnRH-GtH axis, as GnRH acts on pituitary gonadotropes to regulate the synthesis and release of LH and FSH (Levavi-Sivan et al., 2010; Prasad et al., 2015). We identified some transcripts homologous to GNRHR, GtH2, FSH, FSH receptor and LHCGR, which are considered to be critical regulators or factors in sex hormone secretion and gonad maturation, in the transcriptome data (Table 3). GnRH agonists and exogenous GtH preparations have been widely used to induce ovulation and spermiation in farmed fish (Xie et al., 2014). To date, LHRHa (Luteinizing Hormone Releasing Hormoneanalog), LHRH-A2, 17α, 20β-dihydroxyprogesterone, Human Chorionic Gonadotropin (HCG), DOM (Domperidone) and methyl testosterone have been used in attempts to induce breeding in spotted scat (Gupta, 2016). Nevertheless, a practical artificial breeding technology has not been established. A lack of understanding of regulatory mechanisms leads to the abuse of hormones in induced spawning practices. Hence, intensive investigation on the GnRH-GtH axis is the principal task for successful artificial breeding. In addition, several former studies have shown the involvement of serotonin (5-HT) and dopamine in the regulation of the GnRH-GtH axis (Levavi-Sivan et al., 2012; Prasad et al., 2015). It is very interesting that we also found diverse 5-HT and dopamine receptors in spotted scat (Table 3). All of these GnRHGtH axis-specific sequences will greatly help to elucidate the regulatory mechanisms of the reproductive axis and improve preparation development in future research.

The major objective of induced breeding is to promote gonadal development and produce mature oocytes and motile sperms. We recognized several unigenes associated with oogenesis and oocytes maturation, such as VTG and zona pellucida spermbinding protein. VTG has a greater transcript abundance in the ovary compared with the testis and is an important protein for oocyte maturation (Patnaik et al., 2016). Zhang et al. (2013) found that formulated diets with supplementation of fish oil could effectively increase spotted scat liver vtg expression and promote ovarian development. Furthermore, we identified transcripts homologous to SPATA, Vasa, sperm flagellar protein, spermine oxidase and spermidine synthase which are potentially associated with spermatogenesis and sperm motility. Overall, in order to increase hatchability and the survival ratio, further studies focused on reproductive success and establishing highly effective techniques for the induction of gamete maturation, ovulation, and spermiation are highly encouraged.

Androgen and estrogen play critical roles via androgen receptors (ARs) and estrogen receptors (ERs) in vertebrate reproductive systems. Several types of ER but only one type of AR were found in our data (Table 3). This result indicates that spotted scat has high diversity in estrogen receptor signaling. Chen et al. (2016) reported that ARs were expressed in both male and female spotted scat and showed obvious tissue-specific expression patterns. This finding implies that ARs are multifunctional in different tissues and that appropriate AR expression in the ovary is necessary for proper ovarian development.

The development of ovaries and testes from undifferentiated gonads is regulated by sex differentiation related genes (Ma et al., 2016). Transcripts homologous to genes involved in the control of sex differentiation were discovered, such as DMRT family members, FOXL2, WNT4, CYPs, AMH and FEM1 (Table 3). Cyp11a1, cyp11b1, cyp17a1, cyp19a and cyp21a1 have crucial functions in the biosynthesis of sex steroid hormones, which are generally regarded as central to sex regulation. Moreover, foxl2 is one of the most conserved genes involved in the differentiation and maintenance of ovaries in vertebrates, and generally participates in ovarian development as the activating transcription factor of cyp19a (Ma et al., 2016). In ovaries and testes of spotted scat, foxl2 and cyp19a1 exhibited coordinate expression patterns at different development stages (Liu et al., 2015), suggesting that foxl2 may regulate cyp19a1 expression. Coincidentally, there was a significant positive correlation between foxl2 mRNA and serum 17β-estradiol levels (Li et al., 2015). WNT4 plays a crucial role in ovarian differentiation and development in mammals but its function in teleost fish remains unclear. Two wnt4 subtypes, wnt4a and wnt4b, have been identified in spotted scat. Wnt4a might be involved in gonad development and play a role in spermatogonial proliferation, while wnt4b might only be involved in ovary development (Chen et al., 2016). Sex-determining region Y box 9 (sox9) is an important male sex-determination gene associated with testis development in vertebrates (Morrish and Sinclair, 2002). We found sox3 in our data but sox9 was absent, possibly because sox9 expression is significantly downregulated after the completion of gonadal differentiation. In accordance with this prediction, sox9 expression was significantly up-regulated in the early stages of masculinization induced by an aromatase inhibitor (Chen et al., 2015b), indicating that sox9 might be one of the most pivotal factors involved in sex determination in spotted scat.

Because of their markedly faster growth rate, female monosexual farming would be more efficient in spotted scat fishery (Chen et al., 2016). Because spotted scat fry have no obvious sexual morphological differences, sex-specific molecular markers are a prerequisite for sex identification in fry breeding. As thus, studying the roles of sex differentiation-related genes in sex determination will greatly help to isolate sex-specific markers (Xie et al., 2014). DMRT1 was confirmed to be the master regulator of male gonad differentiation and has attracted considerable interest (Herpin and Schartl, 2011). Three DMRT family members, dmrt1b, dmrt2 and dmrt4, have been recognized, but their sex-regulating effects in spotted scat are still unknown. Additionally, the functions of sex-related genes such as amh, fem1 and wt-1 have also been rarely investigated and are poorly understood. Hence, further molecular biology and biochemical studies are required to clarify the sexual manipulation roles of these genes and confirm the sex determining gene.

4.3 SSR loci discovery

SSRs, tandemly repeated motifs with high-levels of polymorphism, are widely used as molecular markers in population genetics, conservation genetics, genetic linkage mapping and quantitative trait locus analysis (Patnaik et al., 2016). SSR repeat types generally exhibit significant species-specific differences in fish, molluscs and crustaceans (Jung et al., 2011; Ma et al., 2012; Tong et al., 2015). The most abundant loci in spotted scat were di-nucleotide (AC/ GT, AG/CT) and tri-nucleotide repeats (AGG/CCT) (Fig. 4). The Cristaria plicata transcriptome is also rich in AC/GT (28.94%) motifs (Patnaik et al., 2016), but has a much lower ratio than that in spotted scat. Furthermore, the most frequent motifs among both diand trinucleotide repeats were obviously different from those in T. ovatus, in which AT/GT and ACG/ CTG accounted for 43.5% and 10.2% of motifs, respectively (Xie et al., 2014). SSR markers have become an effective approach to improve the production of commercially important fish species (Liu et al., 2013). To date, a few polymorphic SSR makers have been developed in spotted scat (Liu et al., 2010, 2013), but the amount is still far from adequate. The SSR loci identified here could be a powerful tool for future studies on genetics and MAS as potential molecular markers.

5 CONCLUSION

In this study, we used Illumina high-throughput sequencing to analyze the transcriptome of spotted scat. The transcript dataset presented here greatly increases the genetic information available for general gene expression studies and contributes valuable sequence resources for functional gene excavation and analysis. A number of candidate functional genes involved in reproduction processes would make good starting points for future studies on the regulatory mechanisms of the GnRH-GtH axis, sex differentiation, gonadal development, gametogenesis, oocyte maturation and sperm motility. The sex differentiation-related transcripts we identified will be useful for identifying sex-determining genes and isolating sex-specific markers. Finally, the massive amount of SSRs we generated will provide abundant marker resources for future research into genetics, MAS and conservation biology.

6 AUTHOR CONTRIBUTION

Conceived and designed the experiments: HC and WY. Performed the experiments: XC and KZ. Analyzed the data: WY and HC. Contributed reagents/ materials/analysis tools: DJ, SD and CZ. Wrote the manuscript: WY and HC. Overall monitoring, manuscript overview and editing: GL and CZ.

7 DATA AVAILABILITY STATEMENT

The original Illumina HiSeq sequencing data are available from the NCBI Sequence Read Archive (SRA) database under the accessions SRX2499155 (male) and SRX2499156 (female). Other data that support the findings of this study are available from the corresponding author upon reasonable request.

Electronic supplementary material

Supplementary material (Supplementary Tables 1-2) is available in the online version of this article at https://doi.org/10.1007/s00343-018-7090-0.

References
Barry T P, Castanos M T, Fast A W. 1991. Induced spermiation in the male spotted scat (Scatophagus argus) by long-term administration of 17α-methyltestosterone followed by LHRHa. Asian Fish. Sci., 4(2): 137-145.
Barry T P, Castanos M T, Macahilig M P S C, Fast A W. 1993. Gonadal maturation and spawning induction in female spotted scat (Scatophagus argus). J. Aqua. Trop., 8: 121-130.
Barry T P, Fast A W. 1992. Biology of the spotted scat(Scatophagus argus) in the Philippines. Asian Fish. Sci., 5(3): 163-179.
Cai Z P, Wang Y, Hu J W, Zhang J B, Lin Y G. 2010. Reproductive biology of Scatophagus argus and artificial induction of spawning. J. Trop. Oceanogr., 29(5): 180-185. (in Chinese with English abstract)
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos P, Bealer K, Madden T L. 2009. BLAST+:architecture and applications. BMC Bioinformatics, 10: 421. DOI:10.1186/1471-2105-10-421
Chen H P, Deng S P, Dai M L, Zhu C H, Li G L. 2016. Molecular cloning, characterization, and expression profiles of androgen receptors in spotted scat (Scatophagus argus). Genet. Mol. Res., 15(2): 1-14. DOI:10.4238/gmr.15027838
Chen J H, He M X, Mu X J, Yan B L, Zhang J B, Jin S C. 2015b. cDNA cloning and mRNA expression analysis of Sox9 in Scatophagus argus. Chin. J. Zool., 50(1): 93-102. (in Chinese with English abstract)
Chen J H, He M X, Yan B L, Zhang J B, Jin S C, Liu L. 2015a. Molecular characterization of dax1 and SF-1 and their expression analysis during sex reversal in spotted scat, Scatophagus argus. J. World Aquac. Soc., 46(1): 1-19. DOI:10.1111/jwas.2015.46.issue-1
Conesa A, Götz S, García-Gómez J M, Terol J, Talón M, Robles M. 2005. Blast2GO:a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21(18): 3 674-3 676. DOI:10.1093/bioinformatics/bti610
Deng S P, Wu B, Zhu C H, Li G L. 2014. Molecular cloning and dimorphic expression of growth hormone (gh) in female and male spotted scat Scatophagus argus. Fisheries Sci., 80(4): 715-723. DOI:10.1007/s12562-014-0763-5
Finseth F R, Harrison R G. 2014. A comparison of next generation sequencing technologies for transcriptome assembly and utility for RNA-Seq in a non-model bird. PLoS One, 9(10): e108550. DOI:10.1371/journal.pone.0108550
Fraser B A, Weadick C J, Janowitz I, Rodd F H, Hughes K A. 2011. Sequencing and characterization of the guppy(Poecilia reticulata) transcriptome. BMC Genomics, 12: 202. DOI:10.1186/1471-2164-12-202
Gandhi V, Venkatesan V, Ramamoorthy N. 2014. Reproductive biology of the spotted scat Scatophagus argus (Linnaeus, 1766) from Mandapam waters, South-east coast of India. Indian J. Fish., 61(4): 55-59.
Garber M, Grabherr M G, Guttman M, Trapnell C. 2011. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat. Methods, 8(6): 469-477. DOI:10.1038/nmeth.1613
Grabherr M G, Haas B J, Yassour M, Levin J Z, Thompson D A, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q D, Chen Z H, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren B W, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol., 29(7): 644-652. DOI:10.1038/nbt.1883
Gupta S. 2016. An overview on morphology, biology, and culture of spotted scat Scatophagus argus (Linnaeus 1766). Rev. Fish. Sci. Aquac., 24(2): 203-212. DOI:10.1080/23308249.2015.1119800
Herpin A, Schartl M. 2011. Dmrt1 genes at the crossroads:a widespread and central class of sexual development factors in fish. FEBS J., 278(7): 1.
Jung H, Lyons R E, Dinh H, Hurwood D A, McWilliam S, Mather P B. 2011. Transcriptomics of a giant freshwater prawn (Macrobrachium rosenbergii):de novo assembly, annotation and marker discovery. PLoS One, 6(12): e27938. DOI:10.1371/journal.pone.0027938
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res., 32(Suppl 1): D277-D280.
Lalitha S. 2000. Primer premier 5. Biotech Softw. Internet Rep., 1(6): 270-272.
Levavi-Sivan B, Bogerd J, Mañanós E L, Gómez A, Lareyre J J. 2010. Perspectives on fish gonadotropins and their receptors. Gen. Comp. Endocrinol., 165(3): 412-437. DOI:10.1016/j.ygcen.2009.07.019
Levavi-Sivan B, Safarian H, Rosenfeld H, Elizur A, Avitan A. 2012. Regulation of gonadotropin-releasing hormone(GnRH)-receptor gene expression in tilapia:effect of GnRH and dopamine. Biol. Reprod., 70(6): 1 545-1 551.
Li C Z, Weng S P, Chen Y G, Yu X Q, Lü L, Zhang H Q, He J G, Xu X P. 2012. Analysis of Litopenaeus vannamei transcriptome using the next-generation DNA sequencing technique. PLoS One, 7(10): e47442. DOI:10.1371/journal.pone.0047442
Li G L, Zhang M Z, Deng S P, Chen H P, Zhu C H. 2015. Effects of temperature and fish oil supplementation on ovarian development and foxl2 mRNA expression in spotted scat Scatophagus argus. J. Fish Biol., 86(1): 248-260. DOI:10.1111/jfb.2015.86.issue-1
Li S H, Zhang X J, Sun Z, Li F H, Xiang J H. 2013. Transcriptome analysis on Chinese shrimp Fenneropenaeus chinensis during WSSV acute infection. PLoS One, 8(3): e58627. DOI:10.1371/journal.pone.0058627
Li S Q, Liu Z H, Hu P, Liang X M, Liu H F, Su M L, Zhang J B. 2014. SNP discovery using high-throughput 454 pyrosequencing and validation in the spotted scat, Scatophagus argus. Conserv. Genet. Resour., 6(4): 817-820. DOI:10.1007/s12686-014-0255-z
Liu H F, Li S Q, Hu P, Zhang Y Y, Zhang J B. 2013. Isolation and characterization of EST-based microsatellite markers for Scatophagus argus based on transcriptome analysis. Conserv. Genet. Resour., 5(2): 483-485. DOI:10.1007/s12686-012-9833-0
Liu H F, Mu X J, Gui L, Su M L, Li H, Zhang G, Liu Z H, Zhang J B. 2015. Characterization and gonadal expression of foxl2 relative to cyp19a genes in spotted scat Scatophagus argus. Gene, 561(1): 6-14. DOI:10.1016/j.gene.2014.12.060
Liu H F, Zhang J B, Cai Z P, Song Y. 2010. Novel polymorphic microsatellite loci for the spotted scat Scatophagus argus. Conserv. Genet. Resour., 2(2): 149-151.
Lu J G, Luan P X, Zhang X F, Xue S Q, Peng L N, Mahbooband S, Sun X W. 2014. Gonadal transcriptomic analysis of yellow catfish (Pelteobagrus fulvidraco):identification of sex-related genes and genetic markers. Physiol. Genomics, 46(21): 798-807. DOI:10.1152/physiolgenomics.00088.2014
Lv J J, Liu P, Gao B Q, Wang Y, Wang Z, Chen P, Li J. 2014. Transcriptome analysis of the Portunus trituberculatus:de novo assembly, growth-related gene identification and marker discovery. PLoS One, 9(4): e94055. DOI:10.1371/journal.pone.0094055
Ma D Y, Ma A J, Huang Z H, Wang G N, Wang T, Xia D D, Ma B H. 2016. Transcriptome analysis for identification of genes related to gonad differentiation, growth, immune response and marker discovery in the turbot (Scophthalmus maximus). PLoS One, 11(2): e0149414. DOI:10.1371/journal.pone.0149414
Ma K Y, Qiu G F, Feng J B, Li J L. 2012. Transcriptome analysis of the oriental river prawn, Macrobrachium nipponense using 454 pyrosequencing for discovery of genes and markers. PLoS One, 7(6): e39727. DOI:10.1371/journal.pone.0039727
Malachowicz M, Wenne R, Burzynski A. 2017. De novo assembly of the sea trout (Salmo trutta m trutta) skin transcriptome to identify putative genes involved in the immune response and epidermal mucus secretion. PLoS One, 12(2): e0172282. DOI:10.1371/journal.pone.0172282
Morrish B C, Sinclair A H. 2002. Vertebrate sex determination:many means to an end. Reproduction, 124(4): 447-457. DOI:10.1530/rep.0.1240447
Patnaik B B, Wang T H, Kang S W, Hwang H J, Park S Y, Park E B, Chung J M, Song D K, Kim C, Kim S, Lee J S, Han Y S, Park H S, Lee Y S. 2016. Sequencing, de novo assembly, and annotation of the transcriptome of the endangered freshwater pearl bivalve, Cristaria plicata, provides novel insights into functional genes and marker discovery. PLoS One, 11(2): e0148622. DOI:10.1371/journal.pone.0148622
Pereiro P, Balseiro P, Romero A, Dios S, Forn-Cuni G, Fuste B, Planas J V, Beltran S, Novoa B, Figueras A. 2012. Highthroughput sequence analysis of turbot (Scophthalmus maximus) transcriptome using 454-pyrosequencing for the discovery of antiviral immune genes. PLoS One, 7(5): e35369. DOI:10.1371/journal.pone.0035369
Prasad P, Ogawa S, Parhar I S. 2015. Serotonin reuptake inhibitor citalopram inhibits GnRH synthesis and spermatogenesis in the male zebrafish. Biol. Reprod., 93(4): 102.
Robledo D, Ronza P, Harrison P W, Losada A P, Bermúdez R, Pardo B G, Redondo M J, Sitjà-Bobadilla A, Quiroga M I, Martínez P. 2014. RNA-seq analysis reveals significant transcriptome changes in turbot (Scophthalmus maximus)suffering severe enteromyxosis. BMC Genomics, 15: 1149. DOI:10.1186/1471-2164-15-1149
Salem M, Rexroad C E, Wang J N, Thorgaard G H, Yao J B. 2010. Characterization of the rainbow trout transcriptome using Sanger and 454-pyrosequencing approaches. BMC Genomics, 11: 564. DOI:10.1186/1471-2164-11-564
Thiel T, Michalek W, Varshney R, Graner A. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet., 106(3): 411-422. DOI:10.1007/s00122-002-1031-0
Tong Y, Zhang Y, Huang J M, Xiao S, Zhang Y H, Li J, Chen J H, Yu Z N. 2015. Transcriptomics analysis of Crassostrea hongkongensis for the discovery of reproduction-related genes. PLoS One, 10(8): e0134280. DOI:10.1371/journal.pone.0134280
Xiao S J, Li J T, Ma F S, Fang L J, Xu S B, Chen W, Wang Z Y. 2015. Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the wholegenome mapping in BioNano Genomics Irys system. BMC Genomics, 16: 670. DOI:10.1186/s12864-015-1871-z
Xie C, Mao X Z, Huang J J, Ding Y, Wu J M, Dong S, Kong L, Gao G, Li C Y, Wei L P. 2011. KOBAS 2.0:a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res., 39: W316-W322. DOI:10.1093/nar/gkr483
Xie Z Z, Xiao L, Wang D D, Fang C, Liu Q Y, Li Z H, Liu X C, Zhang Y, Li S S, Lin H R. 2014. Transcriptome analysis of the Trachinotus ovatus:identification of reproduction, growth and immune-related genes and microsatellite markers. PLoS One, 9(10): e109419. DOI:10.1371/journal.pone.0109419
Ye J, Fang L, Zheng H K, Zhang Y, Chen J, Zhang Z J, Wang J, Li S T, Li R Q, Bolund L, Wang J. 2006. WEGO:a web tool for plotting GO annotations. Nucleic Acids Res., 34(Suppl 2): W293-W297.
Zhang M Z, Li G L, Zhu C H, Deng S P. 2013. Effects of fish oil on ovarian development in spotted scat (Scatophagus argus). Anim. Reprod. Sci., 141(1-2): 90-97. DOI:10.1016/j.anireprosci.2013.06.020