Sox genes, characterized by the conserved high mobility group (HMG) box, encode a class of transcription factors with high sequence similarity to sex determining region of Y chromosome (SRY) gene in animals. Since 1990, when the first Sry gene was identified in human and mouse, more than 100 sox genes have been discovered in mammals, birds, reptiles, fishes, and insects (Sinclair et al., 1990; Gao et al., 2016). According to the conservatism of protein and nucleic acid sequences, sox genes are named as sox1 to sox32, and classified as 12 subfamilies (A, B1, B2, C, D, E, F, G, H, I, J, and K).
With the development of genome-wide sequencing, an increasing number of sox genes were identified from different species, for instance, five in nematode (Caenorhabditis elegans) (C. elegans Sequencing Consortium, 1998), seven in calcareous sponge (Sycon ciliatum) (Fortunato et al., 2012), eight in fruitfly (Drosophila melanogaster) (Crémazy et al., 2001) and African malaria mosquito (Anopheles gambiae) (Wilson and Dearden, 2008), nine in honey bee (Apis mellifera), red flour beetle (Tribolium castaneum) and jewel wasp (Nasonia vitripennis) (Wilson and Dearden, 2008), 14 in starlet sea anemone (Nematostella vectensis) (Magie et al., 2005), 19 in Japanese medaka (Oryzias latipes) (Cui et al., 2011), 20 in mouse (Mus musculus) and human (Homo sapiens) (Schepers et al., 2002), 23 in tongue sole (Cynoglossus semilaevis) (Gao et al., 2016), 24 in torafugu (Takifugu rubripes) (Koopman et al., 2004), and 27 in Nile tilapia (Oreochromis niloticus) (Wei et al., 2016) (Table 1). Genome-wide comparison has provided valuable information to understand the function and evolution of sox genes. With the advancing knowledge on sox gene family, researchers have discovered the essential function of sox genes as important transcription factors in the regulation of diverse growth and development processes. For instance, sox gene knockout and mutation have revealed that the function of sox genes in many aspects, including chondrogenesis (Ng et al., 1997), neurogenesis (Wegner, 2011), early embryonic development (Kikuchi et al., 2001), hematopoiesis (Chung et al., 2010), stemness (Tanimura et al., 2013), angiogenesis (Pennisi et al., 2000; Downes and Koopman, 2001; Matsui et al., 2006), cardiogenesis (Zhang et al., 2005), hair development (Irrthum et al., 2003), and sex determination and differentiation (Foster et al., 1994; Vidal et al., 2001). Specifically, sox9 has been confirmed to be an essential factor for the proper proliferation and survival of medaka germ cells (Nakamura et al., 2012). Sox7 and sox18 play redundant roles in vascular development and arteriovenous specification in zebrafish (Cermenati et al., 2008; Herpers et al., 2008). Sox17 has unique effects on primitive erythropoiesis in zebrafish (Chung et al., 2010).
Japanese flounder (Paralichthys olivaceus) is a valuable economic teleost in China, Japan, and Korea. Recent developments in genome and transcriptome sequencing techniques provide a better approach to identify Japanese flounder sox genes at a genome scale level. In the present study, we used the genome and transcriptome sequencing to identify sox genes in this species. Gene structure and expression profiles of sox genes were also analyzed. Our results provided a better way to further investigate the evolution and function of sox gene family in Japanese flounder.2 MATERIAL AND METHOD 2.1 Ethics statement
The Japanese flounder individuals used in this study were obtained from Yellow Sea Aquatic Product Co. Ltd., Yantai, Shandong, China. All experimental procedures and investigations were supervised and approved by Ocean University of China, and were performed in accordance with the guidelines of China Government Principles for the Utilization and Care of Vertebrate Animals Used in Testing, Research, and Training (State science and technology commission of the People's Republic of China for No. 2, November 14, 1988).2.2 Fish and sample collection
Japanese flounder individuals (three females and three males) of one year old were chosen from a larger cohort population. The fish were anesthetized with MS-222 (30 mg/mL) and killed by severing the spinal cord. Tissue samples, including heart, liver, spleen, kidney, brain, gill, muscle, intestines, stomach, testis, and ovary, were collected respectively frozen immediately in liquid nitrogen, and then stored in -80℃ for RNA extraction.
Embryos of different development stages were collected in the breeding season (late April). In order to facilitate sampling, we divided the embryo development process of Japanese flounder into six stages, namely, Stage 1 (from two cells to morula), Stage 2 (from early gastrula to late somites), Stage 3 (from hatching to 2 d after hatching), Stage 4 (before metamorphosis), Stage 5 (metamorphosis stage 1 to 2), and Stage 6 (metamorphosis stage 3 to 5). Samples of different stages were collected respectively and stored in -80℃ for RNA extraction.2.3 RNA extraction and illumina sequencing
Total RNA was extracted from tissue and development stage samples using Trizol reagent (Invitrogen, Carlsbad CA, USA) according to the manufacturer's protocol. DNA contamination was removed by DNaseI (TaKaRa, Dalian, China). The high-quality RNA from each sample was used to construct the llumina sequencing libraries by Illumina TruSeq mRNA Stranded Sample Preparation Kit (Illumina, San Diego CA, USA) according to the manufacturer's protocol and a previous study (Zhang et al., 2016). All constructed cDNA library were quenched by Beijing Genomics Institute (BGI, Shenzhen, China). Base on the acquired sequence reads, the fragments per kilobase of exon model per million mapped reads (FPKM) was calculated to measure gene expression levels (Garber et al., 2011; Trapnell et al., 2012) according to the formula as follows (Weitschek et al., 2015):
where Nf is the count of mapped fragments, F is the total number of the mapped fragments, and L is the length (base pairs) of all exons of sox gene.2.4 Identification of the Sox genes
In order to perform the analysis in a more comprehensive way, two procedures were used to identify the candidate sox genes in Japanese flounder genome and transcriptome. Sox protein sequences of human (Homo sapiens), mouse (Mus musculus), zebrafish (Danio rerio), fugu (Takifugu rubripes), and tilapia (Oreochromis niloticus) were retrieved from Ensemble (http://asia.ensembl.org/index.html) and NCBI (https://www.ncbi.nlm.nih.gov/) databases. All these Sox protein-coding sequences were queried against the Japanese flounder genome (Zhang, 2016, unpublished data) and transcriptome (Zhang, 2016, unpublished data) by local TBlastx search with the threshold level of E-value of 1e-5 (Altschul et al., 1997). Meanwhile, the conserved HMG box sequence of vertebrate Sox proteins (DHVKRPMNAFMVWS-RGERRKIAQQNPDMHNSEISKRLGKRWKLLS-ESEKRPFIEEAERLRAQHMKDYPDYKYRPRR-KKK) (Wang et al., 2002) was used as a query sequence in local TBlastn search against the Japanese flounder genome and transcriptome with the threshold level of E-value of 1e-5. The results of these two procedures were integrated together, and the Blast search (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and phylogenetic analysis were used to assess the accuracy of the candidate sox genes. Besides, the exon-intron structure of Japanese flounder sox genes were mapped by Blastn program.2.5 Phylogenetic analysis of Sox gene
Before phylogenetic analysis, the Sox protein sequences of medaka (Oryzias latipes), spotted gar (Lepisosteus oculatus), and fruit fly (Drosophila melanogaster) were retrieved from Ensemble and NCBI. Sox proteins sequences of nine species (human, mouse, fruit fly, fugu, zebrafish, tilapia, medaka, spotted gar, and Japanese flounder) were used to construct the phylogenetic tree. The accession numbers of these protein were available in Table S1. Multiple Sequence alignment of all the Sox protein sequences were implemented by ClustalW before evolutionary tree analysis. The phylogenetic tree was constructed by the neighbor-joining method with Poisson model implemented in MEGA 7.0 program with bootstrap of 10 000 replications. Specifically, the evolutionary distances were computed using the Poisson correction method and all positions containing gaps and missing data were eliminated (Zuckerkandl and Pauling, 1965; Saitou and Nei, 1987; Kumar et al., 2016).2.6 Analysis of Sox gene expression base on transcriptome
The transcriptomes of eleven tissues (heart, liver, spleen, kidney, brain, gill, muscle, intestines, stomach, testis, and ovary) and six embryonic development stages (Stage 1–6) were used to analyze sox gene expression in Japanese flounder. The sox gene expression levels in different tissues and stages were evaluated by normalized FPKM value. For sox gene expression in different tissues and embryonic development stages, FPKM≥1 was considered reasonable, and FPKM≥10 was considered as a threshold for high expression (Hart et al., 2013; Tsagaratou et al., 2014).
The biased/specific expression of sox genes in gonads (testis or ovary) was also analyzed based on RNA-Seq data. The testis/ovary-biased expressed candidate genes were identified when "FPKM≥1" and "|log2(FPKMovary/FPKMtestis)|≥2". The testis/ovary-specific expressed genes were identified when "FPKM≥3" in one gonads, but "FPKM≤1" in the other.2.7 Sox gene expression quantified by qPCR
Extracted total RNA was transcribed by M-MLV Reverse Transcriptase (TaKaRa) enzyme according to the manufacturer's protocol. The primers used in this experiment were designed by Primer Primer 5.0 and listed in Table S2. Pre-experiment was conducted to test the specificity of primers. qPCR was performed in Light-Cycler 480 (Roche, Forrentrasse, Switzerland) with in 20 μL reaction volume, which contained cDNA templates 10 ng, primers (FW/RV) and 2×SYBR Green qPCR Master Mix (US Everbright Inc.). β-actin was selected as a reference gene (Zhang et al., 2013). The reaction procedure consisted of an initial polymerase activation of 5 min at 94℃, followed by 40 cycles at 94℃ (15 s) and 60℃ (45 s). The data were analyzed by the 2-ΔΔCt method. This experiment was performed according to a previous study (Gao et al., 2015).3 RESULT 3.1 Sox gene subfamilies in Japanese flounder
Based on the relatively conserved HMG box domain of vertebrate Sox proteins (Bowles et al., 2000), the conserved HMG-box domain of vertebrate and Sox protein-coding sequences of five species were used to search against the Japanese flounder genome and transcriptome by Local Alignment Search Tool (BLAST+ 2.6.0). A total of 25 sox genes were isolated and identified from Japanese flounder genome which could be divided into seven subfamilies (Table 2), that is, subfamily B1 (including Posox1a, Posox1b, Posox2, Posox3, and Posox19), subfamily B2 (including Posox14 and Posox21), subfamily C (including Posox4a, Posox4b, Posox11a, and Posox11b), subfamily D (including Posox5, Posox6a, Posox6b, and Posox13), subfamily E (including Posox8a, Posox8b, Posox9a, Posox9b, Posox10a, and Posox10b), subfamily F (including Posox7, Posox17, and Posox18), subfamily K (including Posox32). Compare with human and mouse, which had only one sox gene copy, some teleost sox genes had two duplicates (except spotted gar (Hermansen et al., 2016)) (Table 3). These results suggested that these novel sox isoforms might derive from a teleost-specific evolutionary process.3.2 Sox protein domain in Japanese flounder
The 25 Sox protein sequences identified from Japanese flounder were used to perform online protein sequence analysis by SMART (http://smart.embl-heidelberg.de/). The results showed that all the Sox proteins had a conserved HMG box of 79 amino acid residues (Gubbay et al., 1990). As shown in Fig. 1, both the motif sequence (positions 5-10) and the extended motif sequence (position 5-13) were highly conserved for all Sox sequences except PoSox32. Besides, some other fragments in the HMG box were also highly conserved. Interestingly, similar to Sox32 in medaka (Cui et al., 2011), and tongue sole (Gao et al., 2016), Japanese flounder Sox32 could be distinguished from other Sox proteins. For example, the amino acid at position 7 in PoSox32 motif was L, but it was M in the other Sox sequences. Moreover, residues at positions 11-13 in PoSox32 HMG box were IIW, which were also different from the highly conserved MVW of other Sox proteins in these positions. These results indicated that the function and evolution of Sox32 in teleosts might be more complex compared with other vertebrates.