Journal of Oceanology and Limnology, 2019, Vol. 37, Issue 3: 1071-1079
Institute of Oceanology, Chinese Academy of Sciences

Article Information

SHANGGUAN Jingbo, XU Anle, HU Xiaowei, LI Zhongbao
Isolation and characterization of genic microsatellites from de novo assembly transcriptome in the bivalve Ruditapes philippinarum
Journal of Oceanology and Limnology, 37(3): 1071-1079

Article History

Received Mar. 28, 2018
accepted in principle Jun. 12, 2018
accepted for publication Jul. 6, 2018
Isolation and characterization of genic microsatellites from de novo assembly transcriptome in the bivalve Ruditapes philippinarum
SHANGGUAN Jingbo1,2, XU Anle1,2, HU Xiaowei1,2, LI Zhongbao1,2     
1 Fujian Provincial Key Laboratory of Marine Fishery Resources and Eco-Environment, Xiamen 361021, China;
2 Fisheries College, Jimei University, Xiamen 361021, China
Abstract: The marine bivalve Ruditapes philippinarum (Veneridae) is an economically important aquaculture species. In this study, 106 831 unigenes and 2 664 SSR loci (1 locus per ~40 sequences) were obtained from a de novo transcriptome assembly. Among all the SSRs, tri-nucleotide repeats (46.40%) were the most abundant, followed by di-nucleotide repeats (32.43%), and AAC/GTT (19.82%) was the most common motif. After polymorphism screening with 32 wild R. philippinarum individuals, 34 polymorphic and 3 monomorphic SSR loci were identified, and their genetic indices were calculated. The polymorphism information content (PIC) of 30 polymorphic SSR loci was at a medium or high level (PIC>0.25). However, five polymorphic SSR loci (MG871423, MG871428, MG871429, MG871434, and MG871435) deviated from Hardy-Weinberg equilibrium after Bonferroni correction (adjusted P=0.001 471). The number of alleles per locus (Na) ranged from 2 to 7, and the observed (Ho) and expected (He) heterozygosities were 0.100 0-1.000 0 and 0.191 3-0.723 6, respectively. RNA-Seq thus proved to be a fast and cost-effective method for genic SSR development in non-model species. The 37 loci from R. philippinarum will further enrich the available genetic information and support population conservation and restoration.
Keywords: Ruditapes philippinarum    transcriptome    microsatellite    genetic diversity    

1 INTRODUCTION

The Manila clam, Ruditapes philippinarum (Adams et Reeve, 1850), is an economically important bivalve mollusk. Besides its value as food, the species also serves as a model organism for studying coastal pollution (Hégaret et al., 2007; Cong et al., 2018). Being eurythermal and euryhaline, the species is widely distributed along the west coast of the Pacific from Russia to the Philippines, especially in the estuaries and nearshore waters of China (Wu et al., 2011; FAO, 2014). However, in recent years, with rapid economic development, clam resources and germplasm have been declining as a result of beach reutilization, overfishing, and environmental pollution and destruction. Additionally, the planktonic larval stage of two to three weeks allows larvae to disperse to new habitats, where they may settle and breed (Yasuda et al., 2007). As a result, the population structure may change over time.

Highly polymorphic genetic markers provide an effective way to reveal population structure and support genetic breeding (Yasuda et al., 2007). To date, microsatellites, i.e. simple sequence repeats (SSRs), are among the most popular molecular markers in genetic research because of their abundance, co-dominance, high allelic variability, and wide dispersal throughout the genome (Chistiakov et al., 2006). SSRs are generally divided into two types, genomic SSRs and genic SSRs, according to the sequences from which they originate (Ellis and Burke, 2007). Genomic SSRs are obtained either from high-throughput genomic sequencing or from SSR-enriched libraries constructed by magnetic-bead enrichment, an approach whose efficiency may be limited by the choice of biotin probes. Genic SSRs are reported to be more conserved and to have relatively higher transferability among related or cryptic species than genomic SSRs (Varshney et al., 2005a; Yamini et al., 2013). Conversely, genomic SSRs are more polymorphic than genic SSRs (Ellis and Burke, 2007; Kong et al., 2014). In aquaculture, SSRs have been applied in genetic research on several species, for example, the pearl mussel Hyriopsis cumingii (Bai et al., 2013), the ridgetail white prawn Exopalaemon carinicauda (Li et al., 2015), the schilbid catfish Silonia silondia (Mandal et al., 2016), and the pen shell Atrina pectinata (Sun et al., 2017). For R. philippinarum, many SSR loci have been obtained from expressed sequence tag (EST) sequences (Nie et al., 2014, 2015; Yan et al., 2015; Zhu et al., 2015). High-throughput sequencing has thus contributed greatly to population genetic information for important organisms, including non-model and rare species.

In this study, high-throughput sequencing was carried out on the Illumina platform to obtain the transcriptome of R. philippinarum, and the short clean reads were then assembled de novo. All genic SSRs were characterized in terms of SSR types and their distribution percentages. Finally, from randomly selected SSR sequences, we screened 34 polymorphic and 3 monomorphic SSR loci. The results will promote conservation and breeding studies of this clam species.

2 MATERIAL AND METHOD

2.1 Specimen preparation

Healthy adult clams used for cDNA library construction were purchased from the aquatic wholesale market (Gaoqi, Xiamen, China) in April 2017. The gill tissues for RNA extraction were dissected and quick-frozen in liquid nitrogen. In addition, 32 wild, live individuals collected from coastal waters (Jimei, Xiamen, China) in November 2017 were used for the subsequent polymorphism detection and population genetic analysis. All the fresh samples were then stored at -80℃.

2.2 cDNA library construction and SSR-enriched sequences obtainment

Total RNA was extracted using TRIzol reagent (Life Technologies, CA, USA) according to the manufacturer's instructions. cDNA library construction and RNA-Seq were performed by Gene Denovo Biotechnology Co. (Guangzhou, China) on the Illumina HiSeq™ 4000 platform. Clean reads were obtained by removing the following low-quality sequences from the raw data: (1) reads containing adapters; (2) reads containing more than 10% unknown nucleotides (N); and (3) reads containing more than 50% low-quality (Q value ≤10) bases. Next, all unigenes were obtained by de novo assembly of the clean reads with the Trinity program (Grabherr et al., 2011). Microsatellites were searched for and located among all the unigenes of the transcriptome with the microsatellite identification tool (MISA), with the following parameter settings: (1) definition (unit size, min repeats): 2-6 3-5 4-4 5-4 6-4; (2) interruptions (max difference between 2 SSRs): 100 bp. The SSR categories and their characteristics were then analyzed statistically.
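The search criteria quoted above can be illustrated with a minimal Python sketch. It is not MISA itself: it only finds perfect repeats with a regular expression, ignores the 100 bp rule for merging neighbouring SSRs into compound loci, and the function name `find_perfect_ssrs` and the example sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the settings quoted above
# (2-6 3-5 4-4 5-4 6-4); mononucleotide repeats are not searched.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif of `unit` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. ATAT, AA).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical toy sequence: (AT)6 at position 3 and (AAC)5 at position 19.
example = "GGC" + "AT" * 6 + "CCGT" + "AAC" * 5 + "TTG"
ssrs = find_perfect_ssrs(example)
```

A run of only five AT repeats would not be reported, because the di-nucleotide threshold is six.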

2.3 Screening of specific primers for SSR loci and polymorphism detection

Two hundred and thirty-four SSR sequences were randomly selected, and 100 primers were designed with Primer Premier 5.0.32. Genomic DNA of the 32 wild individuals was extracted using a cell/tissue genomic DNA extraction kit (GENEray™ Biotechnology, Shanghai, China). A DNA pool of 10 high-quality samples was used to test primer specificity and the optimum annealing temperature (Ta) by gradient polymerase chain reaction (PCR): 1 cycle (pre-denaturation at 95℃ for 5 min); 30 cycles (denaturation at 94℃ for 45 s, gradient annealing at 60–40℃ for 30 s, extension at 72℃ for 45 s); 1 cycle (final extension at 72℃ for 10 min). PCR amplification was performed in a 10-μL reaction system with 2× Taq Master Mix (Novoprotein Scientific, Shanghai, China).

Next, the polymorphism of the specific primers was examined using the set of 32 DNA samples by touchdown PCR as follows: 1 cycle (95℃ for 5 min); 15 cycles (94℃ for 45 s; annealing for 30 s starting at Ta+15℃ and decreasing by 1℃ per cycle; 72℃ for 45 s); 20 cycles (94℃ for 45 s; Ta for 30 s; 72℃ for 45 s); 1 cycle (72℃ for 10 min). Finally, the 32 PCR products of each specific primer were separated by 6% polyacrylamide gel electrophoresis on a Sequi-Gen Sequencing Cell (Bio-Rad, USA) and detected by silver staining.
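The touchdown programme can be written out cycle by cycle. The sketch below rests on one reading of the protocol, namely that the first touchdown cycle anneals at Ta+15℃ and each later cycle is 1℃ lower, so the fifteenth anneals at Ta+1℃. The function name and the example Ta of 50℃ are hypothetical.

```python
def touchdown_annealing_temps(ta, touchdown_cycles=15, fixed_cycles=20,
                              start_offset=15.0, step=1.0):
    """Return the annealing temperature (in degrees C) for every PCR cycle."""
    # Touchdown phase: start at ta + start_offset and drop by `step` each cycle.
    temps = [ta + start_offset - step * i for i in range(touchdown_cycles)]
    # Fixed phase: remaining cycles anneal at the optimum temperature ta.
    temps += [float(ta)] * fixed_cycles
    return temps


# Hypothetical primer with an optimum annealing temperature of 50 degrees C.
schedule = touchdown_annealing_temps(50)
```

Under this reading, the example primer anneals at 65℃ in the first cycle, at 51℃ in the fifteenth, and at 50℃ for the remaining 20 cycles.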

2.4 Genetic data statistics

Micro-Checker was used to check for null alleles (van Oosterhout et al., 2004; Wen et al., 2013). PopGene 32 (Version 1.32) (Yeh et al., 2000) was used to test for Hardy-Weinberg equilibrium (P-HWE) and linkage disequilibrium and to calculate the number of alleles per locus (Na) and the observed (Ho) and expected (He) heterozygosities. Cervus 3.0.3 (Marshall et al., 1998) was used to estimate the polymorphism information content (PIC).
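For readers unfamiliar with these indices, the following Python sketch computes Na, Ho, He, and PIC for a single locus from diploid genotypes using the standard textbook definitions. It is an illustration only: PopGene and Cervus may apply options or corrections not reproduced here, and the function name and toy genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    with no missing data.
    """
    n = len(genotypes)
    total = 2 * n  # number of sampled gene copies
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / total for c in counts.values()]

    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    sum_sq = sum(p * p for p in freqs)
    he = 1 - sum_sq                          # expected heterozygosity
    he_unbiased = he * total / (total - 1)   # small-sample correction

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    pic = 1 - sum_sq - cross

    return {"Na": len(counts), "Ho": ho, "He": he,
            "He_unbiased": he_unbiased, "PIC": pic}


# Hypothetical toy locus: four individuals, two alleles at equal frequency.
stats = locus_stats([(1, 2), (1, 2), (1, 1), (2, 2)])
```

For this toy locus the sketch gives Na = 2, Ho = 0.5, He = 0.5, and PIC = 0.375.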

3 RESULT

3.1 Characterization of SSR-enriched sequences from transcriptome

After sequencing and de novo assembly, 106 831 unigenes comprising 58 273 004 base pairs were obtained. Among them, 2 428 sequences containing 2 664 SSRs were identified (Table 1), i.e. one microsatellite locus per ~40 sequences or per ~21 874 bp. According to the repeat unit, these 2 664 SSRs can be categorized into di-nucleotides (864, 32.43%), tri-nucleotides (1 236, 46.40%), tetra-nucleotides (521, 19.56%), penta-nucleotides (40, 1.50%), and hexa-nucleotides (3, 0.11%) (Fig. 1). Repeat units of up to four nucleotides thus accounted for the great majority: tri-nucleotides were the most abundant, followed by di-nucleotides and tetra-nucleotides, whereas penta- and hexa-nucleotides together made up only 1.61%. Among the di-nucleotide motifs, AT/AT (19.14%) was the most common, followed by AC/GT (11.26%), and CG/CG (0.04%) was the rarest. Among all repeat motifs, AAC/GTT was dominant, accounting for 19.82%. The frequencies of all classified repeat types are listed in descending order in Appendix 1. The repeat numbers also differed among SSR types; for example, di-nucleotides were mostly repeated 6–11 times (Table 2), whereas penta-nucleotides were repeated only 4–6 times. For repeat numbers of 6–10, the percentage of SSRs at a given repeat number decreased progressively as the repeat unit lengthened (from di- to hexa-nucleotides). Moreover, within each repeat unit, the minimum repeat number usually had the highest count, e.g. (di-)6, 495; (tri-)5, 859; (tetra-)4, 363.
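The density and percentage figures in this paragraph follow directly from the reported counts, as the short Python check below shows. The variable names are illustrative.

```python
# Counts reported in the text.
unigenes = 106_831
total_bp = 58_273_004
ssr_counts = {"di": 864, "tri": 1236, "tetra": 521, "penta": 40, "hexa": 3}

total_ssrs = sum(ssr_counts.values())       # 2 664 SSRs in total
sequences_per_ssr = unigenes / total_ssrs   # about 40 sequences per SSR
bp_per_ssr = total_bp / total_ssrs          # about 21 874 bp per SSR

# Share of each repeat-unit class, in percent, rounded to two decimals.
percentages = {name: round(100 * count / total_ssrs, 2)
               for name, count in ssr_counts.items()}
penta_hexa_share = round(
    100 * (ssr_counts["penta"] + ssr_counts["hexa"]) / total_ssrs, 2)
```

The five class counts sum to 2 664, and the penta- and hexa-nucleotide classes together give the 1.61% quoted above.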

Fig.1 The distribution of different repeat units
Table 1 The information of microsatellite search
Table 2 Classified statistics to the number of different repeat units
Appendix 1 Frequency of classified repeat types (considering sequence complementary and descending order)
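Labels such as AAC/GTT and AC/GT group together every motif that is equivalent under cyclic shifts and reverse complementation. A minimal Python sketch of one way to build such labels is given below. The helper names are hypothetical, and the rule used (take the alphabetically smallest rotation on each strand) is an assumption that happens to reproduce the labels quoted in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif."""
    return motif.translate(_COMPLEMENT)[::-1]


def smallest_rotation(motif):
    """Return the alphabetically smallest cyclic shift of a motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def motif_class(motif):
    """Build a strand-independent label such as 'AAC/GTT' for a repeat motif."""
    motif = motif.upper()
    forward = smallest_rotation(motif)
    reverse = smallest_rotation(reverse_complement(motif))
    return "/".join(sorted((forward, reverse)))


# Motifs that differ only by rotation or strand fall into the same class.
labels = [motif_class(m) for m in ("AAC", "TTG", "CA", "TA", "GC")]
```

With this rule, AAC, ACA, CAA, GTT, TGT, and TTG all receive the label AAC/GTT.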
3.2 Isolation and analysis of polymorphic SSR loci

After the specificity of the 100 primers was tested by gradient PCR, 60 primers that amplified the target products were obtained and then tested for polymorphism using genomic DNA from the 32 wild R. philippinarum individuals. Of them, 34 loci were polymorphic, with 112 alleles in total (average 3.29 per locus) and 2 to 7 alleles per locus (Na). In addition, three monomorphic SSR loci were amplified (MG871457, MG871458, and MG871459). The genetic information is listed in Table 3, and the 37 SSR loci have been assigned GenBank accession numbers (MG871423–MG871459). Ho and He ranged from 0.100 0 to 1.000 0 (mean=0.348 8) and from 0.191 3 to 0.723 6 (mean=0.429 0), respectively. The PIC values ranged from 0.183 to 0.720, comprising four loci with low polymorphism (PIC < 0.25), 14 with moderate polymorphism (0.25 < PIC < 0.5), and 16 with high polymorphism (PIC > 0.5). Thus, most of the 34 SSR loci showed moderate to high polymorphism. Apart from five polymorphic SSR loci (MG871423, MG871428, MG871429, MG871434, and MG871435), the other 29 polymorphic loci conformed to Hardy-Weinberg equilibrium after Bonferroni correction (adjusted P=0.001 471). No genotypic linkage disequilibrium or genotyping error (null allele) was observed.
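The adjusted significance threshold and the summary figures in this paragraph can be reproduced with a few lines of Python. The nominal significance level of 0.05 is an assumption, since the text does not state it, but it yields the reported threshold when divided by the 34 polymorphic loci.

```python
alpha = 0.05            # assumed nominal significance level (not stated in the text)
polymorphic_loci = 34
total_alleles = 112
pic_classes = {"low": 4, "moderate": 14, "high": 16}

# Bonferroni correction: divide alpha by the number of tests (one per locus).
bonferroni_threshold = round(alpha / polymorphic_loci, 6)

# Average number of alleles per polymorphic locus.
mean_alleles = round(total_alleles / polymorphic_loci, 2)

# Share of loci with PIC > 0.25 (moderate plus high classes), in percent.
medium_high_loci = pic_classes["moderate"] + pic_classes["high"]
medium_high_share = round(100 * medium_high_loci / polymorphic_loci, 2)
```

Under that assumption the threshold is 0.001 471, the mean is 3.29 alleles per locus, and 30 of the 34 loci (88.24%) have PIC > 0.25.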

Table 3 Basic genetic information of 37 microsatellite primers in R. philippinarum (32 individuals)

4 DISCUSSION

In this study, the frequency of SSRs (1 SSR locus per ~40 sequences) was far lower than those in Sepiella japonica (1 SSR locus per 2 sequences) (Lü et al., 2017) and Atrina pectinata (1 SSR locus per 5 sequences) (Sun et al., 2017), which might result from the exclusion of mononucleotide repeats in this study. However, the percentages of the other SSR types were similar to those in these species. In other words, the abundance of SSR loci depends not only on the SSR search criteria but also on the database size and the SSR development approach (Varshney et al., 2005b; Parchman et al., 2010). In this research, tri-nucleotides (1 236, 46.40%) were the dominant SSR category, followed by di-nucleotides (864, 32.43%). The results are consistent with reports that tri-nucleotides primarily occur in exons, while di-, tetra-, and penta-nucleotide repeats occur mainly within the untranslated regions (UTRs) (Qiu et al., 2010), since the database in this study is an mRNA transcriptome consisting mainly of exons. The predominance of AT among the di-nucleotides may arise because cytosine (C) is easily methylated and then converted into thymine (T) via deamination (Schorderet and Gartler, 1992). In contrast, Sun et al. (2017) reported that AC/GT (29.35%) was the most frequent, followed by AT/AT (25.41%). Therefore, different animals show different patterns of SSR mutability in genomic evolution (Chistiakov et al., 2006). Slight differences were also found in the genomes of Lottia gigantea and Crassostrea gigas, in which di-nucleotides or hexa-nucleotides form the majority (Jiang et al., 2014). Altogether, we conclude that the number of SSRs, which reflects genetic diversity and variability, tends to decrease as the repeat unit lengthens or the repeat number increases.

Of the 34 polymorphic SSR loci, the average number of alleles (3.29 per locus) was lower than those previously reported for R. philippinarum, 5.97 (Zhu et al., 2015) and 7.76 (Nie et al., 2014). Even within the same species, different proportions of SSR types and repeat numbers may affect polymorphism (Bouck and Vision, 2007). Zhu et al. (2015) reported that di-nucleotide repeats were the most abundant type (71.05%) among 38 EST-SSRs, but among the present 34 SSRs, tri-nucleotide repeats were the most abundant (19, 55.88%), followed by di- (7, 23.53%) and tetra-nucleotide repeats (7, 23.53%). In the population diversity analysis, Ho (mean=0.348 8) was lower than He (mean=0.429 0). We infer that inbreeding has become more common, leading to an excess of homozygotes over heterozygotes in the Xiamen population. However, this view needs to be verified with more wild samples. Additionally, the PIC values of 30 (88.24%) SSR loci were at a medium-to-high level (PIC > 0.25), showing that developing EST-derived SSRs is efficient and feasible and avoids the tedious procedures of traditional methods.

5 CONCLUSION


This study shows that developing SSRs from a transcriptome is not only much more efficient than traditional methods but also provides a general and comprehensive landscape of R. philippinarum SSRs. SSRs obtained by traditional approaches may represent only part of this landscape, owing to differences in probe type or in the efficiency of SSR enrichment. In addition, the 37 SSR loci, particularly the 34 polymorphic loci, will contribute to genetic and phylogenetic analyses of R. philippinarum and of other shellfish and mollusks, for example in cross-species amplification, genetic breeding, and population conservation.

6 DATA AVAILABILITY STATEMENT


The datasets and materials supporting the conclusions of this article are included in the article.

References

Bai Z Y, Zheng H F, Lin J Y, Wang G L, Li J L. 2013. Comparative analysis of the transcriptome in tissues secreting purple and white nacre in the pearl mussel Hyriopsis cumingii. PLoS One, 8(1): e53617. DOI:10.1371/journal.pone.0053617
Bouck A, Vision T. 2007. The molecular ecologist's guide to expressed sequence tags. Molecular Ecology, 16(5): 907-924.
Chistiakov D A, Hellemans B, Volckaert F A M. 2006. Microsatellites and their genomic distribution, evolution, function and applications: a review with special reference to fish genetics. Aquaculture, 255(1-4): 1-29. DOI:10.1016/j.aquaculture.2005.11.031
Cong M, Wu H F, Cao T F, Lü J S, Wang Q, Ji C L, Li C H, Zhao J M. 2018. Digital gene expression analysis in the gills of Ruditapes philippinarum, exposed to short- and long-term exposures of ammonia nitrogen. Aquatic Toxicology, 194: 121-131. DOI:10.1016/j.aquatox.2017.11.012
Ellis J R, Burke J M. 2007. EST-SSRs as a resource for population genetic analyses. Heredity, 99(2): 125-132. DOI:10.1038/sj.hdy.6801001
FAO (Food and Agriculture Organization). 2014. Fishery and Aquaculture Statistics 2010. Food and Agriculture Organization of the United Nations, Rome.
Grabherr M G, Haas B J, Yassour M, Levin J Z, Thompson D A, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q D, Chen Z H, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren B W, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29(7): 644-652. DOI:10.1038/nbt.1883
Hégaret H, da Silva P M, Wikfors G H, Lambert C, De Bettignies T, Shumway S E, Soudant P. 2007. Hemocyte responses of Manila clams, Ruditapes philippinarum, with varying parasite, Perkinsus olseni, severity to toxic algal exposures. Aquatic Toxicology, 84(4): 469-479. DOI:10.1016/j.aquatox.2007.07.007
Jiang Q, Li Q, Yu H, Kong L F. 2014. Genome-wide analysis of simple sequence repeats in marine animals-a comparative approach. Marine Biotechnology, 16(5): 604-619. DOI:10.1007/s10126-014-9580-1
Kong L F, Bai J, Li Q. 2014. Comparative assessment of genomic SSR, EST-SSR and EST-SNP markers for evaluation of the genetic diversity of wild and cultured Pacific oyster, Crassostrea gigas Thunberg. Aquaculture, 420-421: S85-S91. DOI:10.1016/j.aquaculture.2013.05.037
Li J T, Li J, Chen P, Liu P, He Y Y. 2015. Transcriptome analysis of eyestalk and hemocytes in the ridgetail white prawn Exopalaemon carinicauda: assembly, annotation and marker discovery. Molecular Biology Reports, 42(1): 135-147. DOI:10.1007/s11033-014-3749-6
Lü Z M, Hou L, Gong L, Liu L Q, Chen Y J, Guo B Y, Dong Y H, Wu C W. 2017. Isolation and analysis on EST microsatellites of Sepiella japonica by de novo high-throughput transcriptome sequencing. Oceanologia et Limnologia Sinica, 48(4): 877-883. (in Chinese with English abstract)
Mandal S, Jena J K, Singh R K, Mohindra V, Lakra W S, Deshmukhe G, Pathak A, Lal K K. 2016. De novo development and characterization of polymorphic microsatellite markers in a schilbid catfish, Silonia silondia (Hamilton, 1822) and their validation for population genetic studies. Molecular Biology Reports, 43(2): 91-98. DOI:10.1007/s11033-016-3941-y
Marshall T C, Slate J, Kruuk L E, Pemberton J M. 1998. Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology, 7(5): 639-655. DOI:10.1046/j.1365-294x.1998.00374.x
Nie H T, Niu H B, Zhao L Q, Yang F, Yan X W, Zhang G F. 2015. Genetic diversity and structure of Manila clam (Ruditapes philippinarum) populations from Liaodong peninsula revealed by SSR markers. Biochemical Systematics and Ecology, 59: 116-125. DOI:10.1016/j.bse.2014.12.029
Nie H T, Zhu D P, Yang F, Zhao L Q, Yan X W. 2014. Development and characterization of EST-derived microsatellite makers for Manila clam (Ruditapes philippinarum). Conservation Genetics Resources, 6(1): 25-27. DOI:10.1007/s12686-013-0043-1
Parchman T L, Geist K S, Grahnen J A, Benkman C W, Buerkle C A. 2010. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics, 11: 180. DOI:10.1186/1471-2164-11-180
Qiu L J, Yang C, Tian B, Yang J B, Liu A Z. 2010. Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.). BMC Plant Biology, 10: 278. DOI:10.1186/1471-2229-10-278
Schorderet D F, Gartler S M. 1992. Analysis of CpG suppression in methylated and nonmethylated species. Proceedings of the National Academy of Sciences of the United States of America, 89(3): 957-961. DOI:10.1073/pnas.89.3.957
Sun X J, Li D M, Liu Z H, Zhou L Q, Wu B, Yang A G. 2017. De novo assembly of pen shell (Atrina pectinata) transcriptome and screening of its genic microsatellites. Journal of Ocean University of China, 16(5): 882-888. DOI:10.1007/s11802-017-3274-z
van Oosterhout C, Hutchinson W F, Wills D P M, Shipley P. 2004. MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Molecular Ecology Notes, 4(3): 535-538. DOI:10.1111/men.2004.4.issue-3
Varshney R K, Graner A, Sorrells M E. 2005b. Genic microsatellite markers in plants: features and applications. Trends in Biotechnology, 23(1): 48-55. DOI:10.1016/j.tibtech.2004.11.005
Varshney R K, Sigmund R, Börner A, Korzun V, Stein N, Sorrells M E, Langridge P, Graner A. 2005a. Interspecific transferability and comparative mapping of barley EST-SSR markers in wheat, rye and rice. Plant Science, 168(1): 195-202. DOI:10.1016/j.plantsci.2004.08.001
Wen Y F, Uchiyama K, Han W J, Ueno S, Xie W D, Xu G B, Tsumura Y. 2013. Null alleles in microsatellite markers. Biodiversity Science, 21(1): 117-126. (in Chinese with English abstract) DOI:10.3724/SP.J.1003.2013.10133
Wu H F, Liu X L, Zhao J M, Yu J B. 2011. NMR-based metabolomic investigations on the differential responses in adductor muscles from two pedigrees of Manila clam Ruditapes philippinarum to cadmium and zinc. Marine Drugs, 9(9): 1566-1579. DOI:10.3390/md9091566
Yamini K N, Ramesh K, Naresh V, Rajendrakumar P, Anjani K, Kumar V D. 2013. Development of EST SSR markers and their utility in revealing cryptic diversity in safflower (Carthamus tinctorius L.). Journal of Plant Biochemistry and Biotechnology, 22(1): 90-102. DOI:10.1007/s13562-012-0115-4
Yan L L, Qin Y J, Yan X W, Wang L N, Bi C L, Zhang J Y. 2015. Development of microsatellite markers in Ruditapes philippinarum using next-generation sequencing. Acta Ecologica Sinica, 35(5): 1573-1580. (in Chinese with English abstract)
Yasuda N, Nagai S, Yamaguchi S, Lian C L, Hamaguchi M. 2007. Development of microsatellite markers for the Manila clam Ruditapes philippinarum. Molecular Ecology Notes, 7(1): 43-45.
Yeh F C, Yang R, Boyle T J, Ye Z, Xiyan J M. 2000. PopGene 32, Microsoft Window-based freeware for population Genetic Analysis. Version 1.32. Molecular Biology and Biotechnology Centre, University of Alberta, Edmonton, Canada.
Zhu D P, Nie H T, Qin Y J, Li J, Liu L H, Yan X W. 2015. Development and characterization of 38 microsatellite makers for Manila clam (Ruditapes philippinarum). Conservation Genetics Resources, 7(2): 517-520. DOI:10.1007/s12686-014-0410-6