Alu elements are the most active and predominant type of short interspersed elements (SINEs) in the human genome. Recently inserted Alu elements that are polymorphic for presence/absence contribute to genome diversity among human populations and are useful genetic markers for population genetic studies. The objective of this study is to identify polymorphic Alu insertions through an in silico comparative genomics approach and to analyze their distribution pattern throughout the human genome. By computationally comparing the public and Celera sequence assemblies of the human genome, we identified a total of 800 polymorphic Alu elements. To demonstrate the efficiency of our approach, we used polymerase chain reaction-based assays to screen a randomly selected set of 16 of these 800 Alu insertion polymorphisms against a human diversity panel. Based on sequence analysis of the 800 Alu polymorphisms, we report three new Alu subfamilies, Ya3, Ya4b, and Yb11, with Yb11 being the smallest known Alu subfamily. Analysis of retrotransposition activity revealed Yb11, Ya8, Ya5, Yb9, and Yb8 as the most active Alu subfamilies; it also indicated either the maintenance of a very low level of retrotransposition activity or recent gene conversion events involving S subfamilies. The 800 polymorphic Alu insertions are characterized by the presence of target site duplications (TSDs) and longer-than-average polyA tails. Their pre-integration sites largely follow an extended "NT-AARA" motif. Among chromosomes, the density of Alu insertion polymorphisms is positively correlated with Alu-site availability and inversely correlated with the densities of older Alu elements and genes.
- Comparative genomics
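
As an illustration of the "NT-AARA" pre-integration motif mentioned in the abstract, the following is a minimal sketch of how a candidate site could be checked against it. It is not the method used in the study. It assumes the letters follow the standard IUPAC nucleotide codes (N = any base, R = A or G) and that the dash marks a position within the site rather than a base. The function names `motif_to_regex` and `matches_motif` are hypothetical.

```python
import re

# Standard IUPAC codes needed for this motif (assumed; not defined in the abstract).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif into a regex; '-' is skipped as a position marker."""
    body = "".join(IUPAC[ch] for ch in motif.upper() if ch != "-")
    return re.compile(body)


def matches_motif(site: str, motif: str = "NT-AARA") -> bool:
    """Return True if the whole site sequence fits the motif."""
    return motif_to_regex(motif).fullmatch(site.upper()) is not None


if __name__ == "__main__":
    # Illustrative sequences only, not data from the study.
    for site in ["TTAAAA", "CTAAGA", "TTAACA"]:
        print(site, matches_motif(site))
```

Because the abstract says the sites "largely" follow the motif, a real analysis would more likely score partial matches than require an exact fit. The strict match here is a simplification.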