TY - JOUR
T1 - Olfactory Receptor Database
T2 - A metadata-driven automated population from sources of gene and protein sequences
AU - Crasto, Chiquito
AU - Marenco, Luis
AU - Miller, Perry
AU - Shepherd, Gordon
PY - 2002/1/1
Y1 - 2002/1/1
N2 - The Olfactory Receptor Database (ORDB; http://senselab.med.yale.edu/senselab/ordb) is a central repository of olfactory receptor (OR) and olfactory receptor-like gene and protein sequences. To deal with the very large OR gene family, we have constructed an algorithm that automatically downloads sequences from web sources such as GenBank and SWISS-PROT into the database. The algorithm uses hypertext markup language (HTML) parsing techniques that extract information relevant to ORDB. The information is then correlated with the metadata in the ORDB knowledge base to encode the unstructured text extracted into the structured format compliant with the database architecture, entity attribute value with classes and relationships (EAV/CR), which supports the SenseLab project as a whole. Three population methods (batch, automatic and semi-automatic) are discussed. The data is imported into the database using extensible markup language (XML).
AB - The Olfactory Receptor Database (ORDB; http://senselab.med.yale.edu/senselab/ordb) is a central repository of olfactory receptor (OR) and olfactory receptor-like gene and protein sequences. To deal with the very large OR gene family, we have constructed an algorithm that automatically downloads sequences from web sources such as GenBank and SWISS-PROT into the database. The algorithm uses hypertext markup language (HTML) parsing techniques that extract information relevant to ORDB. The information is then correlated with the metadata in the ORDB knowledge base to encode the unstructured text extracted into the structured format compliant with the database architecture, entity attribute value with classes and relationships (EAV/CR), which supports the SenseLab project as a whole. Three population methods (batch, automatic and semi-automatic) are discussed. The data is imported into the database using extensible markup language (XML).
UR - http://www.scopus.com/inward/record.url?scp=0036083910&partnerID=8YFLogxK
U2 - 10.1093/nar/30.1.354
DO - 10.1093/nar/30.1.354
M3 - Article
C2 - 11752336
AN - SCOPUS:0036083910
SN - 0305-1048
VL - 30
SP - 354
EP - 360
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 1
ER -