TRANSFAC public data: selected portions of the public TRANSFAC 2005 data
TRANSFAC comes from gene-regulation.com and contains selected data from the (circa 2005) public TRANSFAC database, particularly tables for:
- each gene containing a binding region,
- each transcription factor, and
- each binding region (site) within a gene.
Note that a gene may have multiple binding regions, a binding region may be bound by multiple transcription factors, and a transcription factor may bind multiple binding sites.
In this deployment of the TRANSFAC data, there are tables for gene data, transcription factor data, and binding region data. To express the multiplicity of interconnections, the primary binding region table (TRANSFAC.REGIONS) links binding regions with transcription factors.
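The linking role of TRANSFAC.REGIONS can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for CLSD, keeps only three of the REGIONS columns, and fills them with invented rows. It also assumes that each row is one region-factor pair and that FACTOR_ID holds "Txxxxx"-style values, neither of which is confirmed here.

```python
import sqlite3

# In-memory stand-in for CLSD; every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS transfac")
conn.execute(
    "CREATE TABLE transfac.regions ("
    "region_acc TEXT, region_gene_id TEXT, factor_id TEXT)"
)
conn.executemany(
    "INSERT INTO transfac.regions VALUES (?, ?, ?)",
    [
        ("R00001", "G000001", "T00001"),  # one region bound by two factors
        ("R00001", "G000001", "T00002"),
        ("R00002", "G000001", "T00001"),  # same gene, second region
        ("R00003", "G000002", "T00003"),
    ],
)

# Count each side of the many-to-many relationship from the link table alone.
regions_per_gene = dict(conn.execute(
    "SELECT region_gene_id, COUNT(DISTINCT region_acc) "
    "FROM transfac.regions GROUP BY region_gene_id"
))
factors_per_region = dict(conn.execute(
    "SELECT region_acc, COUNT(DISTINCT factor_id) "
    "FROM transfac.regions GROUP BY region_acc"
))
regions_per_factor = dict(conn.execute(
    "SELECT factor_id, COUNT(DISTINCT region_acc) "
    "FROM transfac.regions GROUP BY factor_id"
))

print(regions_per_gene)
print(factors_per_region)
print(regions_per_factor)
```

The same GROUP BY queries should carry over to the real table, provided the column assumptions hold.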
Linkages are made through the IDs assigned to each gene, transcription factor and binding region:
- Gene IDs take the form "Gxxxxxx", where "x" is a digit;
- Transcription factor IDs take the form "Txxxxx"; and
- Binding region IDs take the form "Rxxxxx".
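Because the three ID forms differ in prefix and length, an ID can be classified before it is used in a query. The sketch below reads the formats literally (six digits after "G", five after "T" and "R"); the function name `classify_transfac_id` is hypothetical and is not part of CLSD.

```python
import re

# Patterns read literally from the ID formats listed above.
ID_PATTERNS = {
    "gene": re.compile(r"G\d{6}"),    # "Gxxxxxx"
    "factor": re.compile(r"T\d{5}"),  # "Txxxxx"
    "region": re.compile(r"R\d{5}"),  # "Rxxxxx"
}


def classify_transfac_id(value):
    """Return 'gene', 'factor', or 'region', or None if no format matches."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(value):
            return kind
    return None


for candidate in ["G000001", "T00123", "R00042", "X12345"]:
    print(candidate, classify_transfac_id(candidate))
```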
Fields containing these IDs are usually either primary or foreign key fields in the tables in which they appear.
This data is also available from the gene-regulation.com web site via Web forms.
Within CLSD, this TRANSFAC data can be accessed via SQL commands that merge it with other CLSD data. However, there is currently no TRANSFAC table that maps all TRANSFAC IDs to other ID systems such as Entrez Gene IDs (GIDs) or UniProt accession IDs. The table TRANSFAC.FACTOR_DR contains some such mappings, but it is not complete.
As a result, such mapping must be done either through name fields such as GENE_NAME or through the TRRD.GENES_DR table in the TRRD schema within CLSD. GENES_DR includes (at least some) TRANSFAC factor IDs with their mappings to other ID systems. See the CLSD TRRD page for more information.
As of March 2009, we are planning to create a mapping table from TRRD resources, which will be placed in a TRRD schema.
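Until such a mapping table exists, name-based matching might look like the sketch below. It runs against an in-memory SQLite stand-in with invented rows. The lookup table `entrez_lookup` and its columns `gid` and `symbol` are hypothetical placeholders for whatever Entrez Gene table is available, and the case-insensitive comparison is only one possible matching rule.

```python
import sqlite3

# In-memory stand-in for CLSD; gene names and GIDs are invented.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS transfac")
conn.execute(
    "CREATE TABLE transfac.genes ("
    "transfac_gene_id TEXT PRIMARY KEY, gene_name TEXT, species TEXT)"
)
# Hypothetical lookup table; not a table described on this page.
conn.execute("CREATE TABLE entrez_lookup (gid INTEGER, symbol TEXT)")

conn.executemany(
    "INSERT INTO transfac.genes VALUES (?, ?, ?)",
    [("G000001", "geneA", "human"), ("G000002", "geneB", "human")],
)
conn.execute("INSERT INTO entrez_lookup VALUES (?, ?)", (1001, "GENEA"))

# LEFT JOIN keeps genes with no name match, so gaps stay visible as NULLs.
mapped = conn.execute(
    "SELECT g.transfac_gene_id, g.gene_name, e.gid "
    "FROM transfac.genes g "
    "LEFT JOIN entrez_lookup e ON UPPER(g.gene_name) = UPPER(e.symbol) "
    "ORDER BY g.transfac_gene_id"
).fetchall()

for row in mapped:
    print(row)
```

Name matching of this kind misses synonyms and spelling variants, so unmatched rows should be reviewed rather than dropped.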
TRANSFAC Gene data
There are 3 tables for TRANSFAC gene data. The main table is TRANSFAC.GENES with the following layout:
| Field name | Type |
|---|---|
| TRANSFAC_GENE_ID | VARCHAR |
| GENE_NAME | VARCHAR |
| SHORT_DESCRIPTION | VARCHAR |
| SYNONYMS | VARCHAR |
| SPECIES | VARCHAR |
| CLASSIFICATION | VARCHAR |
These fields were selected from the gene records provided through the Web site. The two additional tables are TRANSFAC.DR and TRANSFAC.BS, which contain entries for the "DR" and "BS" records in each Web page associated with a specific gene. Each entry in the GENES table is indexed on its TRANSFAC_GENE_ID field, and the other two tables have foreign keys pointing to that index in the GENES table.
TRANSFAC Factor data
There are 7 tables for the TRANSFAC factor data. The main table is TRANSFAC.FACTORS with the following layout:
| Field name | Type |
|---|---|
| TRANSFAC_FACTOR_ACC | VARCHAR |
| FACTOR_ID | VARCHAR |
| FACTOR_NAME | VARCHAR |
| SYNONYMS | VARCHAR |
| SPECIES | VARCHAR |
| BIOLOGICAL_CLASSIFICATION | VARCHAR |
| ENCODING_GENE | VARCHAR |
| HOMOLOGS | VARCHAR |
| CLASSIFICATION | VARCHAR |
| SIZE | VARCHAR |
| CELL_SPECIFICITY_POSITIVE | VARCHAR |
| CELL_SPECIFICITY_NEGATIVE | VARCHAR |
The other tables are:
- TRANSFAC.FACTOR_BS
- TRANSFAC.FACTOR_DR
- TRANSFAC.FACTOR_FF
- TRANSFAC.FACTOR_FT
- TRANSFAC.FACTOR_IN
- TRANSFAC.FACTOR_SF
each of which contains one-to-many data that are present within a single factor's Web page. These tables all have foreign keys into the TRANSFAC.FACTORS table.
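As a rough illustration of this one-to-many layout, the sketch below counts FACTOR_DR rows per factor in an in-memory SQLite stand-in. The layout of FACTOR_DR is not given on this page, so the foreign key column name and the columns `db_name` and `db_acc` are assumptions, and all rows are invented.

```python
import sqlite3

# In-memory stand-in for CLSD; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS transfac")
conn.execute(
    "CREATE TABLE transfac.factors ("
    "transfac_factor_acc TEXT PRIMARY KEY, factor_name TEXT)"
)
# Assumed layout: a key back to FACTORS plus hypothetical db_name/db_acc.
conn.execute(
    "CREATE TABLE transfac.factor_dr ("
    "transfac_factor_acc TEXT, db_name TEXT, db_acc TEXT)"
)

conn.executemany(
    "INSERT INTO transfac.factors VALUES (?, ?)",
    [("T00001", "factorA"), ("T00002", "factorB")],
)
conn.executemany(
    "INSERT INTO transfac.factor_dr VALUES (?, ?, ?)",
    [("T00001", "DB1", "X1"), ("T00001", "DB2", "Y1")],
)

# LEFT JOIN so that factors without any cross-reference show a count of 0.
xref_counts = conn.execute(
    "SELECT f.transfac_factor_acc, COUNT(d.db_acc) "
    "FROM transfac.factors f "
    "LEFT JOIN transfac.factor_dr d "
    "  ON d.transfac_factor_acc = f.transfac_factor_acc "
    "GROUP BY f.transfac_factor_acc "
    "ORDER BY f.transfac_factor_acc"
).fetchall()

print(xref_counts)
```

A zero count here corresponds to a factor with no entry in FACTOR_DR, which is how the incompleteness of that table would show up in practice.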
TRANSFAC Binding Region data
The binding region data is presented in a single table, TRANSFAC.REGIONS, with this schema:
| Field name | Type |
|---|---|
| REGION_ACC | VARCHAR |
| REGION_ID | VARCHAR |
| REGION_GENE_ID | VARCHAR |
| REGION_GENE_NAME | VARCHAR |
| REGION_FUNCTION | VARCHAR |
| FACTOR_ID | VARCHAR |
| FACTOR_NAME | VARCHAR |
| SPECIES_NAME_SHORT | VARCHAR |
| SPECIES_NAME_FULL | VARCHAR |
Note that this table shows which transcription factors bind with which genes at which sites.
It can be used to get more details about the components by joining it with the gene or transcription factor tables. For example, the following query joins this table with the main gene table and displays the 1347 transcription factor-binding region pairs in which both the binding region entry and its gene are annotated as "human":
    select *
    from transfac.regions a
    join transfac.genes b
      on a.region_gene_id = b.transfac_gene_id
    where a.species_name_short like '%human%'
      and b.species like '%human%'
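A variant of this query with an explicit column list can be tried from Python. The sketch below is a minimal example against an in-memory SQLite stand-in with invented rows, not a connection recipe for CLSD. It selects named columns instead of `*`, so the two tables do not both contribute their full column sets, and it passes the species pattern as a parameter.

```python
import sqlite3

# In-memory stand-in for CLSD; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS transfac")
conn.execute(
    "CREATE TABLE transfac.genes ("
    "transfac_gene_id TEXT PRIMARY KEY, gene_name TEXT, species TEXT)"
)
conn.execute(
    "CREATE TABLE transfac.regions ("
    "region_acc TEXT, region_gene_id TEXT, factor_id TEXT, "
    "factor_name TEXT, species_name_short TEXT)"
)
conn.executemany(
    "INSERT INTO transfac.genes VALUES (?, ?, ?)",
    [
        ("G000001", "geneA", "human, Homo sapiens"),
        ("G000002", "geneB", "mouse, Mus musculus"),
    ],
)
conn.executemany(
    "INSERT INTO transfac.regions VALUES (?, ?, ?, ?, ?)",
    [
        ("R00001", "G000001", "T00001", "factorA", "human"),
        ("R00001", "G000001", "T00002", "factorB", "human"),
        ("R00002", "G000002", "T00003", "factorC", "mouse"),
    ],
)

# Same join and filters as the query above, with an explicit column list.
query = """
    SELECT a.region_acc, a.factor_id, a.factor_name, b.gene_name
    FROM transfac.regions a
    JOIN transfac.genes b ON a.region_gene_id = b.transfac_gene_id
    WHERE a.species_name_short LIKE ? AND b.species LIKE ?
    ORDER BY a.region_acc, a.factor_id
"""
pairs = conn.execute(query, ("%human%", "%human%")).fetchall()

for pair in pairs:
    print(pair)
```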




