Transcription factors (TFs), found in all living organisms, are proteins involved in the control and regulation of gene expression. The function of these regulatory proteins is to activate or inhibit transcription of DNA by binding to specific DNA sequences. TFs are characterized by their DNA-binding domains (DBDs), among them, the helix-turn-helix (HTH) domain is the most prevalent in prokaryotic genomes.

The large number of TF protein sequences available demands user-friendly databases to facilitate inter-genomic and intra-genomic analyses. We have therefore developed a novel resource, the P2TF (Predicted Prokaryotic Transcription Factors) database, that contains the TFs of all available bacterial and archaeal genomes, and 43 metagenomes. Our objective was to provide an easy to use environment for validation by experts, according to their fields/organisms of interest, with the data being completely available and consultable by all of the scientific community.

From the P2TF website the user can:

Browse genome and metagenome TF predictions and manually curated proteins

Search for a sequence id or domain family

The P2TF homepage contains a navigation bar that allows database browsing. Among the menus, users will also find P2TF Browse, which links directly to sortable lists of analysed genomes, plasmids and metagenomes.

 

 

 

 

The P2TF database contains the TFs of all available bacterial and archaeal genomes, and 43 metagenomes.

The P2TF homepage contains a navigation bar that allows database browsing. Among the menus, users will also find Browse, which links directly to sortable lists of analysed genomes, plasmids and metagenomes.

 

 

 

The selection of a microbe or microbiome displays the result of the P2TF analysis process.

 

 

 

 

The page shows global counts of the different categories of TFs and detailed class counts of each category. Each class result provides a clickable link to a detailed gene list.

It shows also a search module, based on:

- locus-tag

- gene name

- gi number

- domain

 

 

 

Selecting an object from the list identifiers displays a detailed gene description page with an image representing the gene in the appropriate frame. Red vertical lines represent stop codons and green lines represent potential start codons.

Blast searches can be performed for the sequence, using external links to numerous databases.

 

 

 

The gene description page contains a link to a cartographic gene context (Chromosome Region View), with several options such as zooming in or out, moving along the chromosome, displaying genes in upstream or downstream regions and drawing genes.

 

 

 

A second menu, P2TF Search, provides several search modes that allows users to request genes of all the database, on the basis of their locus-tag, gene name, gi number or domain possession. A supplemental search mode allows users to accede specifically to a genome of interest or a group of genomes, using a taxonomy tree-browser.

 

 

 

The search module builds search output as a gene list that is linked to a full description for each selected gene (see above).

P2TF was designed to allow download of TF data in tab-delimited format and generates a file compatible with spreadsheet programs such as Excel. Users can also download for each genome and metagenome a multi-Fasta file (nucleotide or protein sequences).

 

 

 

An example of an Excel file output.