Overview
One of the major challenges in early drug discovery is the high rate of false-positive results (Fig. 1), which seriously interfere with hit identification and waste time and resources. To reduce the attrition caused by false positives, ChemFH (Chemical Frequent Hitter) was developed as an integrated online platform for screening and predicting potential frequent hitters (FHs). It covers colloidal aggregation, firefly luciferase (FLuc) reporter enzyme inhibition, fluorescence, chemical reactivity and promiscuity. Built on a large, high-quality database and Graph Neural Network architectures (Fig. 2), ChemFH provides reliable detection of frequent hitters and thereby improves the efficiency of drug R&D.

Figure 1. The distribution of true-positive results and the different mechanisms of false-positive results in HTS assays

Figure 2. The main modules and functions of ChemFH
Introduction
The Evaluation Mode accepts a single molecule and is intended to confirm whether a hit from a biological screening campaign is genuine. By combining prediction models with substructure rules, it gives a comprehensive evaluation of the risk and potential of the queried molecule.
The Screening Mode assists hit identification by screening a molecular dataset in advance to detect potential frequent hitters. Pre-screening for molecules with undesirable interfering features increases the efficiency of drug design, improves the credibility of experimental results and reduces unnecessary cost.
| Category | Positive | Negative | Total | GCN (AUC) | GAT (AUC) | AttentiveFP (AUC) |
|---|---|---|---|---|---|---|
| Agg | 16783 | 35114 | 51897 | 0.937 | 0.937 | 0.937 |
| Blue Fluo | 4871 | 38659 | 43530 | 0.936 | 0.938 | 0.926 |
| Green Fluo | 8319 | 31352 | 39671 | 0.728 | 0.723 | 0.720 |
| Other assay interference | 1544 | 2159 | 3703 | 0.779 | 0.777 | 0.743 |
| FLuc inhibitor | 12703 | 121345 | 133418 | 0.969 | 0.967 | 0.910 |
| Reactive compound | 820 | 162069 | 162889 | 0.992 | 0.993 | 0.909 |
| Promiscuous compound | 7518 | 334171 | 341689 | 0.921 | 0.930 | 0.905 |
| Category | Number of substructures | Flagged molecules | Flagged FHs | Accuracy |
|---|---|---|---|---|
| Agg | 26 | 2651 | 2265 | 0.854 |
| Blue Fluo | 26 | 1208 | 1172 | 0.970 |
| Green Fluo | 16 | 274 | 260 | 0.949 |
| Other assay interference | 7 | 209 | 200 | 0.957 |
| FLuc inhibitor | 18 | 1638 | 1448 | 0.884 |
| Reactive compound | 3 | 30 | 23 | 0.767 |
| Promiscuous compound | 6 | 733 | 704 | 0.960 |
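The Accuracy column in the substructure table above appears to be the share of flagged molecules that are confirmed frequent hitters, i.e. Flagged FHs divided by Flagged molecules. This reading is inferred from the numbers rather than stated on the page. The sketch below recomputes the ratio for each row; the function name `rule_accuracy` is only illustrative.

```python
# Rows copied from the substructure-rule table:
# (category, flagged molecules, flagged FHs, reported accuracy)
ROWS = [
    ("Agg", 2651, 2265, 0.854),
    ("Blue Fluo", 1208, 1172, 0.970),
    ("Green Fluo", 274, 260, 0.949),
    ("Other assay interference", 209, 200, 0.957),
    ("FLuc inhibitor", 1638, 1448, 0.884),
    ("Reactive compound", 30, 23, 0.767),
    ("Promiscuous compound", 733, 704, 0.960),
]


def rule_accuracy(flagged_molecules: int, flagged_fhs: int) -> float:
    """Fraction of flagged molecules that are frequent hitters.

    Assumption: this is how the Accuracy column is defined.
    """
    if flagged_molecules <= 0:
        raise ValueError("flagged_molecules must be positive")
    return flagged_fhs / flagged_molecules


for category, flagged, fhs, reported in ROWS:
    computed = rule_accuracy(flagged, fhs)
    print(f"{category:26s} computed={computed:.3f} reported={reported:.3f}")
```

Under this assumption, a rule set with few flagged molecules (for example the three reactive-compound substructures) has an accuracy that is sensitive to a handful of molecules, so the ratio is best read together with the flagged count.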
| Name | Number of substructures | Screening Scope | Source |
|---|---|---|---|
| PAINS | 480 | Alpha-screen artifacts and reactive compounds | J Med Chem 2010;53:2719–40 |
| BMS | 176 | Undesirable, reactive compounds | J Chem Inf Model 2006;46:1060–8 |
| GST/GSH FH filter | 34 | GST/GSH FHs | J Biomol Screen 2016;21:596–607 |
| His-tagged protein FH filter | 19 | Ni2+ chelators | J Biomol Screen 2014;19:715–26 |
| ALARM NMR | 75 | Thiol reactive compounds | J Am Chem Soc 2005;127:217–24 |
| Luciferase inhibitor Rule | 3 | FLuc inhibitors | J Chem Inf Model 2018;58:933–42 |
| Chelator Rule | 55 | Chelators | ChemMedChem 2010;5:195–9 |
| NTD | 105 | Unwanted groups, reactive groups and possible HTS interferences | ChemMedChem 2008;3:435–44 |
| Potential electrophilic Rule | 119 | Reactive compounds | J Chem Inf Model 2012;52:2310–6 |
Development Environment
| Library | Version |
|---|---|
| RDKit | 2019.03.1 |
| Django | 2.2 |
| DGL | 0.5.2 |
| DGL-LifeSci | 0.2.5 |
| PyTorch | 1.6.0 |
| Torchvision | 0.7.0 |
| pyecharts | 1.8.1 |
Evaluation Mode
The Evaluation Mode accepts a single molecule and is intended to confirm whether a hit from a biological screening campaign is genuine. Two input types are provided:
- 1. Entering a single SMILES string;
- 2. Drawing the molecule in the molecule editor.
Step1: Access the Services page via Services-> Evaluation Mode in the navigation bar.

Step2: Select the first entry, "Input a single string", and enter the SMILES string in the SMILES input box.


Step3: Select the evaluation methods to apply. Two types of evaluation methods are provided: mechanism-specific FH detection and established FH screening rules collected from the literature. The former includes both prediction models and substructure rules, which were built from datasets collected specifically for each mechanism.


Mechanism:
- 1. Colloidal aggregators: Compound aggregation tends to start when the concentration exceeds the critical aggregation concentration (CAC) and ends with the formation of aggregates with a radius of approximately 30-600 nm. The resulting colloidal aggregates bind non-specifically to protein surfaces, inducing local protein unfolding, which usually destabilizes or denatures enzymes.
- 2. FLuc inhibitors: Owing to its unique catalytic mechanism, FLuc is widely used in HTS bioluminescence assays, especially those that study gene expression at the transcriptional level. However, unintended inhibition of FLuc by test compounds interferes with these assays.
- 3. Blue/Green fluorescence: Fluorescence is the process by which a molecule, called a fluorophore or fluorescent dye, absorbs a photon of light, exciting an electron to a higher energy state, and then emits light as the electron returns to the ground state. Fluorophores have many applications, including as enzyme substrates, labels for biomolecules, cellular stains and environmental indicators. However, fluorescent compounds interfere with fluorescence-based HTS assays.
- 4. Reactive compounds: Chemically reactive compounds typically cause chemical modification of reactive protein residues or, less frequently, of nucleophilic assay reagents.
- 5. Promiscuous compounds: Promiscuous compounds bind specifically to multiple macromolecular targets. These interactions may involve unintended targets, triggering adverse reactions and other safety issues.
- 6. Other assay interferences: Alpha-screen, FRET, TR-FRET and absorbance artifacts are included.
Other FH Screening Rules:
- 1. PAINS: frequent hitters, Alpha-screen artifacts and reactive compounds; 480 substructures (J Med Chem 2010;53:2719–40)
- 2. BMS: undesirable, reactive compounds; 176 substructures (J Chem Inf Model 2006;46:1060–8)
- 3. GST/GSH FH filter: GST/GSH FHs; 34 substructures (J Biomol Screen 2016;21:596–607)
- 4. His-tagged protein FH filter: Ni2+ chelators; 19 substructures (J Biomol Screen 2014;19:715–26)
- 5. ALARM NMR: thiol reactive compounds; 75 substructures (J Am Chem Soc 2005;127:217–24)
- 6. Luciferase inhibitor Rule: FLuc inhibitors; 3 substructures (J Chem Inf Model 2018;58:933–42)
- 7. Chelator Rule: chelators; 55 substructures (ChemMedChem 2010;5:195–9)
- 8. NTD: unwanted groups, reactive groups and possible HTS interferences; 105 substructures (ChemMedChem 2008;3:435–44)
- 9. Potential electrophilic Rule: reactive compounds; 119 substructures (J Chem Inf Model 2012;52:2310–6)
Step4: Submit and get results.

After submission, the backend passes the queried molecule to the selected models and/or substructure rules. The result page shows a brief overview of the results, including:
1. Visualization block:
- 1) The status of the queried molecule;
- 2) The 2D and 3D structures of the queried molecule;
- 3) A radar chart that summarizes the prediction results across mechanisms; a blue label indicates that the mechanism was selected for prediction, while a black label indicates that it was not.

2. Model Predictions and Substructure Screening blocks:
- 1) The Model Predictions block lists each selected mechanism with its GCN, GAT, AttentiveFP and average prediction scores;
- 2) The Substructure Screening block lists each selected mechanism with the name and depiction of every flagged substructure.
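
The average score reported alongside the three model scores can be thought of as a consensus over the models. A minimal sketch, assuming the average is a plain arithmetic mean of the GCN, GAT and AttentiveFP scores and that each score is a probability in [0, 1] (the page states neither explicitly); the function name and the example scores are hypothetical.

```python
from statistics import mean


def consensus_score(gcn: float, gat: float, attentive_fp: float) -> float:
    """Arithmetic mean of the three model scores.

    Assumption: each score is a probability in [0, 1] and the platform's
    average is an unweighted mean.
    """
    scores = (gcn, gat, attentive_fp)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    return mean(scores)


# Hypothetical scores for one mechanism of one molecule.
example = {"GCN": 0.82, "GAT": 0.76, "AttentiveFP": 0.91}
average = consensus_score(example["GCN"], example["GAT"], example["AttentiveFP"])
print(f"average score: {average:.3f}")
```

Comparing the individual scores with their mean is a quick way to spot mechanisms on which the three models disagree.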

3. File Download:
The CSV (or PDF) file reports the scores, the status and the flagged-fragment information for each mechanism and each molecule (in this case, only one molecule), including SMILES, status, GCN score, GAT score, AttentiveFP score, average score, and the SMARTS for each rule.

To draw the molecule instead of entering a SMILES string, follow these steps:
Step1: Access the Services page via Services-> Evaluation Mode in the navigation bar.

Step2: Select the second entry, "Drawing molecules through the molecule editor", and draw the molecule in the editor.

Step3: Select the evaluation methods to apply. The available mechanisms and FH screening rules are the same as those listed under Step3 for SMILES input above.

Step4: Submit and get results. The result page and the downloadable file have the same layout as described under Step4 for SMILES input above.

Screening Mode
The Screening Mode assists hit identification by screening a molecular dataset in advance to detect potential frequent hitters. Users can upload a file (.txt/.sdf) for screening.
Step1: Access the Services page via Services-> Screening Mode in the navigation bar.

Step2: Select the "Molecule File" entry and choose the file (.txt/.sdf) to upload for screening.
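
The upload file can be prepared programmatically. A minimal sketch, assuming the .txt format is one SMILES string per line with no header (the page does not spell out the layout); the helper name, the file name and the example molecules are illustrative only.

```python
import tempfile
from pathlib import Path


def write_smiles_file(smiles_list, path):
    """Write unique, non-empty SMILES strings to a text file, one per line.

    Assumption: the upload expects one SMILES per line and no header row.
    Returns the number of molecules written.
    """
    seen = set()
    cleaned = []
    for smiles in smiles_list:
        smiles = smiles.strip()
        if smiles and smiles not in seen:  # drop blanks and exact duplicates
            seen.add(smiles)
            cleaned.append(smiles)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Illustrative input: ethanol, benzene, a duplicate with stray spaces,
# an empty entry and aspirin.
molecules = ["CCO", "c1ccccc1", " CCO ", "", "CC(=O)Oc1ccccc1C(=O)O"]
target = Path(tempfile.gettempdir()) / "screening_input.txt"
count = write_smiles_file(molecules, target)
print(f"wrote {count} molecules to {target}")
```

The duplicate check compares strings only, so two different SMILES spellings of the same molecule are both kept; canonicalizing with a cheminformatics toolkit first would remove those as well.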
Step3: Select the evaluation methods to apply. Two types of evaluation methods are provided: mechanism-specific FH detection and established FH screening rules collected from the literature. The available mechanisms and FH screening rules are the same as those listed under Step3 of the Evaluation Mode above.
Step4: Submit and get results.

After submission, the backend passes the queried molecules to the selected models and/or substructure rules. The result page shows a brief overview of the results, including:
1. Summary block:
- 1) Molecules entry indicates the total number of input SMILES strings;
- 2) Mechanism entry indicates the screening mechanisms selected;
- 3) Other rules entry indicates the structural screening rules selected;
- 4) Link entry provides a download link for the detailed screening results;
- 5) A pie chart summarizes the prediction status of the queried molecules.

2. Result block:
- 1) INDEX entry indicates the index of the input SMILES;
- 2) STATUS entry indicates the final screening result: Accepted (low FH risk), Intermediate (potential FH risk) and Rejected (high FH risk);
- 3) STRUCTURE entry is the molecular image;
- 4) SMILES entry is the input SMILES string;
- 5) Specific evaluation information for each molecule can be viewed by clicking on the VIEW button under the DETAIL entry.

3. File Download:
The CSV (or PDF) file reports the scores, the status and the flagged-fragment information for each mechanism and each molecule, including SMILES, status, GCN score, GAT score, AttentiveFP score, average score, and the SMARTS for each rule.
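
The downloaded CSV can be post-processed with a few lines of Python, for example to count molecules per status and list those that need a closer look. This is a sketch under assumptions: it presumes a header row with columns named `SMILES` and `status` (the actual column names in the export may differ), and the inline sample stands in for a real download with made-up values.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for a downloaded result file; column names are assumed.
SAMPLE_CSV = """SMILES,status
CCO,Accepted
c1ccccc1,Intermediate
O=C1C=CC(=O)C=C1,Rejected
CCN,Accepted
"""


def summarize_status(handle):
    """Count molecules per status and collect SMILES not marked Accepted."""
    counts = Counter()
    to_review = []
    for row in csv.DictReader(handle):
        status = row["status"].strip()
        counts[status] += 1
        if status != "Accepted":
            to_review.append(row["SMILES"])
    return counts, to_review


counts, to_review = summarize_status(io.StringIO(SAMPLE_CSV))
print(dict(counts))
print(to_review)
```

For a real download, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")` after adjusting the column names to match the export.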

4. Specific Result:
Click the VIEW button to see specific information about each molecule.
