API Tutorial

This document explains how the ChemFH API service is structured and, through several usage examples, shows new users how the service works and how to construct the URLs that serve as its interface.

USAGE POLICY: Please note that the ChemFH API is not designed to process very large volumes (millions) of requests at the same time. We suggest that any script or application send no more than 5 requests per second to avoid overloading the ChemFH servers. If you have a large data set to compute, please contact us for help optimizing your task, as there are likely more efficient ways to approach such bulk queries.
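The 5-requests-per-second guideline can be enforced on the client side. The following is a minimal sketch, not part of the ChemFH service: the class name Throttle and its parameters are hypothetical, and it only spaces out calls made sequentially from a single script.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second happen per second.

    Hypothetical helper for illustration; it is not provided by ChemFH.
    """

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() immediately before each requests.post(...) keeps a sequential script at or below the suggested rate.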

1. Calculate the FH properties of molecule(s)

POST http://121.40.210.46:8110/api/fh

Description:

Calculate the FH properties of molecule(s), including both model-based predictions and rule-based matches.

The values of the '_index' fields in the result are two-dimensional arrays: each element lists the indices of the atoms in one matching substructure, which can be highlighted in the molecule.
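To illustrate how such a two-dimensional '_index' value might be consumed, here is a small sketch that flattens it into the set of atoms to highlight. The function name atoms_to_highlight and the example value are hypothetical; the sketch assumes each inner list holds the atom indices of one matching substructure and makes no assumption about whether the indices are 0-based.

```python
def atoms_to_highlight(index_matches):
    """Flatten a two-dimensional '_index' value into sorted unique atom indices."""
    return sorted({atom for match in index_matches for atom in match})


# Hypothetical value for illustration, not taken from a real response
example = [[0, 1, 2], [2, 3], [7, 8, 9]]
print(atoms_to_highlight(example))  # [0, 1, 2, 3, 7, 8, 9]
```

An empty list, as returned for rules without a match, simply yields an empty result.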

Query Parameters & Example:

Key: SMILES
Type: string or [string]
Value (example): ["CC(=O)Oc1ccccc1C(O)=O", "CC(C)OC(=O)CC(=O)CSc1nc2c(cc1C#N)CCC2"]
Description: Molecular SMILES string(s), a format that most chemical toolkits can generate.
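Because the SMILES parameter accepts either a single string or a list of strings, a script may want to normalize its input before sending it. The helper below is a sketch under that reading of the parameter description; build_payload is a hypothetical name, and always sending a list is a design choice rather than a requirement of the API.

```python
import json


def build_payload(smiles):
    """Return the JSON body for POST /api/fh from one SMILES string or several."""
    if isinstance(smiles, str):
        smiles = [smiles]  # wrap a single molecule so the body always holds a list
    return {"SMILES": list(smiles)}


payload = build_payload("CC(=O)Oc1ccccc1C(O)=O")
print(json.dumps(payload))
```

The resulting dictionary can be passed to requests.post(url, json=payload), as in the quick example in section 2.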

Result & Example:

Returns all computable properties of each molecule.

Content-Type: application/json,application/xml


{
    "status": "success",
    "code": 200,
    "data": {
        // Identifier of this task
        "taskid": "tmpihx1sr021706355868",
        "number of all molecules": 2,
        "number of valid molecules": 2,
        "data": [
            {
                "smiles": "CC(C)OC(=O)CC(=O)CSc1nc2c(cc1C#N)CCC2",
                // The following seven lines represent the model predictions for the seven mechanisms
                "Other assay interference": 0.993,
                "Blue fluorescence": 1.0,
                "FLuc inhibitors": 0.891,
                "Promiscuous compounds": 0.168,
                "Colloidal aggregators": 0.557,
                "Reactive compounds": 0.0,
                "Green fluorescence": 0.852,
                // The following seven lines give the uncertainty of the model prediction
                // for each of the seven mechanisms
                "Other assay interference uncertainty": 1.5e-05,
                "Blue fluorescence uncertainty": 0.0,
                "FLuc inhibitors uncertainty": 0.006625,
                "Promiscuous compounds uncertainty": 0.039083,
                "Colloidal aggregators uncertainty": 0.034897,
                "Reactive compounds uncertainty": 0.0,
                "Green fluorescence uncertainty": 0.021514,
                // The following 17 lines represent the matching results of the 17 rules
                "Aggregators_index": [],
                "Fluc_index": [],
                "Blue_fluorescence_index": [],
                "Green_fluorescence_index": [],
                "Reactive_index": [],
                "Other_assay_interference_index": [],
                "Promiscuous_index": [],
                // Each element of the list holds the atom indices to highlight for one matching substructure
                "ALARM_NMR_index": ...,
                "BMS_index": [],
                "Chelator_Rule_index": [],
                "GST_FHs_Rule_index": [],
                "His_FHs_Rule_index": [],
                "Luciferase_Inhibitor_Rule_index": [],
                "NTD_index": ...,
                "PAINS_index": [],
                "Potential_Electrophilic_Rule_index": [],
                "Lilly_index": []
            }, {
                // ... Detailed information on the second molecule
            }
        ],
        "explanation": {
            "Aggregators": "Category 0: Non-aggregators; Category 1: aggregators. The output value is the probability of being aggregators, within the range of 0 to 1.",
            "FLuc inhibitors": "Category 0: Non-promiscuous; Category 1: promiscuous. The output value is the probability of being promiscuous, within the range of 0 to 1.",
            "Blue/Green fluorescence": "Category 0: Non-fluorescence; Category 1: fluorescence. The output value is the probability of being fluorescence, within the range of 0 to 1.",
            "Reactive compounds": "Category 0: Non-reactive; Category 1: reactive. The output value is the probability of being reactive compounds, within the range of 0 to 1.",
            "Other assay interference": "Category 0: Non-assay interferences; Category 1: assay interferences. The output value is the probability of being assay interferences, within the range of 0 to 1.",
            "Promiscuous compounds": "Category 0: Non-promiscuous; Category 1: promiscuous. The output value is the probability of being promiscuous, within the range of 0 to 1.",
            "XX uncertainty": "Uncertainty for each property",
            "XX_index": "Atomic index of rule matches for each property"
        }
    }
}
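As an illustration of working with one molecule record from this response, the sketch below picks out the mechanisms whose predicted probability reaches a cutoff. The function name flagged_mechanisms and the 0.5 threshold are assumptions made for this example, not an official ChemFH decision rule; the field names are those shown in the sample response above.

```python
# Field names of the seven model predictions, as shown in the sample response
MECHANISMS = [
    "Colloidal aggregators",
    "FLuc inhibitors",
    "Blue fluorescence",
    "Green fluorescence",
    "Reactive compounds",
    "Other assay interference",
    "Promiscuous compounds",
]


def flagged_mechanisms(record, threshold=0.5):
    """Return {mechanism: probability} for predictions at or above threshold.

    The default threshold of 0.5 is an assumption for illustration only.
    """
    return {
        name: record[name]
        for name in MECHANISMS
        if record.get(name, 0.0) >= threshold
    }


# Prediction values copied from the sample response above
record = {
    "smiles": "CC(C)OC(=O)CC(=O)CSc1nc2c(cc1C#N)CCC2",
    "Other assay interference": 0.993,
    "Blue fluorescence": 1.0,
    "FLuc inhibitors": 0.891,
    "Promiscuous compounds": 0.168,
    "Colloidal aggregators": 0.557,
    "Reactive compounds": 0.0,
    "Green fluorescence": 0.852,
}
print(flagged_mechanisms(record))
```

The matching uncertainty for any mechanism can be read from the field named after it with the suffix " uncertainty".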
    

2. Quick Examples

Example:


import requests

baseUrl = 'http://121.40.210.46:8110'

if __name__ == '__main__':
    api = '/api/fh'
    url = baseUrl + api
    param = {
        'SMILES': ["CC(=O)Oc1ccccc1C(O)=O", "CC(C)OC(=O)CC(=O)CSc1nc2c(cc1C#N)CCC2"],
    }
    response = requests.post(url, json=param)
    if response.status_code == 200:
        data = response.json()['data']
        print(data)

                    

For users who need to process a large number of molecules, we recommend splitting the list of molecules into subtasks (for example, 1000 molecules each) and computing them iteratively. Submit each subtask and retrieve its results in code, then stitch the results of all subtasks into the final result. We recommend Python for this, as it is well suited to the task.


import requests
import pandas as pd

baseUrl = 'http://121.40.210.46:8110'


def transform(data):
    # data is the 'data' object of the response; data['data'] holds one flat
    # record per molecule ('smiles' plus the prediction, uncertainty and
    # '_index' fields shown in section 1).
    # Assumption: the record of an invalid SMILES lacks the property fields,
    # so its empty cells are labelled 'Invalid SMILES'.
    return pd.DataFrame(data['data']).fillna('Invalid SMILES')


def divide_list(lst, n):
    # Yield consecutive sublists of at most n items
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


if __name__ == '__main__':
    api = '/api/fh'
    url = baseUrl + api
    n = 1000  # molecules per subtask
    smiles_list = ['CN1C2CCC1CC(OC(=O)c1cccn1C)C2', 'O=C(O)Nc1scnc1C(=O)Nc1nccs1',
                   'COc1ccc(C=C(F)C(=O)c2cc(OC)c(OC)c(OC)c2)cc1'] * 2500

    for i, sublist in enumerate(divide_list(smiles_list, n)):
        param = {'SMILES': sublist}

        response = requests.post(url, json=param)

        if response.status_code == 200:  # The request succeeded
            data = response.json()['data']
            # Convert this subtask's results to a table and save it as CSV
            result = transform(data)
            print(result)
            result.to_csv('result' + str(i) + '.csv', index=False)
        else:
            print('Subtask', i, 'failed with status', response.status_code)
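The script above writes one CSV file per subtask (result0.csv, result1.csv, and so on). To stitch these into the final result, something along the following lines could be used. This is a sketch using only the standard library; merge_csv, chunk_number and the output name merged.csv are hypothetical, and it assumes the per-subtask files follow the naming used above.

```python
import csv
import glob
import re


def chunk_number(path):
    """Return the trailing number in 'result<N>.csv', or -1 if there is none."""
    match = re.search(r'(\d+)\.csv$', path)
    return int(match.group(1)) if match else -1


def merge_csv(pattern, out_path):
    """Merge all CSV files matching pattern into out_path; return the row count."""
    rows, fieldnames = [], []
    # Sort numerically so that result10.csv comes after result2.csv
    for path in sorted(glob.glob(pattern), key=chunk_number):
        with open(path, newline='') as fh:
            reader = csv.DictReader(fh)
            # Collect the union of column names, keeping first-seen order
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(out_path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, merge_csv('result*.csv', 'merged.csv') would combine the subtask files in submission order; the output name is chosen so that it does not match the input pattern.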