Melatonin et al thread

Here is the actual schema and the melatonin carry-through. Human metabolism, naturally, decided the cleanest way to explain itself was with overlapping enzymatic routes, oxidative side-paths, and terminal conjugation.

Melatonin is metabolized mainly by 6-hydroxylation, but also by demethylation to NAS, deacetylation to 5-MT then to 5-methoxyindoleacetaldehyde, 5-MIAA, and 5-MTOL, plus oxidative pyrrole-ring cleavage to AFMK and AMK. Oxidative chemistry also yields 2-OHM, 4-OHM, and cyclic 3-OHM, while 6-OHM and NAS can be further sulfated or glucuronidated as terminal excretion products. (PMC)

For “phase separation propensity,” I treated the meaningful target as relative biomolecular-condensate partition propensity, not “does this tiny molecule LLPS by itself,” because that is usually the wrong question for small metabolites. The physically relevant drivers are hydrophobicity, aromaticity, charge, hydrogen bonding, cation-π, and π-π interactions. (PMC)

Machine-readable feature schema

FEATURE_SCHEMA = {
    "identity": [
        "name","smiles","inchi","inchikey","formula","exact_mw","heavy_atoms"
    ],
    "2d_physchem": [
        "logP","logD_pH74","TPSA","HBD","HBA","rot_bonds","aromatic_rings",
        "rings","frac_Csp3","formal_charge","molar_refractivity"
    ],
    "microstates": [
        "site_pKa_*","dominant_state_pH74","state_fraction_*","tautomer_id",
        "tautomer_fraction","state_weighted_logD","state_weighted_TPSA"
    ],
    "3d_ensemble": [
        "conf_id","conf_energy_kcal","boltz_weight","Rg_A","SASA_A2",
        "polar_SASA_A2","LabuteASA","PMI1","PMI2","PMI3","NPR1","NPR2",
        "asphericity","eccentricity"
    ],
    "electronic_QM": [
        "HOMO_eV","LUMO_eV","gap_eV","dipole_D","total_energy",
        "partial_charge_*","fukui_plus_*","fukui_minus_*","local_electrophilicity_*"
    ],
    "fingerprints_similarity": [
        "ECFP4","FCFP4","MACCS","atom_pair_fp","topological_torsion_fp",
        "tanimoto_vs_parent","dice_vs_parent","tversky_vs_parent"
    ],
    "reactivity_toxicology": [
        "phenol_count","catechol_flag","indole_flag","methoxy_flag","amide_flag",
        "aldehyde_flag","carboxylate_flag","sulfate_flag","glucuronide_flag",
        "kynuramine_flag","cyclized_pyrroloindole_flag",
        "ROS_scavenging_flag","quinone_or_ring_opening_liability",
        "covalent_adduct_risk","metal_chelation_flag"
    ],
    "biointeraction": [
        "MT1_score","MT2_score","AhR_score","PPARg_score","MAO_score",
        "COMT_score","ALDH_score","DAT_score","VMAT2_score","BBB_score",
        "mitochondrial_protection_score"
    ],
    "condensate_partition_proxy": [
        "cpp_aromaticity","cpp_hbond","cpp_amphiphilicity","cpp_charge_penalty",
        "cpp_cation_pi","cpp_pi_pi","cpp_relative_score"
    ]
}
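
One way to put the schema to work is to check a computed feature row against it before writing anything to disk. The sketch below is a hypothetical helper (`unknown_columns` and `missing_columns` are not part of the pipeline in this thread), and it assumes that schema keys ending in `*` are wildcard families such as per-site pKa values or per-atom charges. It uses a trimmed copy of the schema so it runs on its own.

```python
from fnmatch import fnmatchcase

# Trimmed copy of FEATURE_SCHEMA for illustration; swap in the full dict.
SCHEMA_SUBSET = {
    "identity": ["name", "smiles", "formula", "exact_mw"],
    "2d_physchem": ["logP", "TPSA", "HBD", "HBA"],
    "microstates": ["site_pKa_*", "dominant_state_pH74"],
}

def schema_patterns(schema):
    """Flatten the schema into a list of (group, pattern) pairs."""
    return [(group, pat) for group, pats in schema.items() for pat in pats]

def unknown_columns(row, schema):
    """Row keys that match no schema pattern (assumes '*' means wildcard)."""
    pats = [p for _, p in schema_patterns(schema)]
    return sorted(k for k in row if not any(fnmatchcase(k, p) for p in pats))

def missing_columns(row, schema):
    """Non-wildcard schema keys that the row does not provide."""
    return sorted(
        p for _, p in schema_patterns(schema) if "*" not in p and p not in row
    )

# Example row: logP and TPSA are the melatonin values from this thread;
# site_pKa_N1 is a placeholder and typo_column is a deliberate mistake.
example_row = {"name": "Melatonin", "logP": 1.86, "TPSA": 54.1,
               "site_pKa_N1": None, "typo_column": 0}
print(unknown_columns(example_row, SCHEMA_SUBSET))
print(missing_columns(example_row, SCHEMA_SUBSET))
```

Running a check like this on every row catches silent column-name drift between the schema and whatever the feature function actually emits.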

Minimal RDKit + 3D + EHT pipeline

The values below were computed locally from standardized SMILES using RDKit ETKDG conformers, UFF minimization, FreeSASA, and RDKit’s extended Hückel (EHT) module, so the HOMO/LUMO numbers are approximate EHT values, not DFT. Full pKa-state free energies and proper FEP still need a pKa enumerator plus xTB/ORCA or equivalent, because chemistry enjoys making simple things annoying.

from rdkit import Chem
from rdkit.Chem import AllChem, Crippen, Descriptors, Lipinski
from rdkit.Chem import rdMolDescriptors, rdDistGeom, rdFreeSASA, rdEHTTools, DataStructs
import math, pandas as pd

# Not used by the melatonin run below.
DOPAMINE_FAMILY = {
    "dopamine": "NCCc1ccc(O)c(O)c1",
    "DOPAC": "O=C(O)Cc1ccc(O)c(O)c1",
    "DOPAL": "O=CCc1ccc(O)c(O)c1",
    "3-MT": "COc1cc(CCN)ccc1O",         # 3-methoxy-4-hydroxy substitution
    "HVA": "COc1cc(CC(=O)O)ccc1O",      # 3-methoxy-4-hydroxy substitution
    "amphetamine": "CC(N)Cc1ccccc1",
}

MELATONIN_FAMILY = {
    "Melatonin": "COc1ccc2[nH]cc(CCNC(C)=O)c2c1",
    "NAS": "CC(=O)NCCc1c[nH]c2ccc(O)cc12",  # hydroxyl at C5, not C6
    "6-OHM": "COc1cc2c(CCNC(C)=O)c[nH]c2cc1O",
    "2-OHM": "COc1ccc2[nH]c(O)c(CCNC(C)=O)c2c1",
    "4-OHM": "COc1ccc2[nH]cc(CCNC(C)=O)c2c1O",
    "c3OHM": "COc1ccc2c(c1)C1(O)CCN(C(C)=O)C1N2",
    "AFMK": "COc1ccc(NC=O)c(C(=O)CCNC(C)=O)c1",
    "AMK": "COc1ccc(N)c(C(=O)CCNC(C)=O)c1",
    "5-MT": "COc1ccc2[nH]cc(CCN)c2c1",
    "5-MTOL": "COc1ccc2[nH]cc(CCO)c2c1",
    "5-methoxyindoleacetaldehyde": "COc1ccc2[nH]cc(CC=O)c2c1",
    "5-MIAA": "COc1ccc2[nH]cc(CC(=O)O)c2c1",
    "6-sulfatoxymelatonin": "COc1cc2c(CCNC(C)=O)c[nH]c2cc1OS(=O)(=O)O",
    "NAS sulfate": "CC(=O)NCCc1c[nH]c2ccc(OS(=O)(=O)O)cc12",  # sulfate at C5
}

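def embed_confs(mol, n=20):
    """Embed up to n ETKDGv3 conformers, UFF-minimize each, return (mol_with_Hs, [(conf_id, energy)]) sorted by energy."""
    mh = Chem.AddHs(Chem.Mol(mol))
    p = rdDistGeom.ETKDGv3()
    p.randomSeed = 0xF00D       # fixed seed for reproducible embeddings
    p.pruneRmsThresh = 0.35     # drop near-duplicate conformers
    conf_ids = list(rdDistGeom.EmbedMultipleConfs(mh, numConfs=n, params=p))
    if not conf_ids:
        raise ValueError("conformer embedding failed")
    es = []
    for cid in conf_ids:
        AllChem.UFFOptimizeMolecule(mh, confId=cid, maxIters=500)
        ff = AllChem.UFFGetMoleculeForceField(mh, confId=cid)
        es.append((cid, ff.CalcEnergy()))   # UFF energy in kcal/mol
    es.sort(key=lambda x: x[1])
    return mh, es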

def boltz_weights(energies, T=298.15):
    R = 0.0019872041  # kcal/mol/K
    e0 = energies[0][1]
    ws = {cid: math.exp(-(e - e0)/(R*T)) for cid, e in energies}
    Z = sum(ws.values())
    return {cid: w/Z for cid, w in ws.items()}

ptable = Chem.GetPeriodicTable()

def calc_sasa(mh, cid):
    radii = [ptable.GetRvdw(a.GetAtomicNum()) for a in mh.GetAtoms()]
    return rdFreeSASA.CalcSASA(mh, radii, confIdx=cid)

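def features(name, smi, ref_fp=None):
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"could not parse SMILES for {name}: {smi}")
    # Morgan radius 2, 2048 bits (ECFP4-like). Newer RDKit releases prefer rdFingerprintGenerator.
    fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)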
    row = {
        "name": name,
        "smiles": Chem.MolToSmiles(mol),
        "formula": rdMolDescriptors.CalcMolFormula(mol),
        "exact_mw": Descriptors.ExactMolWt(mol),
        "logP": Crippen.MolLogP(mol),
        "TPSA": rdMolDescriptors.CalcTPSA(mol),
        "HBD": Lipinski.NumHDonors(mol),
        "HBA": Lipinski.NumHAcceptors(mol),
        "rot_bonds": Lipinski.NumRotatableBonds(mol),
        "aromatic_rings": Lipinski.NumAromaticRings(mol),
        "frac_Csp3": rdMolDescriptors.CalcFractionCSP3(mol),
        "formal_charge": Chem.GetFormalCharge(mol),
    }
    if ref_fp is not None:
        row["tanimoto_vs_parent"] = DataStructs.TanimotoSimilarity(fp, ref_fp)

    mh, es = embed_confs(mol)
    bw = boltz_weights(es)
    row["min_conf_energy_kcal"] = es[0][1]
    row["Rg_A"] = sum(w * rdMolDescriptors.CalcRadiusOfGyration(mh, confId=cid) for cid, w in bw.items())
    row["SASA_A2"] = sum(w * calc_sasa(mh, cid) for cid, w in bw.items())

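    # Extended Hückel on the lowest-energy conformer only.
    ok, res = rdEHTTools.RunMol(mh, es[0][0])
    if ok:
        orb = list(res.GetOrbitalEnergies())
        # Closed-shell assumption: electrons fill orbitals in pairs, so the
        # HOMO is orbital number numElectrons/2 (zero-based index below).
        homo = int(res.numElectrons // 2 - 1)
        row["HOMO_eV"] = float(orb[homo])
        row["LUMO_eV"] = float(orb[homo + 1])
        row["gap_eV"] = row["LUMO_eV"] - row["HOMO_eV"]
    return row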

parent = Chem.MolFromSmiles(MELATONIN_FAMILY["Melatonin"])
parent_fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(parent, 2, nBits=2048)

rows = [features(n, s, ref_fp=parent_fp) for n, s in MELATONIN_FAMILY.items()]
df = pd.DataFrame(rows)
df.to_csv("melatonin_family_features.csv", index=False)
print(df[["name","logP","TPSA","HOMO_eV","LUMO_eV","gap_eV","tanimoto_vs_parent"]])
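
The conformer averaging in `features` relies on Boltzmann weights over UFF energies. As a sanity check of that weighting, the sketch below reimplements it on plain numbers so the behaviour can be inspected without RDKit. The energies and radii are made-up toy values, not computed ones, and `boltzmann_weights` / `weighted_mean` are illustrative names rather than functions from the pipeline.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K), same value as the pipeline

def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Normalized Boltzmann weights; energies are shifted by the minimum for stability."""
    e_min = min(energies_kcal)
    raw = [math.exp(-(e - e_min) / (R_KCAL * temperature_k)) for e in energies_kcal]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mean(values, weights):
    """Ensemble average of a per-conformer property."""
    return sum(v * w for v, w in zip(values, weights))

# Toy example: three conformers at 0.0, 0.5 and 2.0 kcal/mol above the minimum.
toy_energies = [0.0, 0.5, 2.0]
toy_rg = [3.1, 3.4, 3.9]  # hypothetical radii of gyration in angstroms
w = boltzmann_weights(toy_energies)
print([round(x, 3) for x in w])
print(round(weighted_mean(toy_rg, w), 3))
```

Because UFF energies are crude, the resulting weights are better read as a rough smoothing over conformers than as real populations.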

Cytoprotective carry-through across melatonin metabolites

Conservatively, six downstream metabolites still look meaningfully protective rather than dead-end waste: NAS, 6-OHM, AFMK, AMK, 5-MT, and cyclic 3-OHM. A generous count makes it seven by including 4-OHM as a likely contributor. The weak link is 2-OHM, which seems much less convincingly protective, and the sulfates/glucuronides mostly behave like clearance products. (PMC)

The strongest direct mammalian-cell evidence is for NAS, 6-OHM, AFMK, and 5-MT, all of which reduced ROS and improved UVB-stressed keratinocyte outcomes; NAS also has separate neuroprotective and neurogenic literature. 6-OHM has additional anti-apoptotic mitochondrial evidence, AFMK attenuated oxidative damage to DNA, proteins, and lipids, AMK is strongly antioxidant and inhibits nNOS, and cyclic 3-OHM protects cytochrome c from oxidative injury. 4-OHM looks chemically useful as a radical-trapping metabolite, whereas 2-OHM contributed much less in antioxidant analyses and can even be ROS-promoting in some plant contexts. (PMC)

I did not split out every transient radical adduct or every glucuronide stereoisomer separately. For the numerically messy glucuronides, the biological conclusion is the same as for the sulfates: they sit mostly in the terminal conjugate / excretion bucket rather than the still-doing-interesting-cytoprotection bucket. (PMC)

Local computed summary for the melatonin family

These values are from the local RDKit/EHT pass above. EHT_gap is an approximate orbital gap in eV. Tanimoto is similarity to melatonin on the Morgan radius-2, 2048-bit fingerprints (ECFP4-like) computed in that pass. CPP_rel, the last column, is my relative condensate-partition proxy within this metabolite set, not a universal LLPS constant.

name                         verdict       logP  TPSA (Å²)  EHT_gap (eV)  tanimoto_vs_melatonin  CPP_rel
NAS                          yes           1.55  65.1       2.92          0.596                  0.871
5-MT                         yes           1.68  51.0       3.08          0.614                  0.863
5-MTOL                       unclear       1.71  45.2       3.09          0.651                  0.856
Melatonin                    parent        1.86  54.1       3.05          1.000                  0.855
5-methoxyindoleacetaldehyde  unlikely      1.92  42.1       2.33          0.565                  0.815
6-OHM                        yes           1.56  74.4       2.92          0.646                  0.814
2-OHM                        weak          1.56  74.4       2.83          0.612                  0.814
4-OHM                        probable      1.56  74.4       2.90          0.580                  0.814
5-MIAA                       unclear/low   1.80  62.3       2.82          0.622                  0.770
AMK                          yes/probable  0.99  81.4       2.42          0.471                  0.687
c3OHM                        yes/probable  0.89  61.8       3.08          0.270                  0.686
AFMK                         yes           0.97  84.5       2.51          0.411                  0.674
NAS sulfate                  mostly no     1.03  108.5      2.94          0.608                  0.649
6-sulfatoxymelatonin         mostly no     1.04  117.7      2.94          0.585                  0.607
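
For readers unfamiliar with the similarity column: Tanimoto similarity on bit-vector fingerprints is the number of shared on-bits divided by the number of bits set in either fingerprint. A minimal pure-Python sketch, with fingerprints represented as sets of on-bit indices (the indices below are invented for illustration, not real Morgan bits):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

parent_bits = {1, 5, 9, 12, 40}
metabolite_bits = {1, 5, 9, 77}
print(tanimoto(parent_bits, metabolite_bits))  # 3 shared / 6 in the union = 0.5
```

This is why a ring-rearranged product such as c3OHM can score low against melatonin even though most of its atoms are conserved: circular fingerprints encode local atom environments, and a new ring system changes many of them at once.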

Reading that table without lying to yourself

The protective cascade clearly survives oxidation. Ring-opened products AFMK and AMK do not look like inert trash. Cyclic 3-OHM also survives the transformation with strong antioxidant character. The more terminal conjugates get much more polar and slide toward excretion chemistry instead of broad intracellular action. (PubMed)

The condensate-partition proxy puts NAS, 5-MT, melatonin, and nearby methoxyindoles near the top because they keep the useful “aromatic + amphiphilic + not too charged” balance. AFMK/AMK fall because ring opening and added polarity blunt π-driven partitioning, while sulfates get punished for charge and very high polar surface area. That does not make AFMK/AMK unimportant biologically. It just means “likely less happy inside generic aromatic/hydrophobic condensates” is not the same thing as “not protective.” (PMC)
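
The thread does not spell out the formula behind CPP_rel, so the sketch below is a hypothetical stand-in that only illustrates the stated logic: reward aromaticity, hydrogen bonding, and moderate lipophilicity, and penalize charge and very high polar surface area. Every weight, cutoff, and functional form here is an assumption, and it will not reproduce the table values.

```python
import math

# Hypothetical weights: NOT the formula behind the CPP_rel column above.
WEIGHTS = {"aromaticity": 0.35, "hbond": 0.20,
           "amphiphilicity": 0.25, "charge_penalty": 0.20}

def squash(x):
    """Map any real number into (0, 1) with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-x))

def cpp_proxy(desc):
    """Toy condensate-partition proxy in [0, 1] from 2D descriptors."""
    terms = {
        # Saturates at two aromatic rings.
        "aromaticity": min(desc["aromatic_rings"], 2) / 2.0,
        # Saturates at six donors plus acceptors.
        "hbond": min(desc["HBD"] + desc["HBA"], 6) / 6.0,
        # Peaks at an assumed optimum logP of 2 and falls off on either side.
        "amphiphilicity": math.exp(-((desc["logP"] - 2.0) ** 2) / 2.0),
        # Near 1 for neutral, low-TPSA molecules; assumed soft TPSA cutoff at 100 A^2.
        "charge_penalty": 1.0 - 0.5 * (
            min(abs(desc["formal_charge"]), 1)
            + squash((desc["TPSA"] - 100.0) / 10.0)
        ),
    }
    return sum(WEIGHTS[k] * terms[k] for k in WEIGHTS)

# logP and TPSA are taken from the table; ring, HBD/HBA, and charge values are
# illustrative, with the sulfate assumed deprotonated at physiological pH.
melatonin_like = {"aromatic_rings": 2, "HBD": 2, "HBA": 2,
                  "logP": 1.86, "TPSA": 54.1, "formal_charge": 0}
sulfate_like = {"aromatic_rings": 2, "HBD": 2, "HBA": 5,
                "logP": 1.04, "TPSA": 117.7, "formal_charge": -1}
print(round(cpp_proxy(melatonin_like), 3), round(cpp_proxy(sulfate_like), 3))
```

Keeping each term separate makes it easy to see which driver moves a metabolite up or down the ranking, which matters more than the absolute score.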

The next rational upgrade is microstate-aware pKa/logD enumeration plus xTB or DFT on each metabolite, because the neutral-cartoon approximation lies hardest for 5-MT and the sulfated/glucuronidated endpoints.
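
A first step in that direction, before any real pKa enumerator is wired in, is the single-site Henderson-Hasselbalch correction from logP to logD. The sketch below assumes one ionizable site and that only the neutral form partitions into the organic phase. The pKa used in the example is a placeholder for illustration, not a measured or predicted value for 5-MT.

```python
import math

def neutral_fraction(pka, ph=7.4, kind="base"):
    """Fraction of a single-site molecule that is uncharged at the given pH."""
    if kind == "base":      # a base is protonated (charged) when pH < pKa
        exponent = pka - ph
    elif kind == "acid":    # an acid is deprotonated (charged) when pH > pKa
        exponent = ph - pka
    else:
        raise ValueError("kind must be 'base' or 'acid'")
    return 1.0 / (1.0 + 10.0 ** exponent)

def logd_from_logp(logp, pka, ph=7.4, kind="base"):
    """logD under the assumption that only the neutral form partitions."""
    return logp + math.log10(neutral_fraction(pka, ph, kind))

# logP 1.68 is the neutral-form value for 5-MT from the table above;
# PLACEHOLDER_PKA is hypothetical and should come from a real pKa tool.
PLACEHOLDER_PKA = 10.0
print(round(neutral_fraction(PLACEHOLDER_PKA), 4))
print(round(logd_from_logp(1.68, PLACEHOLDER_PKA), 2))
```

Even this crude correction shows why a primary amine can look lipophilic on a neutral SMILES and still behave as a mostly charged species at pH 7.4.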