=========================================== Example Usage =========================================== User-Friendly examples to ease understanding and usage of `fairmofsyncondition` materials to pytorch geometric -------------------------------------------------------- **ase to pytorch geometric** create a pytorch geometric data from a structure file. It can either be periodic system or molecular systems and should be ASE-readable. .. code-block:: python from fairmofsyncondition.read_write import coords_library py_data = coords_library.ase_to_pytorch_geometric(input_system) **NOTE** **input_system** can either be an ase_atom object or a filename (like cif or xyz) **pytorch geometric to ase** Another useful tool is to convert from the pytorch geometric data back to ase_atom .. code-block:: python from fairmofsyncondition.read_write import coords_library ase_atom = coords_library.pytorch_geometric_to_ase(py_data) create and save dataset to lmdb --------------------------------- Given a folder containing a series of cif files for which one wishes to convert to pytorch geometric dataset. The code below can create this dataset and directly save in and lmdb file format in a memory effecient manner. **standard create and save** The below command is an effecient way to quickly create a new pytorch gemeotric dataset and directly save to disc. .. code-block:: python import pickle import lmdb import torch from fairmofsyncondition.read_write import coords_library path_to_cif = 'folder_containing' path_to_lmdb = 'data.lmdb' count = 0 with lmdb.open(path_to_lmdb , map_size=int(1e12)) as lmdb_env: with lmdb_env.begin(write=True) as txn: for i, filenames in enumerate(path_to_cif): py_data = coords_library.ase_to_pytorch_geometric(filenames) txn.put(f"{i}".encode(), pickle.dumps(py_data )) count += 1 txn.put(b"__len__", pickle.dumps(count)) The above code creates `data.lmdb` file containing the pytorch geometric data. **Create, add properties and save** It is possible to add any property to the structure, like energy, forces, hessians even categorical data. The below code is a snippet of how this can be archeived .. code-block:: python import pickle import lmdb import torch from fairmofsyncondition.read_write import coords_library from fairmofsyncondition.featurizer import encoder path_to_cif = 'folder_containing' path_to_lmdb = 'data.lmdb' list_of_energies = [...] list_of_hessians = [...] list_of_forces = [...] categories = [...] list_of_list_of_catagories = [[...], [...] ...[...]] count = 0 with lmdb.open(path_to_lmdb , map_size=int(1e12)) as lmdb_env: with lmdb_env.begin(write=True) as txn: for i, filenames in enumerate(path_to_cif): py_data = coords_library.ase_to_pytorch_geometric(filenames) py_data.energy = torch.tensor(list_of_energies[i], dtype=torch.float16) py_data.hessians = torch.tensor(list_of_hessians[i], dtype=torch.float16) py_data.forces = torch.tensor(list_of_forces[i], dtype=torch.float16) py_data.category_name = encoder.onehot_encoder_pyg(list_of_list_of_catagories[i], categories) txn.put(f"{i}".encode(), pickle.dumps(py_data )) count += 1 txn.put(b"__len__", pickle.dumps(count)) The above code will create pytorch geometric dataset and save to `data.lmdb`. reading lmdb pytorch dataset ----------------------------- The code below provides a memory efficient way to load the dataset with consuming so much memory as well as an efficient way to split data .. code-block:: python from fairmofsyncondition.read_write import coords_library path_to_mdb = 'data.lmdb' data = coords_library.LMDBDataset(lmdb_path=path_to_mdb) # check all methods available print(dir(data)) # print for energy print(data[0]) # split data train_data, test_data = data.split_data(train_size=0.8, random_seed=42, shuffle=True) cheminformatics --------------------------- You can use `fairmofsyncondition` to quickly convert from `iupac names` to `iupac identifiers` and vice versa. One can also convert `chemical structures` to `iupac names` and `iupac identifiers` by following the these examples. iupacname2cheminfo ------------------------------- This function extracts SMILES strings, InChIKey, and InChI from a correctly written IUPAC name or common name. .. code-block:: python from fairmofsyncondition.read_write import iupacname2cheminfo data = iupacname2cheminfo.name_to_cheminfo("ethanol") print(data) cheminfo2iupac ------------------------ This function determines the IUPAC name from a cheminformatic identifier (SMILES, InChI, InChIKey, or CID). If the indentifier is a SMILES then the name_type should be "smile", if it is an InChIKey then the name_type should be "inchikey". .. code-block:: python from fairmofsyncondition.read_write import cheminfo2iupac name_info = cheminfo2iupac.pubchem_to_inchikey('O', name='smile') print("IUPAC name from SMILES 'O':", name_info) name_info2 = cheminfo2iupac.pubchem_to_inchikey('ZNALFCQVQALKNH-UHFFFAOYSA-N', name='inchikey') print("IUPAC name from INCHIKEY 'ZNALFCQVQALKNH-UHFFFAOYSA-N':", name_info2) struct2iupac ------------------------ This function extracts the IUPAC name and cheminformatic identifiers from a structure file. It parses any ASE-readable file and computes the corresponding cheminformatic information and iupac name. .. code-block:: python from fairmofsyncondition.read_write import struct2iupac struct_info = cheminfo2iupac.pubchem_to_inchikey(filename) print("Cheminformatic info from structure file:", struct_info) Command Line Usage ------------------ The cheminformatic data can also be executed directly from the command line. For example: - To convert an IUPAC name to cheminformatic information: .. code-block:: bash iupac2cheminfor -n "ethanol" - To determine the IUPAC name from a cheminformatic identifier: .. code-block:: bash cheminfo2iupac -n "O" - To extract information from a structure file: .. code-block:: bash struct2iupac example_structure.xyz