CODESSA 2.7.10 for Unix - Automation Utility Notes

Authors: Alan Katritsky, Mati Karelson, Victor S. Lobanov,
         Roy Dennington and Todd Keith

Copyright (c) 1994-1995 by University of Florida
Portions Copyright (c) 1996-2007 by Semichem, Inc.
All Rights Reserved

Redistribution or resale, except by Semichem, Inc., is prohibited.

Some command-line utilities and features have been added to CODESSA (and
AMPAC) to help automate the processes of generating and loading AMPAC,
GAUSSIAN and MOPAC data, calculating descriptors, using correlations in
CODESSA Project files to make predictions on data files, exporting
descriptors to a file, etc.

In order to facilitate automatic generation and loading of AMPAC data, the
following conventions are used for filename extensions:

  AMPAC input and output data files:             *.dat     *.out
  AMPAC "THERMO" input and output data files:    *_f.dat   *_f.out
  AMPAC Backup data files:                       *.bk

Ordinary AMPAC version control using '~' is not allowed. All scripts act on
files without versions.

In order to facilitate automatic generation and loading of GAUSSIAN (g94 or
g98 or g03) data, the following conventions are used for filename extensions:

  GAUSSIAN input and output files:                  *.com     *.log
  GAUSSIAN "THERMO" input and output data files:    *_f.com   *_f.log
  GAUSSIAN Backup data files:                       *.bk

In order to facilitate automatic loading of MOPAC data (CODESSA does not
support automatic generation of MOPAC data - i.e., it is up to the user to
properly set up the keywords in the MOPAC input data files and run MOPAC),
the following conventions are required:

  MOPAC output data files:             *.mno
  MOPAC "THERMO" output data files:    *_f.mno
  MOPAC Backup data files:             *.bk

******************************************************************************
NOTE: Mixing of AMPAC, GAUSSIAN or MOPAC data is not recommended.
******************************************************************************

******************************************************************************
NOTE: Several examples of using the 'predict' module for AMPAC, GAUSSIAN and
MOPAC data files are given at the bottom of this file. Simply following these
examples will cover the vast majority of automation needs.
But read on if you wish ...
******************************************************************************

Existing data files can be renamed by using "find" with the "exec" option
(see the MakeInp*.csh scripts in $CODESSA_DIR/bin/).

WARNING: These automation utilities WILL modify the existing AMPAC or
GAUSSIAN data files - but not MOPAC. Whenever possible, they will move the
original files to filename.bk. Thus, we highly recommend backing up your
files before executing any of these utilities.

Commands and Utilities:

The commands are defined in "init_codessa.csh" or "init_codessa.sh" and
"init_ampac.csh" or "init_ampac.sh" as aliases to the scripts at the
locations given below. If the names conflict with any existing commands,
simply modify the entries in the init files. Note, any interdependencies
referenced in the scripts use the paths as described by "Location", not the
aliases.

Command        Location                              Type
------------------------------------------------------------------------------
arc2dat        "$AMPAC_DIR/bin/Arc2Dat.awk"          [awk script]  (AMPAC)
mkinp          "$CODESSA_DIR/bin/MakeInp.csh"        [csh script]  (AMPAC)
mkinp_g        "$CODESSA_DIR/bin/MakeInp_g.csh"      [csh script]  (GAUSSIAN)
mkinp_m        "$CODESSA_DIR/bin/MakeInp_m.csh"      [csh script]  (MOPAC)
gencodentry    "$CODESSA_DIR/bin/GenCodEntry.csh"    [csh script]  (AMPAC)
gencodentry_g  "$CODESSA_DIR/bin/GenCodEntry_g.csh"  [csh script]  (GAUSSIAN)
gencodentry_m  "$CODESSA_DIR/bin/GenCodEntry_m.csh"  [csh script]  (MOPAC)
codkey         "$CODESSA_DIR/bin/codkey.exe"         [C program]   (AMPAC, GAUSSIAN)
               "$CODESSA_DIR/bin/codkey.c"           [C source]    (AMPAC, GAUSSIAN)
predict        "$CODESSA_DIR/predict.exe"            [binary exe]  (AMPAC, GAUSSIAN, MOPAC)

CODESSA Menu Items for automation:

  "Data->Auto Load..."
  "Descriptor->Calculate & Predict..."

--------------------------------------------------------------------
General Description ("Data->Auto Load..." menu item)

User Input: [1] Select Filelist Dialog
            [2] Select Descriptors Dialog
            [3] Select Correlation Dialog

Automatically generate structure and descriptor data from an AMPAC or
GAUSSIAN input data file list, or descriptor data from a MOPAC output data
file list, and optionally apply a correlation to make a property prediction.

The user will be presented with three dialogs before any calculating begins.
The user can cancel at any time.

Dialog [1] presents a standard file selection dialog. The user must choose a
"filelist" file that contains a list of files and directories to be
"autoloaded" - input data files for AMPAC and GAUSSIAN, output data files for
MOPAC. This is the only piece of external data required. It must be generated
by the user, either manually or with the "find" command.

Be careful with the "find" command. If you decide to act on individual files
rather than directories, DO NOT include both a directory and the files it
contains. Specify one or the other but not both.
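
As a rough illustration of generating a filelist with "find", the sketch
below lists individual AMPAC input files only. It is not taken from the
CODESSA scripts: the demo directory, the file names and the output name
"filelist.txt" are all hypothetical, and POSIX sh is used here instead of
csh. THERMO inputs (*_f.dat) are skipped and no directory is listed, so no
file is named twice.

```shell
#!/bin/sh
# Hypothetical demo tree; in practice, point this at your own data directory.
demo=$(mktemp -d)
mkdir "$demo/a"
touch "$demo/b001.dat" "$demo/b001_f.dat" "$demo/b001.out" "$demo/a/c001.dat"

# List AMPAC input files (*.dat) but not THERMO inputs (*_f.dat).
# Only regular files are listed, never directories, so each structure
# appears exactly once. Paths are relative to the location of the filelist.
( cd "$demo" && find . -type f -name '*.dat' ! -name '*_f.dat' | sort > filelist.txt )

cat "$demo/filelist.txt"
```

For GAUSSIAN or MOPAC data, the same pattern would presumably apply with
'*.com' or '*.mno' in place of '*.dat'.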
For example, a typical filelist would contain a single file or directory on
each line. Comments are not allowed:

Example list file for AMPAC:

  /usr/tmp/b001.dat
  /usr/tmp/b002.dat
  /usr/tmp/b003.dat
  /usr/tmp/a

Example list file for GAUSSIAN:

  /usr/tmp/b001.com
  /usr/tmp/b002.com
  /usr/tmp/b003.com
  /usr/tmp/a
  ...

Example list file for MOPAC:

  /usr/tmp/b001.mno
  /usr/tmp/b002.mno
  /usr/tmp/b003.mno
  /usr/tmp/a
  ...

Do not also specify "/usr/tmp" in the above examples. If you do, each
specified file will be acted on twice.

The paths can be relative to the location of the listfile. This is natural
if you use "find".

For AMPAC, list a single .dat file for each structure. DO NOT specify a
THERMO file or any .out files. Using the convention given above, the program
will either locate or generate the needed output (.out) AMPAC files.

For GAUSSIAN, list a single .com file for each structure. Using the
convention given above, the program will either locate or generate the needed
output (.log) GAUSSIAN data files.

For MOPAC, list a single .mno file for each structure. Do not specify a
THERMO file or any input files.

Dialog [2] allows you to pick which descriptors will be calculated for each
file found.

Dialog [3] is optional. It requests a prediction using a specified
correlation.

Implementation note:

AMPAC: For each AMPAC file or directory found in the filelist, the program
will execute "mkinp file|directory tmp.inp" and then load "tmp.inp" into a
Temporary Set. Descriptors get calculated, and predictions are done.

GAUSSIAN: For each GAUSSIAN file or directory found in the filelist, the
program will execute "mkinp_g file|directory tmp.inp" and then load "tmp.inp"
into a Temporary Set. Descriptors get calculated, and predictions are done.

MOPAC: For each MOPAC output file (or directory containing MOPAC output
files) found in the filelist, the program will execute
"mkinp_m file|directory tmp.inp" and then load "tmp.inp" into a Temporary Set.
Descriptors get calculated, and predictions are done.

CODESSA uses a Temporary Set to Auto-Load structures; however, the new
structures can be accessed from the set, "All Structures". Also note, the new
structures and descriptors will be retained when you issue a "Save" command.

--------------------------------------------------------------------
General Description ("Descriptor->Calculate & Predict..." menu item)

User Input: [1] Select Set Dialog
            [2] Select Descriptors Dialog
            [3] Select Correlation Dialog

Given a Set, Calculate Descriptors and use a Correlation to predict a
property for each member in the set.

--------------------------------------------------------------------
General Description (mkinp) AMPAC

Usage: mkinp [-r] [-o] parent_directory|ampdatfile codinpfile

Make a CODESSA input file suitable for loading by the "Data->Load..." menu
item, and generate any missing AMPAC Data. Note, you can specify either an
AMPAC input data file or a directory of such files as input.

If you specify a directory, "mkinp" will search for any AMPAC input data
files with a .dat extension (excluding files that match *_f.dat if -o is
specified), and run "gencodentry -k" on each file. The resulting codinpfile
will contain a HEADER line followed by an entry for each file found.

If the recursive option [-r] is specified, the find will descend the full
directory tree, looking for AMPAC input data files.

If you specify a file, "mkinp" will run "gencodentry -k" on that file and
also generate a codinpfile with only a HEADER and one entry.

Note, any existing AMPAC data will be kept. If you want to force an AMPAC
calculation, clean out the old AMPAC .arc files and .out files BEFORE running
this script.

  -o = Generate separate SCF and THERMO data files for AMPAC instead of a
       single data file (with both SCF and THERMO data) generated from an
       input file with the CODESSA keyword (recognized by AMPAC-6.5 and
       later and CODESSA-2.6 and later).
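
Because existing AMPAC data is kept, it can be useful to see in advance
which input files have no output yet and would therefore trigger an AMPAC
run. The sketch below is only an approximation written in POSIX sh, not the
logic of MakeInp.csh itself: the function name "needs_ampac" and the demo
files are hypothetical, it looks at a single directory without recursion, it
always skips *_f.dat files (as mkinp does when -o is given), and it checks
only for .out files, ignoring .arc files.

```shell
#!/bin/sh
# needs_ampac DIR
# Print each AMPAC input file (*.dat, excluding *_f.dat) in DIR that has
# no matching .out file next to it.
needs_ampac() {
    for dat in "$1"/*.dat; do
        [ -e "$dat" ] || continue                  # glob matched nothing
        case "$dat" in *_f.dat) continue ;; esac   # skip THERMO inputs
        [ -e "${dat%.dat}.out" ] || printf '%s\n' "$dat"
    done
}

# Hypothetical demo: b001 already has output, b002 does not.
demo=$(mktemp -d)
touch "$demo/b001.dat" "$demo/b001.out" "$demo/b002.dat" "$demo/b002_f.dat"
needs_ampac "$demo"
```

In this demo only the path ending in b002.dat is printed, since b001.out
already exists and b002_f.dat is treated as a THERMO input.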

--------------------------------------------------------------------
General Description (mkinp_g) GAUSSIAN

Usage: mkinp_g [-r] [-n] parent_directory|gausscomfile codinpfile

Make a CODESSA input file suitable for loading by the "Data->Load..." menu
item, and possibly generate any missing GAUSSIAN Data. Note, you can specify
either a GAUSSIAN .com file or a directory of such files as input.

If you specify a directory, "mkinp_g" will search for any files with a .com
extension and run "gencodentry_g -k" on each file. The resulting codinpfile
will contain a HEADER line followed by an entry for each file found.

If the recursive option [-r] is specified, the find will descend the full
directory tree, looking for comfiles.

If the [-n] option is specified, GAUSSIAN will not be executed for the found
and potentially modified comfiles, even if no .log file exists. It is then up
to the user to run GAUSSIAN on the comfiles later.

If you specify a file, "mkinp_g" will run "gencodentry_g -k" on that file and
also generate a codinpfile with only a HEADER and one entry.

Note, any existing GAUSSIAN data (in .log files) will be kept. If you want to
force a GAUSSIAN calculation, clean out the old GAUSSIAN .log files BEFORE
running this script.

--------------------------------------------------------------------
General Description (mkinp_m) MOPAC

Usage: mkinp_m [-r] [-o] parent_directory|mopacmnofile codinpfile

Make a CODESSA Input file suitable for loading by the "Data->Load..." menu
item. Note, you can specify either a MOPAC mnofile (MOPAC output data file)
or a directory of such files as input.

If you specify a directory, "mkinp_m" will search for any MOPAC output data
files with a .mno extension (excluding files that match *_f.mno if -o is
specified), and run "gencodentry_m" on each file.
The resulting codinpfile will contain a HEADER line followed by an entry for
each file found.

If the recursive option [-r] is specified, the find will descend the full
directory tree, looking for mnofiles.

If you specify a file, "mkinp_m" will run "gencodentry_m" on that file and
also generate a codinpfile with only a HEADER and one entry.

  -o = Assume separate SCF and THERMO MOPAC output files instead of a single
       SCF output file

--------------------------------------------------------------------
General Description (gencodentry)

Usage: gencodentry [-k] [-o] ampdatfile codinpfile

Generate an AMPAC output data file from an AMPAC input data file and add a
corresponding CODESSA INPUT entry to codinpfile. Although this script can be
called separately, it is primarily used by "mkinp" to generate a CODESSA
Input file suitable for loading by the "Data->Load..." menu item.

The keep option [-k] is optional. Specify the keep option to use existing
AMPAC data files if they exist. This allows you to create a CODESSA input
data file without repeating the AMPAC calculations. Note, the script DOES NOT
check for the correct keywords in the existing files.

Run gencodentry when you want to generate AMPAC data suitable for loading by
CODESSA for a single structure.

  -o = Generate separate SCF and THERMO columns for the CODESSA Input files
       and separate SCF and THERMO AMPAC data files instead of a single
       AMPAC data file with all info from the CODESSA keyword (recognized by
       AMPAC-6.5 and later and CODESSA-2.6 and later).

--------------------------------------------------------------------
General Description (gencodentry_m)

Usage: gencodentry_m [-o] mopacmnofile codinpfile

For the MOPAC output file mnofile, add a corresponding CODESSA INPUT entry to
codinpfile. Although this script can be called separately, it is primarily
used by "mkinp_m" to generate a CODESSA Input file suitable for loading by
the "Data->Load..." menu item or by predict.exe.
Run gencodentry_m when you want to use MOPAC output data suitable for loading
by CODESSA for a single structure.

  -o = Generate separate SCF and THERMO columns for the CODESSA Input file
       based on separate SCF and THERMO MOPAC output files for a given
       structure instead of a single MOPAC SCF output file.

--------------------------------------------------------------------
General Description (gencodentry_g)

Usage: gencodentry_g [-k] [-n] gausslogfile codinpfile

Generate a GAUSSIAN .log file from a .com file and add a corresponding
CODESSA INPUT entry to codinpfile. Although this script can be called
separately, it is primarily used by "mkinp_g" to generate a CODESSA Input
file suitable for loading by the "Data->Load..." menu item.

The keep option [-k] is optional. Specify the keep option to use existing
GAUSSIAN calculations if they exist. This allows you to create a CODESSA
input file without repeating the GAUSSIAN calculations. Note, the script
DOES NOT check for the correct keywords in the existing files.

If the [-n] option is specified, GAUSSIAN will not be executed for the .com
file, even if no .log file exists. It is then up to the user to run GAUSSIAN
on the comfile later.

Run gencodentry_g when you want to generate GAUSSIAN data suitable for
loading by CODESSA for a single structure.

--------------------------------------------------------------------
General Description (arc2dat)

Usage: arc2dat arcfile > datfile

Create an AMPAC .dat file using the keywords and geometry from an AMPAC .arc
file. This simple awk script is used by "gencodentry" to create a THERMO data
file from a resulting SCF archive file.

--------------------------------------------------------------------
General Description (codkey)

Usage: codkey SCF|THERMO|CODESSA|CODSCF|CODTHERMO old.dat > new.dat

Create a new AMPAC data file that contains all required CODESSA keywords for
either an SCF, THERMO or an SCF+THERMO calculation.
Any old CODESSA keywords are removed and new keywords are added for the
specific type. All other keywords will be carried over to the new .dat file.
This utility is called by "gencodentry" to adjust keywords while generating
AMPAC Data for CODESSA. See the source code for further details.

--------------------------------------------------------------------
General Description (codkey)

Usage: codkey -g SCF|THERMO

Create a new GAUSSIAN .com file that contains all required keywords for
generating GAUSSIAN data (.log) files for CODESSA. This utility is called by
"gencodentry_g" to adjust keywords while generating GAUSSIAN Data for
CODESSA. A comment line "!Codessa_Ready" is added to a .com file modified by
codkey_g.exe. This line should not be removed. See the source code (codkey.c)
for further details.

--------------------------------------------------------------------
Description of predict.exe

Purpose: predict.exe can be used to automatically predict properties for a
set of output data files using a specified correlation in a specified CODESSA
Project file (*.cod). It can also be used to export (write to a file)
descriptors for a set of output data files or for all structures in a
specified CODESSA Project file.

Usage: predict [-l] [-s] [-g] [-o] [-d] [-c] [-m] [-r] [-n] [-f] \
               \
               inputdatafile|outputdatafile|directory|listfile

AMPAC Examples:

(Predict the property corresponding to the correlation "alcbpcorr" in the
CODESSA Project file "alcohols.cod" from the descriptors derived from the
AMPAC output data file "butanol.out", which will be generated by running
AMPAC on "butanol.dat" if "butanol.out" does not already exist.)

  % predict -f alcohols.cod alcbpcorr butanol.dat

(Predict the property corresponding to the correlation "alcbpcorr" in the
CODESSA Project file "alcohols.cod" from the descriptors derived from the
existing AMPAC output data file "butanol.out".)
  % predict -f alcohols.cod alcbpcorr butanol.out

(Predict the property corresponding to the correlation "alcbpcorr" in the
CODESSA Project file "alcohols.cod" for each of the AMPAC output data files
corresponding to AMPAC input data files in the directory "alcohols". If an
AMPAC output data file does not exist for a given AMPAC input data file, run
AMPAC to generate it, with frequency data.)

  % predict -f alcohols.cod alcbpcorr alcohols

(Predict the property corresponding to the correlation "alcbpcorr" in the
CODESSA Project file named alcohols.cod for each of the AMPAC output data
files corresponding to AMPAC input data files in the directory "alcohols" and
its subdirectories. If an AMPAC output data file does not exist for a given
AMPAC input data file, run AMPAC to generate it, with frequency data.)

  % predict -f -r alcohols.cod alcbpcorr alcohols

(Export all descriptors derived from the AMPAC output data file corresponding
to "methanol.dat" to the export file "methanol.des". The AMPAC output data
file "methanol.out" will be generated, with frequency data, by running AMPAC
on "methanol.dat" if it does not already exist.)

  % predict -f -d methanol.des methanol.dat

(Export all descriptors derived from the existing AMPAC output data file
"methanol.out" to the export file "methanol.des".)

  % predict -d methanol.des methanol.out

(Export all descriptors derived from AMPAC output data files corresponding to
AMPAC input data files in the directory "alcohols" to the export file
"alcohols.des". For a given AMPAC input data file, an AMPAC output data file
will be generated, with frequency data, by running AMPAC on the input data
file if the AMPAC output data file does not already exist.)

  % predict -d alcohols.des alcohols

(Export all descriptors derived from AMPAC output data files corresponding to
AMPAC input data files in the directory named alcohols and its subdirectories
to the export file named alcohols.des.
For a given AMPAC input data file, an AMPAC output data file will be
generated, with frequency data, by running AMPAC on the input data file if
the AMPAC output data file does not already exist.)

  % predict -d -r alcohols.des alcohols

(Export all descriptors in the CODESSA Project file "alcohols.cod" to the
export file "alcohols.des".)

  % predict -d -c alcohols.cod alcohols.des

GAUSSIAN Examples:

(Similar to AMPAC examples except add -g argument and substitute appropriate
file extensions. For the descriptions above, substitute GAUSSIAN for AMPAC.)

  % predict -f -g alcohols.cod alcbpcorr butanol.com
  % predict -f -g alcohols.cod alcbpcorr butanol.log
  % predict -f -g alcohols.cod alcbpcorr alcohols
  % predict -f -g -r alcohols.cod alcbpcorr alcohols
  % predict -f -d -g methanol.des methanol.com
  % predict -d -g methanol.des methanol.log
  % predict -d -g alcohols.des alcohols
  % predict -d -g -r alcohols.des alcohols
  % predict -d -g -c alcohols.cod alcohols.des

MOPAC Examples:

(Similar to AMPAC examples except add -m argument and substitute appropriate
file extensions. For the descriptions above, substitute MOPAC for AMPAC.)

  % predict -f -m alcohols.cod alcbpcorr butanol.dat
  % predict -f -m alcohols.cod alcbpcorr butanol.out
  % predict -f -m alcohols.cod alcbpcorr alcohols
  % predict -f -m -r alcohols.cod alcbpcorr alcohols
  % predict -f -d -m methanol.des methanol.dat
  % predict -d -m methanol.des methanol.out
  % predict -d -m alcohols.des alcohols
  % predict -d -m -r alcohols.des alcohols
  % predict -d -m -c alcohols.cod alcohols.des

  -l = target files or directories are specified in a list file, one file
       or directory per line.
  -s = save the new structures and descriptors to the codprojfile (not
       applicable if -d is specified without -c, i.e., if no codprojfile is
       specified)
  -g = Data files are GAUSSIAN (Default is AMPAC)
  -o = Search for or generate separate SCF and THERMO data files for AMPAC
       or GAUSSIAN instead of a single data file with both SCF and THERMO
       data
  -d = Export descriptors to a file. Descriptors are calculated for each
       output data file, or obtained from codprojfile if -c is specified.
  -c = Used in conjunction with -d, meaning that the descriptors are to be
       exported from a specified CODESSA Project file instead of from a set
       of output data files.
  -m = Data files are MOPAC (Default is AMPAC)
  -r = Recursively search for input data files starting in the target
       directory
  -n = Generate a new output data file for each input data file, even if an
       output data file already exists.
  -f = Calculate frequency data (and therefore "THERMO" data) along with the
       usual SCF data.
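
The -l option has no worked example above, so the following POSIX sh sketch
shows one plausible way to use it. Everything specific here is an
assumption: the file "pentanol.dat" is hypothetical, the project and
correlation names are borrowed from the AMPAC examples, and the argument
order simply mirrors those examples with the list file as the last
argument. The command is only echoed, not executed, so the sketch runs on a
machine without CODESSA installed.

```shell
#!/bin/sh
# Write a list file: one target file or directory per line.
# The entries are placeholders (pentanol.dat is hypothetical).
listfile=$(mktemp)
printf '%s\n' butanol.dat pentanol.dat alcohols > "$listfile"

# Assemble the command line. The argument order is an assumption based on
# the AMPAC examples: options, project file, correlation name, list file.
cmd="predict -l -f alcohols.cod alcbpcorr $listfile"

# Dry run: show the command instead of executing it.
echo "$cmd"
```

Remove the "echo" and run the command directly once the list file and
project file are known to be correct.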