Both the automated generation of reaction networks and the automated prediction of synthetic trees require, in one way or another, the definition of possible transformations a molecule can undergo. One way of doing this is by using reaction templates. In view of the expanding amount of known reactions, it has become more and more difficult to envision all possible transformations that could occur in a studied system. Nonetheless, most reaction network generation tools rely on user-defined reaction templates. Not only does this limit the amount of chemistry that can be accounted for in the reaction networks, it also confines the wide-spread use of the tools by a broad public. In retrosynthetic analysis, the quality of the analysis depends on what percentage of the known chemistry is accounted for. Using databases to identify templates is therefore crucial in this respect. For this purpose, an algorithm has been developed to extract reaction templates from various types of chemical databases. Some databases such as the Kyoto Encyclopedia for Genes and Genomes and RMG do not report an atom-atom mapping (AAM) for the reactions. This makes the extraction of a template non-straightforward. If no mapping is available, it is calculated by the Reaction Decoder Tool (RDT). With a correct AAM-either calculated by RDT or specified-the algorithm consistently extracts a correct template for a wide variety of reactions, both elementary and non-elementary. The developed algorithm is a first step towards data-driven generation of synthetic trees or reaction networks, and a greater accessibility for non-expert users.
Download Full PDF Version (Non-Commercial Use)