GenRA: Release Notes
Latest Version
GENRA 3.3 - September 2024
GenRA Version 3.3 addresses a number of bugs and code quality issues and offers several significant new features, specifically:
- 45 bug, tweak and code quality/maintenance/technical debt reduction tickets closed.
- Physical Properties fingerprint: based on hydrogen bond donors, hydrogen bond acceptors, logKow and molecular weight (MW), with values predicted by the OPERA suite of models. Intended to be used as part of the custom (hybrid) fingerprint option to return source analogues that are similar with respect to both physicochemical properties and chemical structure and/or bioactivity.
- PFAS ToxPrints: PFAS ToxPrints developed by Richard et al., 2023 have been generated for PFAS that are part of the following PFAS list.
- Pesticide RAC fingerprint: mode of action information as classified by the Herbicide, Fungicide and Insecticide Resistance Action Committees (HRAC, FRAC and IRAC). A filter has also been added to return only analogues that appear in these predefined lists.
- Custom neighborhoods: users can now create their own neighborhoods, either by making selections in the Network Exploration tool or by entering a list of identifiers. Note: no pairwise similarity is calculated; each analogue is weighted equally.
- Multi-target: using a list of analogues as a ‘candidate’ category, users can designate each category member as the target in order to make predictions across the neighborhood. Note: no pairwise similarity is determined. Each member is weighted equally and, where results are inconsistent across the category, the most conservative estimate is carried forward into the GenRA prediction engine.
- Uncertainty estimates for binary predictions can now be generated using the genra-py library (Shah et al., 2021), consistent with those from the existing genrapred engine.
- Update to the genra-py library to allow predictions to be made using custom (hybrid) fingerprints.
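To illustrate how the equal weighting used by custom neighborhoods differs from a similarity-weighted neighborhood, the sketch below computes a read-across score for a binary endpoint as a weighted average of the source analogues' 0/1 outcomes. It assumes that a standard GenRA-style score weights each analogue by its similarity to the target; the function name and example values are hypothetical, and this is not the genra-py API.

```python
from typing import Optional, Sequence


def read_across_score(outcomes: Sequence[int],
                      similarities: Optional[Sequence[float]] = None) -> float:
    """Weighted average of binary (0/1) outcomes across source analogues.

    With similarities, each analogue is weighted by its similarity to the
    target (assumed GenRA-style weighting). Without them, every analogue gets
    weight 1, mirroring the equal weighting of custom neighborhoods.
    """
    if not outcomes:
        raise ValueError("at least one source analogue is required")
    weights = [1.0] * len(outcomes) if similarities is None else list(similarities)
    if len(weights) != len(outcomes):
        raise ValueError("outcomes and similarities must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * y for w, y in zip(weights, outcomes)) / total


outcomes = [1, 0, 1, 0]              # hypothetical toxicity calls for four analogues
similarities = [0.9, 0.4, 0.6, 0.1]  # hypothetical similarities to the target

print(round(read_across_score(outcomes, similarities), 3))  # 0.75 (similarity-weighted)
print(read_across_score(outcomes))                          # 0.5 (equal weights)
```

With equal weights the score reduces to the fraction of positive analogues, so a highly similar positive analogue no longer counts for more than a weakly similar negative one.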
Previous Versions
GENRA 3.2 - March 2023
GenRA Version 3.2 addresses a number of bugs and code quality issues and offers several new features, specifically:
- Over 30 tweaks and minor bug fixes closed.
- Data updates: synced with DSSTox 2022-02-18, invitroDB 3.5 and ToxRefDB 2.1
- Major speed-up from improved use of indexing
- A new chemical fingerprint, the AIM CSRML, has been added. It re-implements the fragment set of EPA’s Analog Identification Methodology (AIM) in the Chemical Subgraphs and Reactions Markup Language (CSRML) format. Further details of the AIM CSRML are given in the accompanying manuscript published in Computational Toxicology (https://doi.org/10.1016/j.comtox.2022.100256).
- New download options from Panel 1. The Panel 1 download now provides not only the top 100 source analogues and their pairwise similarities but also the associated chemical or biological fingerprint matrix. The fingerprint matrix is provided both as a bitstring in a single column and as separate individual columns.
- New sorting in Panel 4 (Data matrix). Observed data and read-across results can now be sorted by number of positives/negatives and by confidence in predictions (i.e., AUC and p-values).
- New download options from Panel 4. Downloads from Panel 4 can now be readily filtered and sorted by confidence in predictions and by observation richness, which facilitates post-processing by end users.
- Neighborhood explorer graph visualization tool. Network visualization has been extended to enable neighborhoods to be viewed without filtering on the basis of ToxCast or ToxRefDB data.
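As a post-processing example for the enriched Panel 1 download, the sketch below recomputes the Jaccard similarity between two substances from their fingerprint bitstrings and expands a bitstring into one value per bit. It assumes each bitstring is a string of '0'/'1' characters of the same length for every substance; the exact export layout may differ, and the helper names and example fingerprints are hypothetical.

```python
def _check(bitstring: str) -> None:
    """Raise if the string contains anything other than '0' and '1'."""
    if set(bitstring) - {"0", "1"}:
        raise ValueError("bitstring must contain only '0' and '1'")


def bits_on(bitstring: str) -> set[int]:
    """Return the positions of the set ('1') bits."""
    _check(bitstring)
    return {i for i, ch in enumerate(bitstring) if ch == "1"}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| over set bits."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    on_a, on_b = bits_on(a), bits_on(b)
    union = on_a | on_b
    if not union:
        return 0.0  # assumed convention for two all-zero fingerprints
    return len(on_a & on_b) / len(union)


def expand(bitstring: str) -> list[int]:
    """Split a bitstring into one integer per bit, like individual bit columns."""
    _check(bitstring)
    return [int(ch) for ch in bitstring]


target = "1101000"    # hypothetical target fingerprint
analogue = "1001100"  # hypothetical source analogue fingerprint
print(jaccard(target, analogue))  # 0.5
print(expand(analogue))           # [1, 0, 0, 1, 1, 0, 0]
```

Recomputing the score this way is a quick consistency check against the pairwise similarities included in the same download.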
GENRA 3.1
GenRA Version 3.1 addresses a number of bugs and code quality issues and offers several significant new features, specifically:
- 38 bug/tweak tickets closed.
- 40 code quality/maintenance/technical debt reduction tickets closed.
- Ability to download the radial plot view and the top 100 most similar analogues from Panel 1. The latter is particularly useful when the goal is simply to retrieve the top 100 substances (DTXSID identifiers and Jaccard similarity scores) without the ToxRefDB filter and then query the CompTox Chemicals Dashboard for additional information using its Batch Search functionality.
- Physical Properties visualization and reporting. Users can now explore physicochemical similarity across candidate source analogues by examining the distributions of specific properties: LogKow, Vapour Pressure, Henry’s Law Constant, Melting Point, Boiling Point and Water Solubility, as predicted by the OPERA software tool (https://github.com/kmansouri/OPERA), as well as Molecular Weight. These are shown as a series of boxplots/swarmplots in a pop-up visualization launched from Panel 1, and the values are also tabulated in Panel 4’s data matrix view. The view is intended to provide additional context for analogue evaluation by showing to what extent physical property values are consistent and comparable across analogues relative to the target chemical of interest.
- Neighborhood explorer graph visualization tool. This network tool, launched as a pop-up from Panel 1, enables side-by-side exploration of the top 3 source analogues across different fingerprint (FP) types. Source analogues and their next-generation analogues can be compared on the basis of different FPs in a single view. The data underpinning the network view can also be downloaded as a JSON object, allowing users to analyze the network with third-party tools such as Cytoscape or NetworkX.
- Vendor-specific ToxCast fingerprints for a subset of vendors, including Attagene and BioSeek. Because some substances have been tested in many more assays than others, a fingerprint spanning all assays may overestimate the similarity of chemicals that have only been tested in a limited set of assays; restricting the fingerprint to a single vendor’s assays helps mitigate this.
- Use of the genra-py library (Shah et al., 2021) to support different data types, allowing both continuous and binary data to be used in the GenRA approach.
- Prediction of continuous values. In vivo toxicity potencies can now be predicted, not only binary toxicity outcomes. The predicted potencies are based on dose values aggregated from ToxRefDB and are generated using the genra-py library. A proof-of-concept data matrix view has been developed to capture potency value ranges.
- Generalizing for multiple data streams. Infrastructure has been added to support additional predictions/aggregations in the future. As a first use case, predictions based on ToxCast hit calls are now possible: Panel 1’s radial plot can return the analogues with available ToxCast outcomes, which then seed the subsequent panel views and enable assay-level predictions on the basis of chemical fingerprints.
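The continuous-value predictions above extend read-across from binary calls to potency. As an illustrative sketch only, the function below assumes the prediction is a similarity-weighted mean of the analogues' dose values computed on a log10 scale and then back-transformed; the log transform, the mg/kg/day units, the function name and the example values are assumptions, not the genra-py API or the exact GenRA procedure.

```python
import math
from typing import Sequence


def predict_potency(doses_mg_kg_day: Sequence[float],
                    similarities: Sequence[float]) -> float:
    """Similarity-weighted mean of analogue potencies, computed in log10 space.

    Returns the prediction back-transformed to the original dose units.
    """
    if not doses_mg_kg_day or len(doses_mg_kg_day) != len(similarities):
        raise ValueError("need one similarity per analogue dose")
    if any(d <= 0 for d in doses_mg_kg_day):
        raise ValueError("doses must be positive to take logarithms")
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    log_mean = sum(s * math.log10(d)
                   for s, d in zip(similarities, doses_mg_kg_day)) / total
    return 10 ** log_mean


doses = [10.0, 100.0, 30.0]  # hypothetical aggregated dose values (mg/kg/day)
sims = [0.8, 0.3, 0.5]       # hypothetical similarities to the target
print(predict_potency(doses, sims))
```

Averaging in log space keeps a single high-dose analogue from dominating the estimate, which suits dose values spanning orders of magnitude; a plain arithmetic mean would be an equally simple alternative.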