A new unbiased and highly automated approach to find new prognostic markers in preclinical research.
Data acquisition in (pre)clinical studies is usually driven by a hypothesis. Numerical algorithms, however, can help to identify biomarkers in existing data without any hypothesis being formulated. By simply assessing whether a statistical relationship exists between two parameters from an (unlimited) database, every (in)conceivable combination of data becomes a hypothesis. The aim was to create an unbiased and highly automated approach for the secondary analysis of (pre)clinical research that also allows for non-linear functional relationships. In our example, an almost homogeneous database was formed from a total of 45 parameters (vital, blood and plasma parameters) measured in 11 individual experimental studies at 6 different time points in 57 rats without and 63 rats with systemic inflammation following lipopolysaccharide infusion. For each rat, four group classifiers (treatment, survival, study, ID) were recorded so that valid samples could be obtained by subsequently filtering the statistical base. Any information about the hypotheses underlying the respective studies was suppressed. To assess whether a statistical relationship exists, a total of six different functional prototypes (linear and non-linear) were postulated and examined by regression. Regression quality, correlation and significance were obtained in the form of matrices. In our example, 510 300 regressions were ultimately optimized, automatically evaluated and filtered. The developed algorithm is able to reveal statistical relationships in a nearly crude database with little effort through systematic and unbiased analysis. The recovery of well-known correlations supports its reliability; the validity of its results could be increased by cleanly aggregating different studies. In addition, interesting new hints for future research could be gained. Thus, previously unknown markers can be found that are associated with an increased risk of death during systemic inflammation and sepsis.
Further development of the program is planned, including multiple regression (relating more than two parameters to each other) or cluster analysis.
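The pairwise screening described in the abstract (every parameter pair tested against several functional prototypes, then filtered by fit quality) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name its six prototypes, so the six below are hypothetical stand-ins; the fit uses ordinary least squares on linearised forms rather than whatever optimizer the authors used; significance testing and the group-classifier filtering are omitted; and the parameter names and values are synthetic, not data from the study.

```python
import math
from itertools import permutations

# Hypothetical prototypes (the abstract does not list its six). Each is
# linearised as g(y) = a + b * f(x) so ordinary least squares applies.
PROTOTYPES = {
    "linear":      (lambda x: x,            lambda y: y),
    "logarithmic": (lambda x: math.log(x),  lambda y: y),
    "exponential": (lambda x: x,            lambda y: math.log(y)),
    "power":       (lambda x: math.log(x),  lambda y: math.log(y)),
    "reciprocal":  (lambda x: 1.0 / x,      lambda y: y),
    "square_root": (lambda x: math.sqrt(x), lambda y: y),
}


def fit_line(u, v):
    """Least-squares fit v = a + b*u; returns (a, b, r_squared)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    if suu == 0 or svv == 0:
        raise ValueError("constant data cannot be regressed")
    b = suv / suu
    return mv - b * mu, b, suv * suv / (suu * svv)


def scan(table, min_r2=0.8):
    """Fit every prototype to every ordered parameter pair; keep good fits.

    `table` maps a parameter name to a list of measurements (same length
    and sample order for every parameter). `min_r2` is an arbitrary
    illustrative threshold on R^2 of the linearised fit.
    """
    hits = []
    for x_name, y_name in permutations(table, 2):
        for proto, (fx, gy) in PROTOTYPES.items():
            try:
                u = [fx(x) for x in table[x_name]]
                v = [gy(y) for y in table[y_name]]
                _, _, r2 = fit_line(u, v)
            except (ValueError, ZeroDivisionError):
                continue  # transform undefined or data constant: skip
            if r2 >= min_r2:
                hits.append((x_name, y_name, proto, round(r2, 3)))
    # Best fits first.
    return sorted(hits, key=lambda h: -h[3])


if __name__ == "__main__":
    # Synthetic values: param_b doubles with each unit step of param_a.
    demo = {
        "param_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "param_b": [2.0, 4.0, 8.0, 16.0, 32.0],
        "param_c": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
    for hit in scan(demo):
        print(hit)
```

Because every pair and every prototype is tried without regard to the original study hypotheses, the number of fits grows quickly with the number of parameters, which is why the filtering step matters. Note that R² is computed here on the transformed scale, so values are not strictly comparable across prototypes; a fuller implementation would address that and correct for multiple testing.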