Research findings are mainly reported in the form of prose and semi-structured data. This allows rich and subtle details to be communicated to human readers but limits comprehension by computer programs (machines). This has led to a poverty of riches from an informational perspective. The growing number of publications and data sets makes it challenging to stay abreast of new knowledge in one’s own field, let alone in related ones. Using keywords and controlled terminologies to index articles and data is somewhat helpful in locating relevant information, but is undermined by numerous false positives. One is still left with the problem of sifting through a large volume of information, and then spending time reading and assimilating the reported results. Text-mining applications that automatically extract information have been suggested as the solution to this problem. However, these perform at a level well below human precision. What if there were a way to summarize the salient findings of a significant portion of nascent research discoveries in a form comprehensible to both scientists and computer programs? This would have two important consequences. First, it would facilitate the retrieval of information of interest with unprecedented specificity. Second, it would result in a trove of accruing knowledge from cutting-edge research, independent of the corpus of published literature. This trove could itself be the subject of analysis and study, helping to answer important questions in a semi-automatic fashion and suggesting possible directions for future research. MachineProse is an ontological framework of scientific assertions that makes the above possible. It occupies the pragmatic middle ground between scalable but semantically deficient approaches (e.g., keyword tagging) and semantically rich ontology languages for the web that do not scale well.
The proposed approach hinges on representing the main conclusions of research in the context of ontologies, while eschewing peripheral details. The resulting representation functions as a semantic index: on the one hand, it supports highly specific queries for locating relevant data, and on the other, it serves in itself as a repository of knowledge.
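
To make the idea of a semantic index concrete, the following is a minimal sketch in Python. It is not MachineProse's actual data model, which is not specified here. It assumes that each assertion is a subject-relation-object statement whose terms come from a small is-a ontology, and that a query for a general term should also match more specific terms. All names in it (IS_A, Assertion, ancestors, query, gene_A, disease_B, doc-1, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each term maps to its parent term (is-a link).
IS_A = {
    "gene_A": "kinase gene",
    "kinase gene": "gene",
    "gene_C": "gene",
    "disease_B": "cancer",
    "cancer": "disease",
    "disease_D": "infectious disease",
    "infectious disease": "disease",
}


def ancestors(term):
    """Return the term itself together with all of its ancestors in IS_A."""
    found = {term}
    while term in IS_A:
        term = IS_A[term]
        found.add(term)
    return found


@dataclass(frozen=True)
class Assertion:
    """One main conclusion, stripped of peripheral detail (assumed shape)."""
    subject: str
    relation: str
    obj: str
    source: str  # made-up pointer to the article or data set


# Made-up assertions, used only to exercise the query function.
ASSERTIONS = [
    Assertion("gene_A", "associated_with", "disease_B", "doc-1"),
    Assertion("gene_C", "associated_with", "disease_D", "doc-2"),
]


def query(assertions, subject=None, relation=None, obj=None):
    """Return assertions matching the given terms, allowing is-a subsumption.

    A criterion left as None is not applied.
    """
    return [
        a for a in assertions
        if (subject is None or subject in ancestors(a.subject))
        and (relation is None or relation == a.relation)
        and (obj is None or obj in ancestors(a.obj))
    ]


if __name__ == "__main__":
    # "Which genes are associated with some kind of cancer?"
    for hit in query(ASSERTIONS, subject="gene",
                     relation="associated_with", obj="cancer"):
        print(hit.source, hit.subject, hit.relation, hit.obj)
```

In this sketch, a keyword search for "disease" would match both sources, whereas the structured query for a gene associated with a cancer matches only the first. The same list of assertions also serves as a small store of knowledge that can be analyzed without returning to the source documents.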
Dr. Deendayal Dinakarpandian is an Assistant Professor in the Department of Computer Science and Electrical Engineering at the School of Computing and Engineering at UMKC. His research focuses on developing and applying methods to solve problems of biological and medical relevance. In particular, he is interested in domain-intensive machine learning methods and pragmatic models for knowledge representation.