Scientific literature is being published at a rate that we can’t keep up with.
Text mining extracts information from published sources. Text mining tools are becoming essential for keeping up with the amount of information.
The growth of this literature makes it increasingly difficult for researchers to stay up to date, and harder to come up with meaningful and testable hypotheses.
Text mining is the process of extracting information from text through techniques such as information retrieval, machine learning, natural language processing, statistics, and computational linguistics.
This reduces the time needed to read through the information in the literature.
Most research in named entity recognition (NER) focuses on recognising gene and protein mentions. Some work has also been done on identifying cell lines.
Dictionary-based methods work by matching text against a fixed dictionary of entity names. Their performance depends on the coverage of the dictionary and on the matching technique used.
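As a rough illustration of dictionary matching, here is a minimal Python sketch. The dictionary contents, the entity type labels, and the function name `dictionary_ner` are all assumptions made up for this example. It shows how both dictionary coverage and the matching rule (here: case-insensitive, whole-word, longest name first) decide what gets found.

```python
import re

# Hypothetical toy dictionary: entity name (lowercase) -> entity type.
DICTIONARY = {
    "brca1": "GENE",
    "p53": "GENE",
    "tumor necrosis factor": "PROTEIN",
    "hela": "CELL_LINE",
}

def dictionary_ner(text, dictionary=DICTIONARY):
    """Return (start, end, surface, type) for each dictionary hit in text."""
    matches = []
    taken = set()  # character positions already covered by a match
    # Try longer names first so multi-word names win over shorter ones.
    for name in sorted(dictionary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            span = range(m.start(), m.end())
            if taken.isdisjoint(span):
                taken.update(span)
                matches.append((m.start(), m.end(), m.group(0), dictionary[name]))
    return sorted(matches)

hits = dictionary_ner("BRCA1 and p53 were measured in HeLa cells.")
print([(surface, label) for _, _, surface, label in hits])
```

A name that is missing from the dictionary (for example "BRCA2" here) is simply not found, which is the coverage limitation described above.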
Rule-based methods use orthographic and morpho-syntactic features to generate patterns and rules. They are very precise but have low recall.
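A small sketch of what one orthographic rule might look like, written in Python. The rule itself (two or more capital letters followed by digits, with an optional hyphen) and the label `GENE_LIKE` are assumptions chosen for illustration, not rules taken from any particular system.

```python
import re

# Hypothetical rule set: (label, compiled pattern).
# The single rule below fires on tokens such as "BRCA1" or "IL-2".
RULES = [
    ("GENE_LIKE", re.compile(r"\b[A-Z]{2,}-?\d+\b")),
]

def rule_based_ner(text):
    """Return (surface, label) pairs for every rule match, in text order."""
    hits = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(0), label))
    return [(surface, label) for _, surface, label in sorted(hits)]

print(rule_based_ner("Mutations in BRCA1 and IL-2 affect insulin signalling."))
```

The rule picks up "BRCA1" and "IL-2" but misses "insulin", which has no distinctive spelling pattern. This is the precision versus recall trade-off in miniature.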
Machine learning methods perform best, achieving the highest performance levels of the three approaches.
NER is viewed as either a classification or a sequence labelling problem. Classification approaches normally treat NER as assigning a class to each word. They support features such as surface clues and morpho-syntactic features, but only support binary classification. Sequence labelling approaches deduce the most probable sequence of tags.
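To make "the most probable sequence of tags" concrete, here is a minimal sketch of Viterbi decoding over a toy first-order hidden Markov model. Viterbi is one common way to do this and is used here as an assumption, not as the method of any specific NER system. The two tags, all probabilities, and the digit-based emission function `emit` are invented for the example.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for tokens (log-space Viterbi)."""
    if not tokens:
        return []
    # score[s] = best log-probability of any tag path ending in state s.
    score = {s: math.log(start_p[s]) + math.log(emit_p(s, tokens[0])) for s in states}
    backpointers = []
    for token in tokens[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best previous state to transition from into s.
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p(s, token))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy model (all numbers are made up for illustration).
STATES = ["O", "GENE"]
START = {"O": 0.8, "GENE": 0.2}
TRANS = {"O": {"O": 0.8, "GENE": 0.2}, "GENE": {"O": 0.7, "GENE": 0.3}}

def emit(state, token):
    """Hypothetical emission model: tokens containing a digit look gene-like."""
    has_digit = any(ch.isdigit() for ch in token)
    if state == "GENE":
        return 0.9 if has_digit else 0.1
    return 0.1 if has_digit else 0.9

print(viterbi(["the", "BRCA1", "gene"], STATES, START, TRANS, emit))
```

Unlike classifying each word on its own, the decoder scores whole tag sequences, so the tag chosen for one token depends on its neighbours.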
Labelling has become complicated because a dictionary-based system is limited.
The standard method of normalisation is to compare an NE against a dictionary of synonyms and identifiers.
Rule-based approaches have been used to normalise terms by applying a set of transformations until the term matches an entry in the lexicon.
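The two normalisation ideas above (dictionary lookup and rule-based transformations) can be sketched together in a few lines of Python. The lexicon entries, the identifiers such as `GENE:0001`, and the particular transformations are all hypothetical and only meant to show the mechanism.

```python
import re

# Hypothetical lexicon: synonym -> identifier (identifiers are made up).
LEXICON = {
    "il2": "GENE:0001",
    "interleukin2": "GENE:0001",
    "tp53": "GENE:0002",
    "p53": "GENE:0002",
}

# Transformations are applied cumulatively, in order, until the lexicon matches.
TRANSFORMATIONS = [
    lambda term: term,                            # try the term as written
    str.lower,                                    # ignore case
    lambda term: re.sub(r"[\s\-_/]+", "", term),  # drop spaces, hyphens, slashes
]

def normalise(term):
    """Return the identifier for term, or None if no transformation matches."""
    candidate = term
    for step in TRANSFORMATIONS:
        candidate = step(candidate)
        if candidate in LEXICON:
            return LEXICON[candidate]
    return None

for mention in ["IL-2", "Interleukin 2", "p53", "EGFR"]:
    print(mention, "->", normalise(mention))
```

Here "IL-2" and "Interleukin 2" resolve to the same identifier, while "EGFR" stays unresolved because no synonym for it is in the lexicon.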
Swanson's ABC model: new hypotheses can emerge, and scientific discoveries can be anticipated or stimulated, by investigating complementary but disjoint literatures.
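The ABC idea can be sketched as follows: if one literature links A to B and a separate literature links B to C, and A and C are never linked directly, then A and C form a candidate hypothesis. The Python below is a minimal sketch under that reading. The function name `abc_hypotheses` is an assumption, and the input pairs are toy data loosely modelled on Swanson's well-known fish oil and Raynaud's disease example, not extracted results.

```python
from collections import defaultdict

def abc_hypotheses(literature_ab, literature_bc):
    """Given (A, B) links and (B, C) links from two disjoint literatures,
    return {(A, C): [linking B terms]} for A-C pairs with no direct link."""
    b_to_c = defaultdict(set)
    for b, c in literature_bc:
        b_to_c[b].add(c)
    known = set(literature_ab) | set(literature_bc)
    hypotheses = defaultdict(set)
    for a, b in literature_ab:
        for c in b_to_c.get(b, ()):
            # Keep only pairs that are not already directly connected.
            if a != c and (a, c) not in known and (c, a) not in known:
                hypotheses[(a, c)].add(b)
    return {pair: sorted(links) for pair, links in hypotheses.items()}

# Toy input pairs (illustrative only).
ab = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
bc = [("blood viscosity", "Raynaud's disease"), ("platelet aggregation", "Raynaud's disease")]
print(abc_hypotheses(ab, bc))
```

Each returned pair is only a candidate: the intermediate B terms show why the link is plausible, and the hypothesis still has to be tested.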