Ontologies are successfully used as semantic guides for navigating the huge and ever-increasing quantity of digital documents. They constitute the backbone of key functionalities such as indexing, retrieving, filtering and analyzing relevant information in a given context.
OntoToolkit provides applications dedicated to ontology processing that implement algorithmic solutions we have published.
MUD – Multiple Uncertainty Detection
MUD detects uncertainty in natural language. It relies on a new supervised and generic approach based on the statistical analysis of multiple lexical and syntactic features. These features characterize sentences through vector-based representations that can be analyzed by proven classification methods (such as SVM).
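As a rough illustration of the vector-based representation, the sketch below maps a sentence to a small feature vector. The cue lists and the four features are hypothetical and chosen for illustration only; they are not taken from MUD, and syntactic features (which would require a parser) are left out. Vectors of this kind could then be passed to a classifier such as an SVM.

```python
import re

# Hypothetical cue lists, for illustration only (not MUD's actual features).
MODAL_CUES = {"may", "might", "could", "would", "should"}
HEDGE_CUES = {"possibly", "probably", "perhaps", "likely",
              "suggest", "suggests", "appear", "appears", "seems"}


def sentence_features(sentence: str) -> list[float]:
    """Map a sentence to a small vector of lexical and surface features."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    n = max(len(tokens), 1)  # avoid dividing by zero on empty input
    modal = sum(t in MODAL_CUES for t in tokens)
    hedge = sum(t in HEDGE_CUES for t in tokens)
    return [
        modal / n,                                       # share of modal verbs
        hedge / n,                                       # share of hedge words
        1.0 if "if" in tokens else 0.0,                  # conditional marker
        1.0 if sentence.strip().endswith("?") else 0.0,  # question form
    ]
```

For example, `sentence_features("This may possibly work")` yields a vector whose first two components are non-zero, while a plain factual sentence yields all zeros.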
You may find additional content in the following publications:
- “Uncertainty detection in natural language: a probabilistic model”. Pierre-Antoine Jean, Sébastien Harispe, Sylvie Ranwez, Patrice Bellot, Jacky Montmain. In Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics (WIMS’16), Rajendra Akerkar, Michel Plantié, Sylvie Ranwez, Sébastien Harispe, Anne Laurent, Patrice Bellot, Jacky Montmain, and François Trousset (Eds.). ACM International Conference Proceeding Series, New York, NY, USA, Article 10, 10 pages. DOI: http://dx.doi.org/10.1145/2912845.2912873, ISBN: 978-1-4503-4056-4, Nîmes, France, June 13-15 2016.
- (in French only) “Un modèle probabiliste pour la détection de l’incertitude dans le langage naturel”. Pierre-Antoine Jean, Sébastien Harispe, Sylvie Ranwez, Patrice Bellot, Jacky Montmain. Actes de CORIA 2016
Source code available on GitHub.
USI – User-oriented Semantic Indexer
User-oriented Semantic Indexer is the name of an efficient algorithm for annotating documents of any type.
The main motivation behind this work is to provide a kNN-based approach for annotating entities, be they textual documents, songs or movies. While other methods often combine machine learning and feature analysis of a given document (e.g. textual features), USI’s approach is completely independent of the document content. The only requirement for an accurate annotation is an accurate, already annotated neighborhood. The search for a good neighborhood is an independent task, related to information retrieval, for which an extensive list of tools already exists.
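To make the neighborhood idea concrete, here is a minimal sketch of content-independent annotation: the new entity receives the concepts shared by enough of its already annotated neighbors. This is a plain voting baseline, not USI’s own heuristic; the function name and the `min_support` parameter are assumptions introduced for the example.

```python
from collections import Counter


def annotate_from_neighbors(neighbor_annotations, min_support=0.5):
    """Return the concepts carried by at least `min_support` of the neighbors.

    `neighbor_annotations` is a list of concept sets, one per neighbor.
    The content of the entity being annotated is never inspected.
    """
    if not neighbor_annotations:
        return set()
    # Count in how many neighbors each concept appears.
    counts = Counter(c for ann in neighbor_annotations for c in set(ann))
    threshold = min_support * len(neighbor_annotations)
    return {c for c, k in counts.items() if k >= threshold}


neighbors = [
    {"Neoplasms", "Lung"},
    {"Neoplasms", "Liver"},
    {"Neoplasms", "Lung"},
]
print(annotate_from_neighbors(neighbors))
```

With the three neighbors above and the default threshold, the concepts present in at least two neighbors are kept and the one that appears only once is dropped.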
Thanks to the rise of thesauri, ontologies and knowledge representations in general, more and more data can be annotated by concepts. Semantic indexing started in the biomedical field, but much more content can now benefit from conceptual indexing thanks to DBpedia or Freebase. USI aims to do so, whatever the content and whatever the thesaurus.
USI is presented as a heuristic algorithm optimizing an objective function. We propose an algorithmic optimization of this heuristic to make it fast enough, implemented in the USI Java library. This library is also hosted on GitHub and can be freely downloaded and used in your project.
- Dedicated website
- Source code on GitHub
- Demo on biomedical papers: try out USI by annotating biomedical papers (mostly about tumors)
- Demo on movies: try out USI by annotating movies thanks to Freebase annotations
- Evaluation datasets: contains the L1000 dataset (and 2 others) to evaluate your results – and those of USI
- Results: the results of USI for the L1000 dataset
- USI-MeSH: use this build to reproduce our results
- MeSH-validation: use this build to validate MeSH-based results – such as the ones provided above
- JAR build: include USI in your Java project as a library
SML – Semantic Measures Library
The Semantic Measures Library and Toolkit are robust, open source and easy-to-use software solutions dedicated to semantic measures. They can be used for large-scale computation and analysis of semantic similarities, proximities or distances between terms or concepts defined in knowledge representations, e.g., structured vocabularies, taxonomies, RDF graphs. The comparison of instances (e.g., documents, patient records, genes) annotated by concepts is also supported. An important aspect of these solutions is that they are generic and therefore not tailored to a specific application context. They can thus be used with various controlled vocabularies and knowledge representation languages (e.g. OBO, RDF, OWL). The project targets both designers and practitioners of semantic measures, providing a Java source code library as well as a command-line toolkit which can be used on personal computers or computer clusters.
The library implements a large collection of state-of-the-art measures, and several parametric measures provide fine-grained tuning capabilities for specific usage contexts. The Semantic Measures Library and Toolkit aim to equip communities studying and using semantic measures with robust, reliable, efficient, open source, generic and easy-to-use tools. Downloads, documentation, updates and community support are available at http://www.semantic-measures-library.org
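To give an idea of what a semantic measure between two concepts looks like, the sketch below computes the classic Wu and Palmer similarity on a tiny tree-shaped taxonomy. It is written from scratch and does not use the library’s API; the `PARENT` taxonomy and the function names are hypothetical, and the root is assumed to have depth 1.

```python
# Hypothetical toy taxonomy, stored as child -> parent (None marks the root).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}


def ancestors(concept):
    """Return the path from `concept` up to the root, concept included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth of a concept, with the root at depth 1."""
    return len(ancestors(concept))


def wu_palmer(a, b):
    """Wu and Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    on_path_b = set(ancestors(b))
    # Walking up from `a`, the first concept also above `b` is the
    # deepest common ancestor in a tree-shaped taxonomy.
    lcs = next(c for c in ancestors(a) if c in on_path_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

On this toy taxonomy, `wu_palmer("dog", "cat")` is higher than `wu_palmer("dog", "bird")` because the first pair shares a deeper common ancestor. Real ontologies are usually directed acyclic graphs with multiple inheritance, which this sketch does not handle.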
The Semantic Measures Library and Toolkit: fast computation of semantic similarity and relatedness using biomedical ontologies.
Sebastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain. Oxford Bioinformatics 2013.
COMMET – COMMunity Evaluation and Transformation
COMMunity Evaluation and Transformation is the name of a community detection algorithm based on the Louvain method for finding partitioned and overlapping communities in bipartite directed and unipartite graphs.
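The Louvain method works by greedily improving the modularity of a partition. As a minimal sketch, the function below computes the modularity of a given partition for a simple undirected, unweighted graph. It only illustrates the quantity being optimized: it is not COMMET itself and does not cover the bipartite, directed or overlapping cases. The function and parameter names are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs and `community` maps node -> label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both ends in the community
    degree = defaultdict(int)  # total degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))
```

Placing every node in a single community gives a modularity of zero, whereas splitting the graph into its two triangles gives a positive value, which is why a modularity-based method would prefer the split.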
Detecting communities from observations can help identify trends, make recommendations, offer new services, or facilitate communication. The issues are not limited to the social sphere: they extend to the political sphere, for instance by tracking how the partitioning of voters evolves as voting intentions are collected over time. Some other fields of application could be:
- marketing,
- restructuring of services,
- reorganization of entities,
- commercial,
- personalized dissemination,
- common interests center,
- brain connection path,
- etc.
OBIRS – Ontology Based Information Retrieval System
To take advantage of knowledge models (ontologies), information retrieval systems may use the relationships between concepts to extend or reformulate queries. Our ontology based information retrieval system (OBIRS) relies on a domain ontology and on resources that are indexed using its concepts (e.g. genes annotated by concepts of the Gene Ontology or PubMed articles annotated using the MeSH, Medical Subject Headings). To fully benefit from this system, queries have to be expressed using concepts of the same ontology. OBIRS’ interface thus provides query formulation assistance through auto-completion and ontology browsing. It then estimates the overall relevance of each resource w.r.t. a given query. The retrieved resources are ordered according to their overall scores, so that the most relevant resources (indexed with the exact query concepts) are ranked higher than the least relevant ones (indexed with hypernyms or hyponyms of query concepts). We provide visual results through pictograms displayed on an interactive semantic map.
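The ranking principle described above can be sketched as follows. This is a simplified illustration and not OBIRS’s actual relevance model: the toy `PARENT` hierarchy, the 0.5 weight for a direct hypernym or hyponym, and the averaging over query concepts are all assumptions made for the example.

```python
# Hypothetical fragment of a concept hierarchy: child -> parent.
PARENT = {"lung neoplasms": "neoplasms", "neoplasms": "diseases"}


def concept_match(query_concept, annotation):
    """Score one query concept against one annotation concept."""
    if query_concept == annotation:
        return 1.0  # exact concept
    if (PARENT.get(query_concept) == annotation
            or PARENT.get(annotation) == query_concept):
        return 0.5  # direct hypernym or hyponym (assumed weight)
    return 0.0


def relevance(query, annotations):
    """Average, over query concepts, of the best match among annotations.

    `query` must contain at least one concept.
    """
    best = [max((concept_match(q, a) for a in annotations), default=0.0)
            for q in query]
    return sum(best) / len(query)


def rank(query, resources):
    """Order resource names by decreasing relevance to the query."""
    return sorted(resources,
                  key=lambda name: relevance(query, resources[name]),
                  reverse=True)


resources = {
    "doc_exact": {"neoplasms"},
    "doc_hyponym": {"lung neoplasms"},
    "doc_other": {"asthma"},
}
print(rank(["neoplasms"], resources))
```

In this example the resource indexed with the exact query concept comes first, the one indexed with a hyponym second, and the unrelated one last.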
Two versions have been developed:
- the first one is dedicated to genes that have been indexed with concepts from the Gene Ontology – http://www.ontotoolkit.mines-ales.fr/ObirsClient/
- the second one is dedicated to biomedical publications dealing with cancer – http://obirs.itcancer.mines-ales.fr/
User Centered and Ontology Based Information Retrieval System for Life Sciences
Mohameth-François Sy, Sylvie Ranwez, Jacky Montmain, Armelle Regnault, Michel Crampes, Vincent Ranwez.
In BMC Bioinformatics, 13(Suppl 1):S4, 2012
We also use it in a hybrid approach that benefits from both lexical and ontological document descriptions and combines them in a software architecture dedicated to information retrieval and rendering in specific domains. Relevant documents are first identified via their conceptual indexing based on a domain ontology, and then segmented to highlight text fragments that deal with users’ information needs.
How ontology based information retrieval systems may benefit from lexical text analysis. Sylvie Ranwez, Benjamin Duthil, Mohameth François Sy, Jacky Montmain, Patrick Augereau, Vincent Ranwez. In "New Trends of Research in Ontologies and Lexical Resources", chapter 11, pp. 209-230, Series: Theory and Applications of Natural Language Processing, Springer, February 2013.
Music Band Recommendation System
Many applications take advantage of both ontologies and the Linked Data paradigm to characterize various kinds of resources. To fully exploit this knowledge, measures are used to estimate the relatedness of resources with respect to their semantic characterization. Such semantic measures, particularly useful for information retrieval in RDF knowledge bases, mainly focus on specific aspects of the semantic characterization (e.g. types) or only partially exploit the semantics expressed in the knowledge base. We proposed a framework for defining semantic measures that compare instances defined within an RDF knowledge base, and we demonstrated that it is particularly well suited to recommendation systems. An application dedicated to music band recommendation has been developed: http://www.lgi2p.ema.fr:8090/kid/tools/bandrec
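One possible reading of comparing instances through several aspects of their description is sketched below: each instance is reduced to sets of values per property, each property is compared with a Jaccard index, and the results are combined with weights. This is an illustrative simplification and not the framework from the publication; the property names, the weights and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def instance_similarity(x, y, weights):
    """Weighted average of per-property Jaccard similarities.

    `x` and `y` map a property name to a set of values; `weights` maps
    each property to compare to its (positive) importance.
    """
    total = sum(weights.values())
    return sum(
        w * jaccard(x.get(prop, set()), y.get(prop, set()))
        for prop, w in weights.items()
    ) / total


# Hypothetical band descriptions.
band_a = {"genre": {"rock", "blues"}, "origin": {"UK"}}
band_b = {"genre": {"rock"}, "origin": {"UK"}}
print(instance_similarity(band_a, band_b, {"genre": 1.0, "origin": 1.0}))
```

A recommender built on such a measure would rank candidate bands by their similarity to the bands a user already likes; changing the weights changes which aspects drive the recommendation.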
Semantic Measures Based on RDF Projections: Application to Content-Based Recommendation Systems
Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi and Jacky Montmain.
Proceedings of On the Move to Meaningful Internet Systems: OTM 2013 Conferences, ODBase 2013,
Lecture Notes in Computer Science, vol. 8185, Robert Meersman, Hervé Panetto, Tharam Dillon, Johann Eder,
Zohra Bellahsene, Norbert Ritter, Pieter Leenheer, and Deijing Dou (eds.), ISBN: 978-3-642-41029-1,
Springer Berlin Heidelberg, pp. 606-615, Graz, Austria, September 10-12 2013.
Kalitmo
As part of several projects that help scientific communities manage their members, we developed collaborative platforms:
- for the French ToxNuc program dedicated to the study of nuclear impact on the environment, plants and human beings (http://www.toxcea.org)
- for the biomedical domain: within the AvieSan program, some ITMOs (multi-organism thematic institutes) also benefit from our experience in collective platforms (Cancer – https://itcancer.aviesan.fr/, Immunology Hematology and Pneumology – https://ihp.aviesan.fr/, ITS/Technology for Health – https://its.aviesan.fr/)
For the Cancer community we also developed Kalitmo, a set of additional tools that provide some views of the community: people involved, geolocation of scientific teams, and statistics regarding their activity (publications, patents, etc.). http://itcancer.mines-ales.fr