funded PhD position on “Deep learning for texts and knowledge bases access.” The PhD will be co-supervised
by François-Paul Servant (Renault), Prof. Lynda Tamine and Dr. Jose Moreno.
The thesis targets two main objectives:
1) the semantic representation of documents that mention entities from different external resources;
2) the categorization of documents by the family of entities they mention and the retrieval of documents that meet entity-related needs.
To achieve these objectives, we plan to use a deep learning approach to solve both the
representation and the access problems (categorization, information retrieval), constrained by the content and structure
of multiple external resources (terminologies, thesauri, knowledge graphs, etc.).
From the point of view of representation, we follow recent work on the joint regularization
of neural embeddings augmented by resources [Faruqui2014; Yu2014; Wang2014; Yamada2016]. This work rests on the hypothesis
that learned representations are interpretable if they are aligned with entities derived from resources:
the representations of two entities should be closer in the latent space when the entities are
semantically related in the external resources. Extended to sentences and texts, these representations
can be exploited in information retrieval [Nguyen2018], in the identification of entity
mentions [Moreno2017], or in the categorization of short textual documents [Kim2014]. Although distributional
representations exist for words/texts, structured resources and their combinations, no work
addresses the constrained regularization of multiple resources, nor the multi-level structuring
of entities in these resources. One of the first works in this direction uses Poincaré geometry to represent
the hierarchies in resources [Nickel2017], but completely ignores the representation of relationships between entities.
However, relationships are omnipresent in today’s widely used knowledge bases, including those considered at Renault.
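As an illustration of the hypothesis above (related entities should end up closer in the latent space), here is a minimal sketch in the spirit of the retrofitting approach of [Faruqui2014]. It is not the method that the thesis will develop. The function name `retrofit`, the weights `alpha` and `beta`, and the toy entities and vectors are assumptions made for this example only.

```python
import numpy as np

def retrofit(embeddings, neighbours, n_iters=10, alpha=1.0):
    """Pull each vector toward its neighbours in an external resource.

    embeddings: dict mapping an entity name to its initial vector.
    neighbours: dict mapping an entity name to related entity names
                (hypothetical stand-in for a thesaurus or knowledge graph).
    """
    original = {k: np.asarray(v, dtype=float) for k, v in embeddings.items()}
    fitted = {k: v.copy() for k, v in original.items()}
    for _ in range(n_iters):
        for entity, related in neighbours.items():
            related = [r for r in related if r in fitted]
            if entity not in fitted or not related:
                continue  # no resource information: keep the vector as is
            beta = 1.0 / len(related)  # uniform weight per neighbour
            neighbour_sum = sum(beta * fitted[r] for r in related)
            # Weighted average of the original vector and its neighbours
            fitted[entity] = (neighbour_sum + alpha * original[entity]) / (
                beta * len(related) + alpha
            )
    return fitted

# Toy data (hypothetical): "engine" and "motor" are related in the resource,
# "wheel" has no listed relation and is therefore left unchanged.
toy_vectors = {"engine": [1.0, 0.0], "motor": [0.0, 1.0], "wheel": [-1.0, 0.0]}
toy_resource = {"engine": ["motor"], "motor": ["engine"]}
fitted = retrofit(toy_vectors, toy_resource)
```

In this sketch only one resource and one kind of link are used. Handling several resources at once, typed relationships and multi-level entity structures is precisely what the paragraph above identifies as open.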
The thesis project faces new scientific challenges related to the definition of adequate neural architectures
and associated cost functions, capable of learning semantic compositionality both in
the local context (text) and in global contexts (resources).
The envisioned starting date is September 2018 (starting in early 2019 is also possible).
We are looking for one candidate with a strong focus on information retrieval/NLP and machine learning
with the following profile:
+ Good Master’s degree in Computer Science, Statistics, Mathematics or related disciplines (essential)
+ Good programming skills in Python/Keras|TensorFlow|Torch (essential)
+ Advanced knowledge of algorithms and data structures (optional)
+ Ability to work independently and be self-motivated (essential)
+ Excellent communication skills in English (essential – minimum score of 750 in TOEIC)
The application should consist of the following:
+ a curriculum vitae
+ transcripts of marks for M1-M2 or for the last 3 years of engineering school (with an indication of the ranking if possible)
+ a cover letter
+ letter(s) of recommendation, including at least one letter written by an academic referee
Potential candidates will be invited for an interview with the supervisors.
The application file should be sent
Conditions of employment
You will be hired on a fixed-term contract (3-year CIFRE contract) at Renault, a world leader in car manufacturing.
Working in Toulouse (IRIT/Renault)
You will join two teams with academic and industrial profiles: the IRIS team of IRIT, recognized for its research
activities in the field of information retrieval and information synthesis with a focus on the use of deep learning technologies, and the Renault team.
The IRIT lab is one of the major forces in French computer science research, with a workforce of more than
700 members including 272 researchers and teachers, 204 PhD students, 50 postdocs and contract researchers, and 32
engineers and administrative employees.
Toulouse is located on the banks of the Garonne River, 150 kilometres from the Mediterranean Sea, 230 km from the Atlantic Ocean
and 680 km from Paris. It is the fourth-largest metro area in France, with 1,312,304 inhabitants as of January 2014.
Toulouse is the centre of the European aerospace industry, with the headquarters of Airbus, the Galileo positioning system, the SPOT
satellite system, ATR and the Aerospace Valley. It also hosts the European headquarters of Intel and CNES’s Toulouse
Space Centre (CST), the largest space centre in Europe. Thales Alenia Space and Astrium Satellites also have a significant presence in Toulouse.
The University of Toulouse is one of the oldest in Europe (founded in 1229) and, with more than 103,000 students, it is the
fourth-largest university campus in France, after the universities of Paris, Lyon and Lille.
The city was the capital of the Visigothic Kingdom in the 5th century and the capital of the province of Languedoc in the
Late Middle Ages and early modern period, making it the unofficial capital of the cultural region of Occitania (Southern France).
[Faruqui2014] Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., Smith, N. A. Retrofitting Word Vectors to Semantic Lexicons. NAACL, 2014.
[Moreno2017] Moreno, J. G., Besançon, R., Beaumont, R., D’hondt, E., Ligozat, A. L., Rosset, S., Grau, B. Combining Word and Entity Embeddings for Entity Linking. Extended Semantic Web Conference (ESWC), pp. 337-352, 2017.
[Nickel2017] Nickel, M., Kiela, D. Poincaré Embeddings for Learning Hierarchical Representations. Advances in Neural Information Processing Systems, pp. 6341-6350, 2017.
[Nguyen2018] Nguyen, G., Tamine, L., Soulier, L., Souf, N. A Tri-Partite Neural Document Language Model for Semantic Information Retrieval. Extended Semantic Web Conference (ESWC), 2018.
[Yu2014] Yu, M., Dredze, M. Improving Lexical Embeddings with Semantic Knowledge. ACL, pp. 545-550, 2014.
[Wang2014] Wang, Z., Zhang, J., Feng, J., Chen, Z. Knowledge Graph and Text Jointly Embedding. EMNLP, pp. 1591-1601, 2014.
[Yamada2016] Yamada, I., Shindo, H., Takeda, H., Takefuji, Y. Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation. CoNLL, pp. 250-259, 2016.