Director of research at Thunken, consultant. Ph.D. in natural language processing. Text analytics, web crawling, data mining, data science, machine learning, citation analysis, etc. English/Français
Knowledge is grounded in data, and I work to bridge gaps between scientific research and business intelligence. I can help you aggregate citation and altmetrics data (a.k.a. bibliomining), turn the data into metrics, and then turn these numbers into actionable insights for your organization.
I can help you index tons of documents with Elasticsearch and tune your mapping for search speed.
I lead a team that builds and maintains search engines and text mining tools for various clients as well as for our own projects. Over the years, I have perfected the art of creating Elasticsearch mappings and configurations to index huge collections and gigaword/multilingual corpora with minimal server infrastructure. I also have hands-on experience with advanced features such as percolate queries and field collapsing.
Do you want to hire someone in France without opening a subsidiary or a branch? Then you'll have to register with URSSAF/CNFE and report payroll taxes with the TFE. Oh, and don't forget your employee's DPAE, their complémentaire, their prévoyance, and their mutuelle!
Using the TFE is a breeze, but registering your foreign company as an employer with URSSAF is a minefield of administrative paradoxes. Communicating with URSSAF as a foreign company can also be complex, especially if you're not used to our beloved French administration.
I was born and raised in France, and I've been using the TFE since 2017. I've had every problem you can imagine with their system, and I'd be happy to help other entrepreneurs understand what's expected when you hire someone in France without incorporating there.
I can help you normalize, summarize, and evaluate your textual data, whether you need to find patterns in unstructured data, deal with multilingualism, or resolve ambiguity. I can also help you with technical and strategic planning.
I have a Ph.D. in NLP and 10+ years of hands-on experience with text processing applications. I have worked in the R&D departments of both startups and multinational corporations like Nuance.
Timeouts and robots.txt rules are not the biggest challenges. How do you transform run-of-the-mill web pages into a clean, machine-readable corpus? How do you deal with frequent structural changes without rewriting your code every other week? By the way, did you check the licensing terms of the content that you crawl?
I have written more crawlers and bots than I can count, and I have years of hands-on experience wrangling web pages into normalized documents that are ready to be analyzed by humans and machines alike.
Thank you Luc for sharing your experience and the honest guidance and clear advice. I look forward to further discussions. Thanks again!!
Very knowledgeable and helpful.
Luc helped me assess the feasibility of and big picture strategy for an NLP fintech project I've been thinking about for several years now. It was a tremendous help, and I'll be reaching out again.
Great answers - got us to where we needed to be quickly.
Great call with Luc, he explained the challenges and possible solutions very clearly and efficiently.
Luc understands his space well and was able to answer all the questions I had.
Thanks Luc! This was a great session. Very informative and I look forward to reaching out down the line.