Welcome to my Portfolio!
Explore a selection of 20 projects showcasing my skills in Data Science.
Disclaimer: For confidentiality and privacy reasons, all client data has been anonymized and any identifying details have been altered.
Data Visualization & Analysis
Effective visualizations are crucial to convey clear insights and support data-driven decisions.
They should be as attractive and easy to read as possible.
What I offer
-
Rigorous data manipulation and analysis, sharpened through hundreds of EDA (Exploratory Data Analysis) initiatives;
-
Expertise in extracting meaningful information, identifying key patterns, managing outliers, working with large datasets;
-
Memorable visualizations.
Technologies: matplotlib · seaborn · plotly · flowchart · networkx · Excel · VBA
Data Visualization Examples
All original projects are detailed later in my portfolio.
City Expansions - 1984 to 2022
Image Sliders - Slide to explore!
Las Vegas
Dubai
Scoring
-
Scoring is used to determine how likely a customer is to take a given action: buy a product, cancel a contract, etc.
-
Scores are created using machine learning models which learn from past data.
-
Instead of arbitrary outreach, targeting high-scoring customers reliably improves results.
What I offer
-
Strong expertise in problem modeling from scratch (data gathering; feature engineering; creation of appropriate cross-validation strategies; advanced design and fine-tuning of ML models; creation of custom metrics tailored to specific projects);
-
Proven experience in building and deploying scoring models to solve business challenges;
-
Expertise in optimizing marketing strategies through data-driven approaches.
Technologies: Scikit-Learn · LightGBM · XGBoost · CatBoost · Big Data · Vertica · Pandas · Polars
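To give a concrete flavor of the approach, here is a minimal scoring sketch with scikit-learn, where synthetic data and an untuned model stand in for real client data and production pipelines:

```python
# Illustrative only: synthetic "churn" data stands in for real customer records.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
model = GradientBoostingClassifier(random_state=42)

# Cross-validated AUC: how well the scores rank positives above negatives.
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# Fit on all data, then score and rank customers for targeted outreach.
model.fit(X, y)
scores = model.predict_proba(X)[:, 1]
top_customers = scores.argsort()[::-1][:100]  # the 100 highest-scoring customers
```

In a real project, the features, validation scheme and metric would all be tailored to the business question.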
Project Examples
Click for more details!
Time Series Forecasting
Time series forecasting is used to predict future values based on previously observed data over time.
It enables businesses to anticipate changes in demand, sales or other key metrics.
What I offer
-
Extensive experience in developing and deploying time series forecasting models tailored to specific business needs;
-
Strong skills in data analysis and processing, feature selection, identification of underlying patterns, and design of cross-validation strategies;
-
Proficiency in applying advanced algorithms and ensemble methods to improve reliability of forecasts;
-
Expertise in integrating forecasts into business processes to drive informed decision-making and resource allocation.
Technologies: Scikit-Learn · LightGBM · XGBoost · CatBoost · Pandas · Polars
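As a small taste, here is a standard-library sketch of lag features and a seasonal-naive baseline, the usual starting point before gradient-boosting models take over (data invented):

```python
# Illustrative sketch: lag features and a seasonal-naive baseline on toy data.

def make_lag_features(series, lags):
    """Turn a series into (features, target) rows using past values."""
    rows, targets = [], []
    for t in range(max(lags), len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets

def seasonal_naive(series, period):
    """Forecast the next value as the value one season ago."""
    return series[-period]

# Weekly-seasonal toy demand series (period = 7).
demand = [10, 12, 15, 14, 20, 30, 25] * 4
X, y = make_lag_features(demand, lags=[1, 7])
print(seasonal_naive(demand, period=7))  # next Monday ~ last Monday
```

The lag rows would then feed a model such as LightGBM; the seasonal-naive value serves as the baseline any model must beat.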
Project Examples
Click for more details!
Event Forecasting
Event forecasting predicts the occurrence of events such as product launches, equipment failures or promotional impacts.
It enables companies to anticipate important milestones and proactively adjust their strategies.
What I offer
-
Proven ability to build models that capture temporal patterns, rare events and external influences;
-
Expertise in time-to-event modeling, tailored to forecast both regular and unpredictable events;
-
Strong skills in integrating external data sources (e.g., weather, market trends) to enhance event forecasting accuracy;
-
Proven success in deploying models that support preventive maintenance, sales optimization and risk mitigation.
Technologies: Scikit-Learn · LightGBM · XGBoost · CatBoost · Pandas · Polars
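To give a feel for time-to-event modeling, here is a hand-rolled Kaplan-Meier survival estimator on invented machine-failure data (real projects use dedicated libraries and richer covariates):

```python
# Illustrative sketch: Kaplan-Meier estimate of S(t), the probability that
# an event (e.g., equipment failure) has NOT happened by time t.

def kaplan_meier(durations, observed):
    """durations: time until failure or censoring; observed: 1 if the
    failure was seen, 0 if the unit left the study first. Returns {t: S(t)}."""
    surv, s = {}, 1.0
    for t in sorted(set(t for t, o in zip(durations, observed) if o)):
        d = sum(1 for ti, o in zip(durations, observed) if ti == t and o)  # failures at t
        n = sum(1 for ti in durations if ti >= t)                          # still at risk
        s *= 1 - d / n
        surv[t] = s
    return surv

# Five machines: four observed failures, one censored at t=4.
surv = kaplan_meier([2, 3, 3, 4, 5], [1, 1, 1, 0, 1])
```

The resulting curve answers questions like "what fraction of machines survive past t = 3?", which then drives maintenance planning.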
Project Examples
Click for more details!
Fuzzy Matching
Fuzzy matching is the art of matching objects that are not exactly identical, often because of human errors or inconsistent formats.
What I offer
-
Cutting-edge techniques tailored to meet unique business needs, including the ability to handle very large datasets.
-
Thorough transformation of messy and inconsistent data into well-structured formats.
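As a tiny taste of the idea, here is a standard-library sketch; real projects rely on faster, more specialized techniques to scale to very large datasets:

```python
# Illustrative fuzzy matching with the standard library only.
import unicodedata
from difflib import SequenceMatcher

def normalize(s):
    """Lowercase and strip accents so 'Citroën' matches 'citroen'."""
    s = unicodedata.normalize("NFKD", s.lower())
    return "".join(c for c in s if not unicodedata.combining(c))

def best_match(query, candidates):
    """Return the candidate most similar to the query."""
    return max(candidates,
               key=lambda c: SequenceMatcher(None, normalize(query), normalize(c)).ratio())

print(best_match("Citroen Berlingo",
                 ["Peugeot Partner", "Citroën Berlingo", "Renault Kangoo"]))
# → Citroën Berlingo
```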
Project Examples
Click for more details!
NLP
The goal of Natural Language Processing (NLP) is to enable machines to understand, interpret and generate human language to automate tasks like sentiment analysis, text classification, text summarization or conversational AI.
What I offer
-
Expertise in building and fine-tuning NLP models for tasks such as text classification, named entity recognition, and machine translation;
-
Experience in handling large-scale unstructured data, including preprocessing, tokenization and feature extraction;
-
Ability to integrate NLP solutions into various business applications, such as customer support systems, content recommendation and opinion mining.
Technologies: TF-IDF · Transformers · DeBERTa · NLTK · TensorFlow · PyTorch · Pandas · Polars
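To illustrate the simplest of these techniques, here is TF-IDF computed from first principles on invented documents (real pipelines use scikit-learn or transformer embeddings):

```python
# Illustrative TF-IDF: rare, document-specific words get high weights.
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists -> list of {term: tf-idf weight}."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return out

docs = [["great", "service", "fast"],
        ["slow", "service", "bad"],
        ["great", "product", "fast"]]
weights = tf_idf(docs)
```

In the second review, "slow" (unique to that document) outweighs "service" (shared across documents), which is exactly what makes TF-IDF useful for classification and search.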
Project Examples
Click for more details!
Fraud Detection
Fraud detection involves identifying fraudulent activities by analyzing patterns in user behaviors, financial transactions or digital footprints.
What I offer
-
In-depth experience developing robust fraud detection models using anomaly detection, supervised learning and unsupervised techniques;
-
Proficient in building scalable systems to detect fraud patterns in high-dimensional datasets and real-time environments;
-
Expertise in balancing false positives and negatives through advanced modeling strategies, ensuring optimal fraud detection rates.
Technologies: Scikit-Learn · LightGBM · XGBoost · CatBoost · Pandas · Polars · Clustering · Isolation Forest · Autoencoders
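As a minimal taste of anomaly detection, here is a robust z-score detector (median absolute deviation) on invented transaction amounts; production systems layer several such detectors with supervised models:

```python
# Illustrative unsupervised anomaly detection with a robust z-score.
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag values whose robust z-score (based on the median absolute
    deviation) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [v for v in values if mad and 0.6745 * abs(v - med) / mad > threshold]

amounts = [20, 22, 19, 25, 21, 23, 18, 950]  # one suspicious transaction
print(mad_outliers(amounts))  # → [950]
```

Using medians rather than means keeps the detector itself from being skewed by the very frauds it is trying to catch.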
Project Examples
Click for more details!
Recommendation Systems
The purpose of recommendation systems is to improve personalization and boost engagement. By analyzing user behaviors and preferences, they help businesses deliver tailored experiences, driving customer satisfaction and increasing conversion rates.
What I offer
-
Expertise in building recommendation systems using collaborative filtering, content-based methods and hybrid approaches;
-
Strong skills in data preprocessing, feature engineering, and model optimization to improve recommendation relevance;
-
Experience with personalized recommendation engines for e-commerce and marketing campaigns.
Technologies: Scikit-Learn · LightGBM · XGBoost · CatBoost · Pandas · Polars
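For a flavor of collaborative filtering, here is a toy user-based variant with cosine similarity on an invented ratings matrix; real engines add content features and hybrid re-ranking:

```python
# Illustrative user-based collaborative filtering (invented data).
import math

ratings = {  # user -> {item: rating}
    "ana":  {"3008": 5, "X1": 4, "Kangoo": 1},
    "bob":  {"3008": 4, "X1": 5},
    "carl": {"Kangoo": 5, "Berlingo": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    num = sum(a[i] * b[i] for i in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def recommend(user, k=1):
    """Recommend unseen items rated highly by the most similar user."""
    sims = sorted((cosine(ratings[user], ratings[u]), u)
                  for u in ratings if u != user)
    _, nearest = sims[-1]
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[user]}
    return sorted(unseen, key=unseen.get, reverse=True)[:k]

print(recommend("carl"))
```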
Project Examples
Click for more details!
Detailed Project Example: Car Recommendations
Customer Flows between Car Brands.
Empowering Predictions with Personalized Inputs
The solution was designed to make recommendations when no customer information was available. However, a key challenge was enabling it to incorporate customer preferences when provided, such as budget or desired vehicle type (e.g., family car, city car, 4x4).
Incorporating this additional information significantly enhanced the relevance of the predictions, as demonstrated below.
This is Lea.
She is searching for a new vehicle.
Project Overview
The goal of this project was to help people find their next car by predicting which vehicles might best suit them among 100 brands and 1500 models.
To achieve this, I developed an ML model using:
-
Individual profiles,
-
Previous car ownership history,
-
Key criteria, when available, such as budget and specific requirements (e.g., a preferred brand or vehicle type).
The model was trained on a dataset I created from client data, which included millions of car renewal records along with detailed, relevant information about the owners.
On the right, a flowchart provides a high-level view of this renewal process.
Gather data to make predictions
Personal Information
Information about last vehicle
Use this information to recommend the 3 most relevant vehicles among the 1500 models.
Peugeot 3008
BMW X1
Renault Kadjar
Case n°1: Blind predictions.
Lea has no idea what she's looking for.
→ In this scenario, people buy one of the 3 recommendations 13% of the time.
Case n°2: Oriented predictions.
Lea mentioned she liked utility vehicles and vans.
→ In this scenario, people buy one of the 3 recommendations 62% of the time.
Citroën Berlingo
Renault Kangoo
Peugeot Partner
Fiat Fiorino
Fiat Doblo
Fiat Scudo
Case n°3: Guided predictions.
Lea mentioned she wanted a Fiat utility vehicle.
→ In this scenario, people buy one of the 3 recommendations 98% of the time.
Car Clustering
Another interesting aspect of the solution was its ability to analyze and quantify the similarities between vehicles using nothing but their names.
This enabled us to create car clusters among 1500 models of 100 different brands:
Fiat Doblo
Peugeot Partner
Renault Kangoo
Citroën Berlingo
Example of cluster n°1: Vans.
Peugeot 3008
Volkswagen T-Roc
Volkswagen Tiguan
BMW X1
Audi Q3
Example of cluster n°2: Compact SUVs.
Computer Vision
Computer vision enables machines to interpret visual information like images or videos. This technology has vast applications such as object detection or real-time object counting on video streams.
What I offer
-
Experience in developing computer vision solutions from the ground up: data collection and preprocessing; image augmentation; model design and architecture selection; training and fine-tuning of deep learning models; and robust evaluation metrics tailored to specific tasks;
Technologies: OpenCV · TensorFlow · Keras · PyTorch · YOLO · EfficientNet & ResNet · Pytesseract · NumPy · Albumentations
Project Examples
Click for more details!
Detailed Computer Vision Project - Example
Real-Time People Blurring
The goal of this project was to automatically detect and blur people in live-stream videos, allowing my client to store the footage while respecting privacy regulations.
To achieve this, I employed YOLOv8 and optimized parts of the model to improve processing speed.
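For illustration only, here is a tiny sketch of the blurring step on a plain-Python grayscale "image", assuming a bounding box has already been detected; the real system ran YOLOv8 detections on optimized video frames:

```python
# Illustrative box blur applied inside a detected bounding box.
# The image is a nested list of grayscale pixel values (invented data).

def box_blur_region(img, x0, y0, x1, y1, k=1):
    """Average each pixel in [y0:y1, x0:x1] with its (2k+1)x(2k+1) neighbors."""
    out = [row[:] for row in img]
    for y in range(y0, y1):
        for x in range(x0, x1):
            neigh = [img[j][i]
                     for j in range(max(0, y - k), min(len(img), y + k + 1))
                     for i in range(max(0, x - k), min(len(img[0]), x + k + 1))]
            out[y][x] = sum(neigh) // len(neigh)
    return out

# A bright "person" region on a dark background, then blurred.
frame = [[0] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 4):
        frame[y][x] = 255
blurred = box_blur_region(frame, 1, 1, 5, 5)
```

In production the same idea runs with OpenCV's Gaussian blur on the pixel regions returned by the detector, fast enough for live streams.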
Result Example - Security Cameras
Optimization
Optimization focuses on improving processes by maximizing or minimizing key metrics such as cost, time or efficiency.
What I offer
-
Expertise in formulating and solving complex optimization problems using programming and heuristic methods;
-
Proficiency in applying multi-objective optimization techniques to solve real-world business challenges like scheduling, routing and resource allocation;
-
Strong skills in customizing optimization algorithms to specific business requirements for improved operational efficiency.
Optimization Project - Example n°1
Optimizing City Allocation among Transporters
Problem Overview
The challenge was to allocate 350 German cities to 15 transporters while ensuring that:
-
The total distance is minimized;
-
The distribution is balanced, meaning that each transporter must :
-
Travel roughly the same distance as the others;
-
Visit roughly the same number of cities.
This problem is a more complex variant of the well-known "Traveling Salesman Problem" (TSP), which aims to find the shortest route through a set of cities. While TSP is simple to describe, finding an optimal solution becomes infeasible in reasonable time as the number of cities increases.
In our variant, the problem is much harder: it has multiple “salesmen” instead of just one, plus several additional constraints. Known as the "Multi-Traveling Salesmen Problem Without Depot," this variant has no known exact algorithm that runs in practical time, meaning a provably optimal solution cannot be obtained within a reasonable timeframe. However, I developed a novel method that achieves very good results in reasonable time.
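To illustrate the classic building blocks such heuristics rest on, here is a toy nearest-neighbour construction followed by 2-opt improvement on invented coordinates; the real problem adds the multi-transporter balancing constraints on top:

```python
# Illustrative TSP building blocks: greedy construction + local improvement.
import math

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbour(pts):
    """Greedy construction: always visit the closest unvisited city."""
    tour, left = [0], set(range(1, len(pts)))
    while left:
        nxt = min(left, key=lambda j: math.dist(pts[tour[-1]], pts[j]))
        tour.append(nxt)
        left.remove(nxt)
    return tour

def two_opt(tour, pts):
    """Reverse tour segments while doing so shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-9:
                    tour, improved = cand, True
    return tour

cities = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 1)]
tour = two_opt(nearest_neighbour(cities), cities)
```

Heuristics like LKH generalize this edge-exchange idea to much deeper moves, which is what makes them effective at this scale.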
350 cities to allocate among 15 transporters.
Scattered area
Long distances between cities.
Dense area
Short distances between cities.
Solution Optimality
It's important to recognize that "optimality" depends on the constraints of the problem. Not all criteria can be satisfied at once, which necessitates trade-offs.
For instance, in densely populated areas where cities are close together, a transporter will travel only short distances to cover all cities. In more sparsely populated regions, transporters will have to travel much longer distances to connect the cities.
As a result, it is impossible to have every transporter visit the same number of cities while traveling the same distance.
Final Solution
My client established acceptable trade-off thresholds for the project:
Note: Distributing 350 cities among 15 transporters results in an average of ~23 cities per transporter.
After that, I developed a custom metric that integrated our 3 criteria: total distance traveled, disparities in individual distances and variations in the numbers of cities visited.
To tackle the problem, I customized the LKH-3 heuristic and combined it with clustering methods.
Additionally, I utilized the Google Directions API to generate a distance matrix based on realistic travel distances, rather than simple straight-line measurements.
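The custom metric mentioned above can be sketched as follows; the weights here are invented and, in practice, were tuned against the client's thresholds:

```python
# Illustrative combined metric: total distance plus penalties for imbalance
# in per-transporter distances and city counts (weights are invented).
import statistics

def allocation_score(route_dists, cities_per_route,
                     w_dist=1.0, w_bal=5.0, w_count=50.0):
    total = sum(route_dists)
    dist_spread = statistics.pstdev(route_dists)        # distance imbalance
    count_spread = statistics.pstdev(cities_per_route)  # city-count imbalance
    return w_dist * total + w_bal * dist_spread + w_count * count_spread

balanced = allocation_score([100, 105, 98], [23, 24, 23])
skewed = allocation_score([40, 60, 203], [5, 10, 55])
```

A single scalar like this lets the heuristic compare candidate allocations directly, even though the three criteria pull in different directions.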
Optimization Project - Example n°2
Optimizing Search Queries
This project aimed to generate search queries that, given a set of 50 patents, could retrieve them from a corpus of 13,000,000 patents (130 GB of textual data). The main challenge was to minimize the length of the queries while ensuring all target patents were still retrieved.
The purpose of the project was to help patent professionals leverage AI-powered search tools.
Let's say you have 50 patents.
You want to retrieve them (and only them) among 13,000,000 other patents with an optimized query, as small as possible.
Here are 2 examples of queries you could write to retrieve your patents:
Bad query
Find patents with "wood" and "beam" in their titles.
Query: title:("wood" AND "beam")
Result: this query is too simplistic. It would retrieve only 10 of your 50 patents, along with lots of undesired other patents.
Good query
-
Title contains the word “wood” and either “board” or “sculpted”
-
Abstract contains the word “beam” but not “plank”
-
Description contains the expression “low weight but high resistance”
-
Patent category is either “HL0101” or “AA84112”
Query: title:("wood" AND ("board" OR "sculpted")) AND abstract:("beam" AND NOT "plank") AND description:"low weight but high resistance" AND category:("HL0101" OR "AA84112")
Result: this query is good. It retrieves your 50 patents and only them, with no intruders. Besides, it is short and not over-complicated.
And that's the challenge: building such good queries!
My Solution
My solution generated candidate queries and evaluated them to select the best one. However, with patents, the possible word combinations are virtually infinite, so it was essential to reduce the search space to a manageable number of high-potential candidates (typically a few million).
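Here is a toy, greedy-set-cover flavored version of that candidate-evaluation loop on invented patents; the real system scored millions of candidates over multiple fields and boolean operators:

```python
# Illustrative greedy term selection: keep all target patents while
# excluding as many non-target patents as possible (invented data).

def greedy_query(targets, corpus):
    """targets/corpus: {doc_id: set of terms}. Return terms shared by all
    targets, greedily chosen to exclude non-target documents."""
    shared = set.intersection(*targets.values())          # terms every target has
    others = {d: t for d, t in corpus.items() if d not in targets}
    query, remaining = [], set(others)
    for term in sorted(shared, key=lambda t: sum(t in others[d] for d in others)):
        if not remaining:
            break
        excluded = {d for d in remaining if term not in others[d]}
        if excluded:
            query.append(term)
            remaining -= excluded
    return query, remaining  # remaining = non-targets the query still matches

targets = {"p1": {"wood", "beam", "board"}, "p2": {"wood", "beam", "sculpted"}}
corpus = dict(targets, p3={"wood", "plank"}, p4={"steel", "beam"})
query, leftover = greedy_query(targets, corpus)
```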
Final Result
The solution achieved a custom mean average precision score (MAP@50) exceeding 90%.
This means that, on average, it could retrieve over 45 of the 50 patents per query.
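For reference, the underlying metric can be sketched as follows (standard average precision at k; example values are invented):

```python
# Illustrative MAP@k: average precision per query, then the mean over queries.

def average_precision_at_k(ranked, relevant, k=50):
    """Precision averaged over the ranks where a relevant doc appears."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_ranked, all_relevant, k=50):
    return sum(average_precision_at_k(r, rel, k)
               for r, rel in zip(all_ranked, all_relevant)) / len(all_ranked)

# Relevant docs "a" and "c" found at ranks 1 and 3: AP = (1/1 + 2/3) / 2.
ap = average_precision_at_k(["a", "b", "c"], {"a", "c"})
```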
Optimization Project - Example n°3
Kaggle Optimization Competition 2022 - The Christmas Card Conundrum
This two-month competition was part of Kaggle's annual Christmas Optimization series.
With my team of five, we secured 9th place out of 874 teams and delivered a near-perfect solution.
Problem Overview
A robotic arm with eight links of lengths [64,32,16,8,4,2,1,1] must "print" each pixel of the image on the right.
The arm starts at the center of the image and can be moved by adjusting the angle of its segments. Every time the arm is reconfigured, a movement cost must be paid based on how many segments change direction. Additionally, there's a color cost, based on how much the color changes between two consecutive points. The goal is to find a sequence of movements that covers the entire image with minimal overall cost.
Image to draw (257x257 pixels).
Problem Solution
You can see that the arm moves naturally and draws pixels with similar colors to reduce the painting cost.
Solving this problem required advanced pathfinding and optimization techniques to minimize the arm’s movement and color costs. Key techniques were:
-
Expert use of the LKH-3 heuristic;
-
Customization of cost functions and constraints to capture the problem's complexity, mainly related to the arm's movements.
This solution achieved a cost of 74076, while the optimal cost was 74075. It's a 99.998% perfect score!
Educational Content
Throughout my career, I've created insightful educational experiences in AI, machine learning and Python.
My courses are designed to demystify complex concepts, empowering learners with practical and business-oriented knowledge to excel in the field of Data Science.
What I offer
-
Pedagogical & Tailored Workshops: Interactive sessions designed to address specific learning goals, ranging from foundational concepts to advanced techniques in AI and machine learning.
-
Online Courses and Corporate Training: Structured courses that blend theoretical knowledge with practical applications, enabling participants to gain hands-on experience in Python and data science methodologies.
-
Mentorship and Support: One-on-one mentorship to guide learners through their data science journey, providing personalized feedback and support to enhance their understanding and skills.