Knowable — Trustpilot Review Analysis Platform

Bachelor project · End-to-end data product · R, Shiny & NLP

Knowable challenges the idea that a simple 1–5 star rating on Trustpilot can fully represent customer experience. By scraping, analysing and visualising the written reviews behind the stars, the project explores how text analytics can support both companies and consumers in making more informed decisions.

R · NLP & text mining · Sentiment analysis · Topic modelling · Web scraping · Shiny dashboard · MySQL

Context & challenge

Online reviews are central to how people evaluate companies, yet star ratings compress complex experiences into a single number. My bachelor project asked: What do customers actually say in their reviews – and how does this align with the visible rating?

I designed Knowable as a platform concept that can extract, analyse and present Trustpilot reviews in a way that reveals patterns, themes and sentiment that are otherwise hidden behind the stars.

My role – Solo project

This was an individual bachelor project in which I was responsible for the entire solution, from idea to implementation:

Formulated the research question and overall concept for Knowable.
Implemented the full data pipeline in R – from web scraping to modelling.
Set up a MySQL database and connected it to the Shiny app.
Built the Shiny UI and server logic for interactive exploration of results.
Conducted the analysis and wrote all methodological and analytical chapters.

Pipeline & methods

Data collection

I developed a custom R function to scrape Trustpilot company pages, capturing ratings, review text, dates, reply status and more. Reviews are stored in a MySQL database so analyses can be repeated and extended across companies.
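
As an illustration of the parsing step, here is a minimal sketch using the rvest package. It is not the project's actual scraping function. The HTML snippet, its class names and the `parse_reviews()` helper are all hypothetical and do not reflect Trustpilot's real page structure; the sketch only shows how review cards could be turned into one row per review. Writing to the database is left out.

```r
library(rvest)

# Hypothetical markup: the class names and attributes below are invented
# for illustration and do not reflect Trustpilot's real page structure.
html <- '
<div>
  <article class="review" data-rating="5">
    <time datetime="2023-01-15"></time>
    <p class="review-text">Fast delivery and friendly support.</p>
    <div class="company-reply">Thank you!</div>
  </article>
  <article class="review" data-rating="2">
    <time datetime="2023-02-03"></time>
    <p class="review-text">The parcel arrived late.</p>
  </article>
</div>'

# Hypothetical helper: turn every review card on a page into one data frame row.
parse_reviews <- function(page) {
  cards <- html_elements(page, "article.review")
  # html_element() returns one (possibly missing) match per card,
  # so a card without a reply yields NA instead of shifting the rows.
  reply_text <- html_text(html_element(cards, "div.company-reply"))
  data.frame(
    rating    = as.integer(html_attr(cards, "data-rating")),
    date      = as.Date(html_attr(html_element(cards, "time"), "datetime")),
    text      = html_text2(html_element(cards, "p.review-text")),
    has_reply = !is.na(reply_text),
    stringsAsFactors = FALSE
  )
}

reviews <- parse_reviews(read_html(html))
```

Using `html_element()` per card rather than `html_elements()` on the whole page keeps the columns aligned when a field such as the company reply is absent from some reviews.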

Text processing & analysis

To understand what reviewers actually talk about, I combined several NLP techniques:

Sentiment analysis to measure the emotional tone of reviews and compare it to the official TrustScore.
Topic modelling to uncover recurring themes such as service, delivery or product quality.
Term frequency and co-occurrence analysis, together with LexRank summarisation, to surface representative reviews and typical phrasing.
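
The first and last of these techniques can be sketched in base R. This is a simplified, lexicon-based illustration under stated assumptions, not the method used in the project: the reviews, the mini-lexicon and the helper names (`tokenise()`, `sentiment_score()`) are invented, and the mapping from sentiment to a 1–5 scale is just one possible way to put text and stars side by side.

```r
# Toy data: ratings and texts are made up for illustration.
reviews <- data.frame(
  rating = c(5, 4, 2, 1),
  text = c(
    "Great service and fast delivery",
    "Good product but slow delivery",
    "Poor support and a broken product",
    "Terrible experience, rude and slow"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical mini-lexicon: word -> polarity (+1 positive, -1 negative).
lexicon <- c(great = 1, good = 1, fast = 1,
             slow = -1, poor = -1, broken = -1, terrible = -1, rude = -1)

# Lower-case a text and split it into alphabetic tokens.
tokenise <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# Mean polarity of the lexicon words in a text; 0 if none are found.
sentiment_score <- function(text) {
  words <- tokenise(text)
  hits <- lexicon[words[words %in% names(lexicon)]]
  if (length(hits) == 0) return(0)
  mean(hits)
}

reviews$sentiment <- vapply(reviews$text, sentiment_score, numeric(1),
                            USE.NAMES = FALSE)

# Map sentiment from [-1, 1] onto the 1-5 star scale and measure the gap.
reviews$sentiment_as_stars <- 3 + 2 * reviews$sentiment
reviews$gap <- reviews$rating - reviews$sentiment_as_stars

# How closely does the written tone follow the star rating?
rating_sentiment_cor <- cor(reviews$rating, reviews$sentiment)

# Term frequencies across all reviews, most frequent first.
term_freq <- sort(table(unlist(lapply(reviews$text, tokenise))),
                  decreasing = TRUE)
```

A large `gap` flags reviews where the wording and the stars disagree, which is the kind of mismatch the project set out to examine.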

Interface prototype

In Shiny, I designed an interactive dashboard where users can:

See overall sentiment as a gauge next to the star rating.
Explore word clouds, topics and “most common reviews”.
Switch between companies to compare review profiles.
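
A stripped-down sketch of the company switch could look as follows. It is an assumption-laden illustration rather than the actual Knowable dashboard: the data, the input and output IDs and the `company_summary()` helper are hypothetical, and a plain table stands in for the gauge and word clouds.

```r
library(shiny)

# Toy data standing in for the analysed reviews (all values invented).
reviews <- data.frame(
  company   = c("Company A", "Company A", "Company B", "Company B"),
  rating    = c(5, 3, 2, 4),
  sentiment = c(0.8, 0.1, -0.6, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical pure helper, so the summary logic can be checked
# without launching the app.
company_summary <- function(data, company) {
  rows <- data[data$company == company, ]
  c(mean_rating = mean(rows$rating), mean_sentiment = mean(rows$sentiment))
}

ui <- fluidPage(
  titlePanel("Review profile"),
  selectInput("company", "Company", choices = unique(reviews$company)),
  tableOutput("summary")
)

server <- function(input, output, session) {
  # Re-renders whenever the selected company changes.
  output$summary <- renderTable({
    s <- company_summary(reviews, input$company)
    data.frame(metric = names(s), value = unname(s))
  })
}

app <- shinyApp(ui, server)
# Start it from an interactive session with: runApp(app)
```

Keeping the aggregation in a plain function, separate from the reactive code, makes it possible to test the numbers shown in the dashboard independently of the interface.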

Outcome & use cases

The Knowable prototype shows how companies and consumers can move beyond star scores and engage with the language behind them:

Companies can identify recurring complaints and strengths in their service.
They can compare themselves with competitors on themes and sentiment.
Consumers gain a clearer sense of what a “3-star experience” actually looks like in practice.

In the future, the concept could be extended with more review sources, trend tracking over time and exportable reports for internal quality work.

Reflections & learnings

Building Knowable end-to-end strengthened my ability to:

Translate a loosely defined problem into a concrete, data-driven product.
Bridge backend work (scraping, databases, modelling) with a clear, interpretable interface.
Reflect critically on data quality, sampling and modelling assumptions when drawing conclusions from user-generated content.