A CV of bookshelves

A professional journey through my library

Posted by Dieter De Witte on May 22, 2016


I've been questioning the concept of a CV for a while. I see it as one possible window on a person's skills and experience. Unfortunately CVs have to be short, and CV screening is often done by non-technical HR staff. For this reason I started to consider a CV as just one static viewpoint: where one angle might interest a recruiter, another might be more relevant to a future colleague. You can take it further: maybe someone considers you their mentor and wants to understand the road you took before taking up a role. In this blog post I'll write my CV from the point of view of my Goodreads bookshelves: which (technical) books molded me into the engineer I am today? By explicitly linking to my Goodreads account, the book lists in this post will automatically be enriched with reviews over time. I've decided to flip the chronological order.

iMinds Data Science Lab

In 2015 I started working for iMinds Data Science Lab as an R&D consultant specialized in Big Data Science. My colleagues had a lot of experience in Semantic Web technology, so upon my arrival I studied the concepts behind this technology:

  • Linked Data: Data stored in a way that makes its semantics explicit. This is accomplished by describing every data element as a triple of the form subject predicate object. Each of these is represented by a unique dereferenceable URI. The predicates are often elements of an ontology, a public vocabulary. By re-using ontologies for different datasets we get Linked Data: by sharing vocabularies, data is automatically interlinked.
  • Semantic Web: When linked data is published on the web it can be explored by intelligent agents in an automated fashion: since the semantics are encapsulated in the data, an agent can reason over it.
  • RDF: The Resource Description Framework, a data model in which data is stored as triples.
  • SPARQL: The RDF query language. It has quite a few similarities to standard SQL: since all RDF data consists of triples, a naive approach to storing RDF is a single SQL table with three columns.
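To make that naive single-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The triples and the `ex:`/`foaf:` prefixes are invented for illustration; a real triple store would index and optimize far more aggressively.

```python
import sqlite3

# A naive RDF store: one table with subject, predicate, object columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:Alice", "foaf:knows", "ex:Bob"),
    ("ex:Alice", "foaf:name",  "Alice"),
    ("ex:Bob",   "foaf:name",  "Bob"),
])

# The SPARQL pattern { ex:Alice foaf:knows ?x . ?x foaf:name ?name }
# becomes a self-join on the single triples table:
rows = con.execute("""
    SELECT t2.o FROM triples t1
    JOIN triples t2 ON t1.o = t2.s
    WHERE t1.s = 'ex:Alice' AND t1.p = 'foaf:knows'
      AND t2.p = 'foaf:name'
""").fetchall()
print(rows)  # [('Bob',)]
```

The self-join also hints at why dedicated RDF engines exist: every extra triple pattern in a query adds another join against the same table.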
The bookshelf below corresponds to the books I used to familiarize myself with these topics. I used the book on RDF Database Systems to kick-start my personal research reviewing the state of the art in RDF database engines.

Programming the Semantic Web
Learning SPARQL
Linked Data
RDF Database Systems: Triples Storage and SPARQL Query Processing


My main tool for data visualization and scripting had long been Matlab. Unfortunately Matlab isn't free to use, so I suddenly had to rebuild my data-processing pipeline with other tools. I made quite a radical decision at that moment: Python for everything! From an engineer's perspective Python is far richer than R: there are libraries for interacting with databases, data visualization, shell scripting, web scripting,... I decided that, while it might be tough in the beginning, I would do everything with Python; I am particularly fond of the Jupyter project. Below are some of the books I used to familiarize myself with the language. Python for Data Analysis is the most important one for me. Apart from Python you might need some functional programming skills. I had a look at Scala, but since there was no immediate project in which to learn it, I turned to Java 8, which has some support for functional programming too. This makes Java a reasonable choice for interacting with Spark if Python becomes a bottleneck.
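As a tiny illustration of the kind of work this toolchain is for, here is a group-and-aggregate over toy data using only the Python standard library; the sensor readings are invented for the example.

```python
from collections import defaultdict

# Toy measurements: (sensor, value) pairs -- invented data for illustration.
readings = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]

# Group-and-aggregate, the bread-and-butter operation of data analysis.
groups = defaultdict(list)
for sensor, value in readings:
    groups[sensor].append(value)

means = {sensor: sum(vs) / len(vs) for sensor, vs in groups.items()}
print(means)  # {'a': 3.0, 'b': 3.0}
```

In pandas, the library at the heart of Python for Data Analysis, the same aggregation is a one-liner: `df.groupby('sensor')['value'].mean()`.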

Python for Data Analysis
Python Machine Learning
Programming Collective Intelligence: Building Smart Web 2.0 Applications
Java 8 Lambdas: Pragmatic Functional Programming
The Docker Book: Containerization is the new virtualization


For quickly deploying applications, a new virtualization technology arose: Docker. It basically completes my development stack: I use Docker to deploy a set of microservices, or I containerize the installation of certain applications,... Very often you end up with a one-click, fully controlled environment at your disposal, with far less overhead than classical hypervisor (VM) technology.

Big Data Science course

From February 2016 onwards I was responsible for a new course at Ghent University: Big Data Science. The course covers three areas of data science:

  • Data Management: what tools can be used to manage your data? NoSQL, Big Data platforms (Hadoop & Spark), Semantic Web technology and Streaming Data systems are covered.
  • Data Mining/Analytics: the focus lies on scalable algorithms for NLP, machine learning, association rules, information retrieval, PageRank, recommender systems,...
  • Data Visualization: the focus lies on creating custom web-based interactive data visualizations using JavaScript.
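As a taste of the analytics part, PageRank can be sketched in a few lines of plain Python using power iteration. The toy graph and the damping factor below are illustrative only.

```python
# Power-iteration PageRank on a tiny toy graph (links are invented).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
nodes = list(links)
d = 0.85                      # damping factor
rank = {n: 1 / len(nodes) for n in nodes}

for _ in range(50):           # iterate until (near) convergence
    new = {n: (1 - d) / len(nodes) for n in nodes}
    for n, outs in links.items():
        for m in outs:        # each node spreads its rank over its outlinks
            new[m] += d * rank[n] / len(outs)
    rank = new

print({n: round(r, 3) for n, r in rank.items()})
```

The scalable versions covered in the course express exactly this iteration as a sequence of map and reduce steps over the edge list.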

Big Data Management: NoSQL, Big Data Platforms, Stream Processing

NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
Making Sense of NoSQL
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
Learning Neo4j
Learning Cypher
Learning Spark
Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Hadoop Application Architectures: Designing Real-World Big Data Applications
Streaming Data



Big Data Analytics: Data Mining, Machine Learning, ...

Mining of Massive Datasets
Introduction to Data Mining
Artificial Intelligence: A Modern Approach
Machine Learning
Pattern Recognition and Machine Learning
Forecasting: Principles and Practice
Social Network Analysis: Methods and Applications
Reinforcement Learning: An Introduction
Recommender Systems: An Introduction
Probabilistic Graphical Models: Principles and Techniques
A Field Guide to Genetic Programming



Big Data Visualization: D3.js, design principles

Interactive Data Visualization for the Web
The Visual Display of Quantitative Information
Envisioning Information



An undergraduate in Applied Physics (2002-2006)

In hindsight I've been very lucky with respect to my talents and interests: I was able to pick my favourite Master's programme (Engineering Physics). An engineering master's is highly regarded in the job market; it's a quality label which allows people to trust my ability to learn. Or, put in other words: they trust that I can be a valuable asset in fields for which I wasn't specifically trained.

Because my interests are generally quite broad - I just like to solve challenging problems - I could have enjoyed many specializations. Early on I was attracted to engineering in the field of medicine, which could be reached via a major in electromechanical engineering. Unfortunately, at the time the curriculum did not align well with my interests, and I retook the second year in the Applied Physics specialization. I think the first two books on the shelf below sparked my enthusiasm prior to this choice.

I read Stephen Hawking's book while still in secondary school; Brian Greene's book is a lot more thorough and is probably one of the most accessible books on the big theories of physics: relativity and quantum mechanics, and how they might be unified by string theory.

The Illustrated A Brief History of Time/The Universe in a Nutshell
The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory
Quantum Mechanics
Modern Quantum Mechanics
Introducing Einstein's Relativity
Quantum Computation and Quantum Information
Nonlinear Waves, Solitons and Chaos
Equilibrium Statistical Physics
Superconductivity, Superfluids, and Condensates


During my years as an undergraduate at Ghent University I was very fascinated by theoretical works. I collected some standard works on fairly advanced topics. I didn't read most of them from A to Z, but all of them were covered in courses I took. As the list of topics suggests, they align with cutting-edge areas such as superconductivity and quantum information theory.

Towards graduation (2006-2008)

While approaching graduation I finally asked myself what my options would be afterwards. During my first Master's year I took a course, Computational Solutions to Wave Problems, which introduced some of the most important numerical algorithms for solving complicated physical problems. SIAM's top 10 algorithms of the 20th century were mostly covered in this course. The Fast Multipole Method, the Fast Fourier Transform and Krylov subspace methods in particular proved very useful during my master's dissertation, in which I tried to solve the wave problem in a nuclear fusion reactor (tokamak).
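The Fast Fourier Transform, for instance, fits in a few lines of Python as the recursive radix-2 Cooley-Tukey scheme; this is a textbook sketch, not the optimized iterative form a solver would actually use.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    # Split into even- and odd-indexed samples, transform each half.
    even, odd = fft(x[0::2]), fft(x[1::2])
    # Combine the halves using the twiddle factors e^(-2*pi*i*k/n).
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddle[k] for k in range(n // 2)] +
            [even[k] - twiddle[k] for k in range(n // 2)])

# A constant signal transforms to a single spike at frequency zero.
print(fft([1, 1, 1, 1]))  # ~ [4, 0, 0, 0]
```

The divide-and-conquer recursion is what brings the cost down from O(n^2) to O(n log n), the property that earned the FFT its place on SIAM's list.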

Fast And Efficient Algorithms In Computational Electromagnetics
Electromagnetic Fields
Waves in Plasmas
Plasma Waves, 2nd Edition
Thinking in C++


The electromagnetic wave problem in a tokamak reactor is used to model the interaction between an RF antenna and a nuclear plasma. The RF antenna is one of the most important tools for pumping energy into the highly complicated plasma, which has to be heated to millions of Kelvin before it starts to produce clean(!) nuclear energy. Note that nuclear fusion is different from the nuclear fission used in current nuclear reactors; nuclear fusion is in fact the process which produces the energy in the sun.

Performance is extremely important in the algorithms designed in this field, requiring both deep mathematical insight and strong programming skills. In terms of programming languages this meant that we mainly used C++ and even relied on some Fortran libraries. Thinking in C++ by Bruce Eckel made me fall in love with programming. This book helped me choose between physics and computer science, and the latter won.

Parallel algorithms in Bioinformatics

From 2009 until 2014 I worked as a bioinformatics researcher for IBCN in Ghent. The shelves below cover the three pillars of that work:

  • Algorithms
  • Genetics
  • Statistics

An Introduction to Bioinformatics Algorithms
Algorithms
Introduction to Algorithms
Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
MPI - The Complete Reference
Hadoop: The Definitive Guide
Hadoop in Practice



Genetics For Dummies
Molecular Biology of the Cell
Statistics for Dummies
Intermediate Statistics for Dummies



Head First Design Patterns
Head First HTML and CSS
Head First PHP & MySQL



Big Data Consultant @ Telenet

In 2014 I worked as a Big Data consultant for Telenet. Tools: Pig, Hadoop, Hive, Impala. E-learning: machine learning, statistics.

Big Data
Impala in Action
Mahout in Action
Programming Pig
Machine Learning in Action



Banner from: photowall.co.uk.