A CV of bookshelves

A professional journey through my library

Posted by Dieter De Witte on May 22, 2016


I've been questioning the concept of a CV for a while. I see it as one possible window on a person's skills and experience. Unfortunately CVs have to be short, and CV screening is often done by non-technical HR staff. For this reason I started to consider a CV as just one static viewpoint: where one angle might interest a recruiter, another might be more relevant to a future colleague. You can take it further: maybe someone considers you their mentor and wants to understand the road you took before taking up a role. In this blog post I'll write my CV from the point of view of my Goodreads bookshelves: which (technical) books molded me into the engineer I am today? By explicitly linking to my Goodreads account, the book lists in this post will automatically be enriched with reviews over time. I've decided to flip the chronological order.

iMinds Data Science Lab

In 2015 I started working for iMinds Data Science Lab as an R&D consultant specialized in Big Data Science. My colleagues had a lot of experience in Semantic Web technology, so upon my arrival I studied the concepts behind this technology:

  • Linked Data: Data stored in a way that makes its semantics explicit. This is accomplished by describing every data element as a triple of the form subject predicate object. Each of these is represented by a unique dereferenceable URI. The predicates are often elements of an ontology, a public vocabulary. By re-using ontologies for different datasets we get Linked Data: by sharing vocabularies, data is automatically interlinked.
  • Semantic Web: When linked data is published on the web it can be explored by intelligent agents in an automated fashion: since the semantics are encapsulated in the data, an agent can reason over it.
  • RDF: The Resource Description Framework, a data model in which data is stored as triples.
  • SPARQL: The RDF query language. It has quite a few similarities to standard SQL: since all RDF data consists of triples, a naive approach to storing RDF is a single SQL table with three columns.
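To make that naive single-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The triples and the `ex:`/`foaf:` prefixes are invented for illustration; a real triple store would index and optimize far more aggressively.

```python
import sqlite3

# A naive RDF store: one table with subject, predicate, object columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:Alice", "foaf:knows", "ex:Bob"),
    ("ex:Alice", "foaf:name",  "Alice"),
    ("ex:Bob",   "foaf:name",  "Bob"),
])

# The SPARQL pattern { ex:Alice foaf:knows ?x . ?x foaf:name ?name }
# becomes a self-join on the single triples table:
rows = con.execute("""
    SELECT t2.o FROM triples t1
    JOIN triples t2 ON t1.o = t2.s
    WHERE t1.s = 'ex:Alice' AND t1.p = 'foaf:knows'
      AND t2.p = 'foaf:name'
""").fetchall()
print(rows)  # [('Bob',)]
```

The self-join also hints at why dedicated RDF engines exist: every extra triple pattern in a query adds another join against the same table.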
The bookshelf below corresponds to the books I used to familiarize myself with these topics. I used the book on RDF Database Systems to kick-start my personal research reviewing the state of the art in RDF database engines.

Programming the Semantic Web
Learning SPARQL
Linked Data
RDF Database Systems: Triples Storage and SPARQL Query Processing


My main tool for data visualization and scripting had long been Matlab. Unfortunately Matlab isn't free to use, so I suddenly had to rebuild my data-processing pipeline with other tools. I made quite a radical decision at that moment: Python for everything! From an engineer's perspective Python is far richer than R: there are libraries for interacting with databases, data visualization, shell scripting, web scripting,... I decided that, while it might be tough in the beginning, I would do everything with Python; I am particularly fond of the Jupyter project. Below are some of the books I used to familiarize myself with the language. Python for Data Analysis is the most important one for me. Apart from Python you might need some functional programming skills. I had a look at Scala, but since there was no immediate project in which to learn it, I turned to Java 8, which has some support for functional programming too. This makes Java a reasonable choice for interacting with Spark if Python becomes a bottleneck.
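As a tiny illustration of the kind of work this toolchain is for, here is a group-and-aggregate over toy data using only the Python standard library; the sensor readings are invented for the example.

```python
from collections import defaultdict

# Toy measurements: (sensor, value) pairs -- invented data for illustration.
readings = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]

# Group-and-aggregate, the bread-and-butter operation of data analysis.
groups = defaultdict(list)
for sensor, value in readings:
    groups[sensor].append(value)

means = {sensor: sum(vs) / len(vs) for sensor, vs in groups.items()}
print(means)  # {'a': 3.0, 'b': 3.0}
```

In pandas, the library at the heart of Python for Data Analysis, the same aggregation is a one-liner: `df.groupby('sensor')['value'].mean()`.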

Python for Data Analysis
Python Machine Learning
Programming Collective Intelligence: Building Smart Web 2.0 Applications
Java 8 Lambdas: Pragmatic Functional Programming
The Docker Book: Containerization is the new virtualization


For quickly deploying applications, a new virtualization technology arose: Docker. It basically completes my development stack: I use Docker to deploy a set of microservices, or I containerize the installation of certain applications,... Very often you end up with a one-click, fully controlled environment at your disposal, with far less overhead than classical hypervisor (VM) technology.

Big Data Science course

From February 2016 onwards I was responsible for a new course at Ghent University: Big Data Science. The course covers three areas of data science:

  • Data Management: what tools can be used to manage your data? NoSQL, Big Data platforms (Hadoop & Spark), Semantic Web technology and Streaming Data systems are covered.
  • Data Mining/Analytics: the focus lies on scalable algorithms for NLP, machine learning, association rules, information retrieval, PageRank, recommender systems,...
  • Data Visualization: the focus lies on creating custom web-based interactive data visualizations using JavaScript.
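As a taste of the analytics part, PageRank can be sketched in a few lines of plain Python using power iteration. The toy graph and the damping factor below are illustrative only.

```python
# Power-iteration PageRank on a tiny toy graph (links are invented).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
nodes = list(links)
d = 0.85                      # damping factor
rank = {n: 1 / len(nodes) for n in nodes}

for _ in range(50):           # iterate until (near) convergence
    new = {n: (1 - d) / len(nodes) for n in nodes}
    for n, outs in links.items():
        for m in outs:        # each node spreads its rank over its outlinks
            new[m] += d * rank[n] / len(outs)
    rank = new

print({n: round(r, 3) for n, r in rank.items()})
```

The scalable versions covered in the course express exactly this iteration as a sequence of map and reduce steps over the edge list.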

Big Data Management: NoSQL, Big Data Platforms, Stream Processing

NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
Making Sense of NoSQL
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
Learning Neo4j
Learning Cypher
Learning Spark
Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Hadoop Application Architectures: Designing Real-World Big Data Applications
Streaming Data



Big Data Analytics: Data Mining, Machine Learning, ...

Mining of Massive Datasets
Introduction to Data Mining
Artificial Intelligence: A Modern Approach
Machine Learning
Pattern Recognition and Machine Learning
Forecasting: Principles and Practice
Social Network Analysis: Methods and Applications
Reinforcement Learning: An Introduction
Recommender Systems: An Introduction
Probabilistic Graphical Models: Principles and Techniques
A Field Guide to Genetic Programming



Big Data Visualization: D3.js, design principles

Interactive Data Visualization for the Web
The Visual Display of Quantitative Information
Envisioning Information



An undergraduate in Applied Physics (2002-2006)

In hindsight I've been very lucky with respect to my talents and interests: I was able to pick my favourite Master's programme (Engineering Physics). An engineering master's is highly regarded in the job market; it's a quality label which allows people to trust my ability to learn. Or, put in other words: they trust that I can be a valuable asset in fields for which I wasn't specifically trained.

Because my interests are generally quite broad - I just like to solve challenging problems - I could have enjoyed many specializations. Early on I was attracted to engineering in the field of medicine, which could be reached via a major in electromechanical engineering. Unfortunately, at the time the curriculum did not align well with my interests, and I retook the second year in the Applied Physics specialization. I think the first two books on the shelf below sparked my enthusiasm prior to this choice.

I read Stephen Hawking's book while still in secondary school; Brian Greene's book is a lot more thorough and is probably one of the most accessible books on the big theories of physics: relativity and quantum mechanics, and how they might be unified by string theory.

The Illustrated A Brief History of Time/The Universe in a Nutshell
The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory
Quantum Mechanics
Modern Quantum Mechanics
Introducing Einstein's Relativity
Quantum Computation and Quantum Information
Nonlinear Waves, Solitons and Chaos
Equilibrium Statistical Physics
Superconductivity, Superfluids, and Condensates


During my years as an undergraduate at Ghent University I was very fascinated by theoretical works. I collected some standard works on fairly advanced topics. I didn't read most of them from A to Z, but all of them were covered in courses I took. As the list of topics suggests, they align with cutting-edge areas such as superconductivity and quantum information theory.

Towards graduation (2006-2008)

While approaching graduation I finally asked myself what my options would be afterwards. During my first Master's year I took a course, Computational Solutions to Wave Problems, which introduced some of the most important numerical algorithms for solving complicated physical problems. SIAM's top 10 algorithms of the 20th century were mostly covered in this course. The Fast Multipole Method, the Fast Fourier Transform and Krylov subspace methods in particular proved very useful during my master's dissertation, in which I tried to solve the wave problem in a nuclear fusion reactor (tokamak).
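The Fast Fourier Transform, for instance, fits in a few lines of Python as the recursive radix-2 Cooley-Tukey scheme; this is a textbook sketch, not the optimized iterative form a solver would actually use.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    # Split into even- and odd-indexed samples, transform each half.
    even, odd = fft(x[0::2]), fft(x[1::2])
    # Combine the halves using the twiddle factors e^(-2*pi*i*k/n).
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddle[k] for k in range(n // 2)] +
            [even[k] - twiddle[k] for k in range(n // 2)])

# A constant signal transforms to a single spike at frequency zero.
print(fft([1, 1, 1, 1]))  # ~ [4, 0, 0, 0]
```

The divide-and-conquer recursion is what brings the cost down from O(n^2) to O(n log n), the property that earned the FFT its place on SIAM's list.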

Fast And Efficient Algorithms In Computational Electromagnetics
Electromagnetic Fields
Waves in Plasmas
Plasma Waves, 2nd Edition
Thinking in C++


The electromagnetic wave problem in a tokamak reactor is used to model the interaction between an RF antenna and a nuclear plasma. The RF antenna is one of the most important tools for pumping energy into the highly complicated plasma, which has to be heated to millions of Kelvin before it starts to produce clean(!) nuclear energy. Note that nuclear fusion is different from the nuclear fission used in current nuclear reactors; nuclear fusion is in fact the process which produces the energy in the sun.

Performance is extremely important in the algorithms designed in this field, requiring both deep mathematical insight and strong programming skills. In terms of programming languages this meant that we mainly used C++ and even relied on some Fortran libraries. Thinking in C++ by Bruce Eckel made me fall in love with programming. This book helped me choose between physics and computer science, and the latter won.

Parallel algorithms in Bioinformatics

From 2009 until 2014 I worked as a bioinformatics researcher for IBCN in Ghent. The shelves below cover the three pillars of that work:

  • Algorithms
  • Genetics
  • Statistics

An Introduction to Bioinformatics Algorithms
Algorithms
Introduction to Algorithms
Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
MPI - The Complete Reference
Hadoop: The Definitive Guide
Hadoop in Practice



Genetics For Dummies
Molecular Biology of the Cell
Statistics for Dummies
Intermediate Statistics for Dummies



Head First Design Patterns
Head First HTML and CSS
Head First PHP & MySQL



Big Data Consultant @ Telenet

In 2014 I worked as a Big Data consultant for Telenet. Tools: Pig, Hadoop, Hive, Impala. E-learning: machine learning, statistics.

Big Data
Impala in Action
Mahout in Action
Programming Pig
Machine Learning in Action



Banner from: photowall.co.uk.