Notebooks

In this section, we present the different notebooks we produced. They are mainly dedicated to carrying out the analyses.

SQLite database

First, to keep track of our SPARQL queries and run them again, we used an SQLite database. It is lightweight and easy to use. This notebook shows how to create it. To browse the database more easily, you can use the database administration tool DBeaver (link to download).
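
As an illustration of the idea, here is a minimal sketch of how SPARQL queries could be stored and reloaded with Python's built-in sqlite3 module. The table name, the column names, the helper functions and the sample query are assumptions made for this example; they are not taken from the notebook.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS sparql_query (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,
        sparql TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# Sample query text (stored as a string, not executed here).
SAMPLE_QUERY = """
SELECT ?item ?country WHERE {
  ?item wdt:P27 ?country .  # P27: country of citizenship
}
LIMIT 10
"""


def save_query(connection, label, sparql):
    """Store a query under a label and return its row id."""
    cursor = connection.execute(
        "INSERT INTO sparql_query (label, sparql) VALUES (?, ?)",
        (label, sparql),
    )
    connection.commit()
    return cursor.lastrowid


def load_query(connection, label):
    """Return the most recent query saved under this label, or None."""
    row = connection.execute(
        "SELECT sparql FROM sparql_query WHERE label = ? ORDER BY id DESC LIMIT 1",
        (label,),
    ).fetchone()
    return row[0] if row else None


query_id = save_query(conn, "citizenship sample", SAMPLE_QUERY)
reloaded = load_query(conn, "citizenship sample")
```

The reloaded string can then be sent again to a SPARQL endpoint, which is what makes the queries reproducible.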

Properties analysis

The properties analysis is the first step required for the statistical study. It allows us to assess which properties are of interest. Interest is judged on two criteria: a property must be present for a sufficient proportion of individuals, and the individuals must share enough properties to be studied together.
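
The two criteria above can be sketched as follows. The sample data, the function names and the 75% threshold are assumptions chosen for illustration; the notebook may proceed differently.

```python
from collections import Counter

# Hypothetical sample: each individual is mapped to the set of Wikidata
# properties it carries (P21 sex or gender, P27 country of citizenship,
# P69 educated at, P106 occupation, P569 date of birth).
individuals = {
    "Q1": {"P21", "P27", "P106", "P569"},
    "Q2": {"P21", "P27", "P569"},
    "Q3": {"P21", "P106"},
    "Q4": {"P21", "P27", "P106", "P569", "P69"},
}


def property_coverage(people):
    """Criterion 1: share of individuals carrying each property."""
    counts = Counter(prop for props in people.values() for prop in props)
    total = len(people)
    return {prop: n / total for prop, n in counts.items()}


def shared_fraction(people, props):
    """Criterion 2: share of individuals carrying all the given properties."""
    wanted = set(props)
    return sum(1 for have in people.values() if wanted <= have) / len(people)


coverage = property_coverage(individuals)
# Keep properties present for at least 75% of individuals (arbitrary threshold).
frequent = sorted(p for p, share in coverage.items() if share >= 0.75)
jointly_present = shared_fraction(individuals, frequent)
```

A property can pass the first criterion on its own while the set of retained properties fails the second, which is why both are checked.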

The entire analysis is available in this notebook.

Nationalities analysis

The nationalities study is based on the "country of citizenship" property (P27) on Wikidata. The main problem is that these countries evolve over time (boundaries change and regimes can change). They are also very numerous, which makes them hard to study. The solution is to aggregate the countries by continent or region. We carry out bivariate analyses with gender and profession, and we also study the evolution over time.
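
A minimal sketch of the aggregation step, followed by a bivariate count with gender. The mapping, the sample records and the function name are hypothetical, and the continent assigned to a historical state is a manual (and debatable) decision.

```python
from collections import Counter

# Hypothetical mapping from country of citizenship to continent.
# Historical states have to be assigned by hand.
CONTINENT = {
    "France": "Europe",
    "German Empire": "Europe",
    "Soviet Union": "Europe",  # arbitrary choice for a transcontinental state
    "United States of America": "North America",
    "Japan": "Asia",
}

# Hypothetical individuals.
people = [
    {"gender": "male", "citizenship": "France"},
    {"gender": "female", "citizenship": "German Empire"},
    {"gender": "male", "citizenship": "United States of America"},
    {"gender": "female", "citizenship": "Japan"},
    {"gender": "male", "citizenship": "Soviet Union"},
    {"gender": "male", "citizenship": "Kingdom of Bohemia"},  # not in the mapping
]


def cross_tab(rows):
    """Count individuals per (continent, gender) pair."""
    table = Counter()
    for row in rows:
        continent = CONTINENT.get(row["citizenship"], "Unknown")
        table[(continent, row["gender"])] += 1
    return table


table = cross_tab(people)
```

Countries missing from the mapping fall into an "Unknown" group, which makes the gaps in the mapping visible instead of silently dropping individuals.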

The entire analysis is available in this notebook.

Occupations analysis

In the same way, it is possible to produce an analysis of occupations. The occupations considered are those other than "economist" and "jurist". It is possible to see the differences between the two groups. It is also possible to see the evolution over time and the gender differences.

The entire analysis is available in this notebook.

Multiple Correspondence Analysis

Multiple correspondence analysis (MCA) is a method for studying several categorical variables at the same time. This is made possible by encoding the variables numerically: each modality (category) is transformed into a binary variable. We can then see the proximities between the modalities and between the individuals.
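
The encoding step can be sketched as follows: each (variable, modality) pair becomes one binary column. The sample individuals and the function name are assumptions for illustration, and the sketch stops at the indicator table; the correspondence analysis itself is then computed on that table.

```python
# Hypothetical individuals described by three categorical variables.
rows = [
    {"gender": "female", "continent": "Europe", "occupation": "economist"},
    {"gender": "male", "continent": "Europe", "occupation": "jurist"},
    {"gender": "male", "continent": "Asia", "occupation": "economist"},
]


def indicator_matrix(records):
    """Return (columns, matrix) where each column is a (variable, modality)
    pair and each cell is 1 if the individual has that modality, else 0."""
    columns = sorted({(var, val) for rec in records for var, val in rec.items()})
    matrix = [
        [1 if rec.get(var) == val else 0 for var, val in columns]
        for rec in records
    ]
    return columns, matrix


columns, matrix = indicator_matrix(rows)
```

Each row of the table sums to the number of variables, since an individual has exactly one modality per variable.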

This notebook shows how to carry out the MCA with Python.

Analysis with maps

Another way to study individuals is to use maps, which is feasible with Python. We use two methods. The first one uses geographical coordinates: it places individuals with the same geographical coordinates on a map. It is possible to add interactivity by displaying the maps on an HTML page. The other method uses polygons (geographic shapes defined by a variable number of vertices). With these polygons, we can draw the shapes of the continents and display them on a map.
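
As a sketch of the data preparation behind both methods, the example below groups individuals by coordinates and builds GeoJSON features for points and polygons. The sample records, the function names and the choice of GeoJSON as an exchange format are assumptions; the mapping library used in the notebook is not shown here.

```python
import json
from collections import defaultdict

# Hypothetical records: (name, latitude, longitude).
records = [
    ("Person A", 48.8566, 2.3522),
    ("Person B", 48.8566, 2.3522),
    ("Person C", 51.5074, -0.1278),
]


def group_by_coordinates(rows):
    """Gather the individuals that share exactly the same coordinates."""
    groups = defaultdict(list)
    for name, lat, lon in rows:
        groups[(lat, lon)].append(name)
    return dict(groups)


def point_features(groups):
    """One GeoJSON point per location; GeoJSON orders coordinates as [lon, lat]."""
    return [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"count": len(names), "names": names},
        }
        for (lat, lon), names in groups.items()
    ]


def polygon_feature(name, ring):
    """Build a GeoJSON polygon from [lon, lat] vertices, closing the ring if needed."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [closed]},
        "properties": {"name": name},
    }


groups = group_by_coordinates(records)
points = point_features(groups)
# Hypothetical rectangle standing in for a real continent outline.
region = polygon_feature("Sample region", [[-10, 35], [30, 35], [30, 60], [-10, 60]])
collection = {"type": "FeatureCollection", "features": points + [region]}
geojson_text = json.dumps(collection)
```

The count stored on each point can drive the marker size, so that locations shared by many individuals stand out on the map.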

This notebook shows how to produce the maps with Python.

Network analysis

The last method is network analysis. In our study, the network allows us to see links between individuals and between universities. A link between two individuals is created if they were in the same higher education institution during the same period. Two universities are linked if an individual has attended both institutions.
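
The two linking rules can be sketched as follows. The enrolment records and function names are hypothetical, and "the same period" is interpreted here as overlapping attendance intervals, which is an assumption.

```python
from itertools import combinations

# Hypothetical enrolment records: (person, institution, start_year, end_year).
enrolments = [
    ("Alice", "University X", 1950, 1954),
    ("Bob", "University X", 1953, 1957),
    ("Carol", "University X", 1960, 1964),
    ("Alice", "University Y", 1955, 1958),
    ("Carol", "University Y", 1957, 1959),
]


def person_links(rows):
    """Link two people who attended the same institution in overlapping years."""
    links = set()
    for (p1, i1, s1, e1), (p2, i2, s2, e2) in combinations(rows, 2):
        same_place = i1 == i2
        overlap = s1 <= e2 and s2 <= e1  # assumption: overlap means "same period"
        if p1 != p2 and same_place and overlap:
            links.add(tuple(sorted((p1, p2))))
    return links


def institution_links(rows):
    """Link two institutions when at least one person attended both."""
    attended = {}
    for person, institution, _, _ in rows:
        attended.setdefault(person, set()).add(institution)
    links = set()
    for institutions in attended.values():
        links.update(combinations(sorted(institutions), 2))
    return links


people_edges = person_links(enrolments)
university_edges = institution_links(enrolments)
```

Sorting each pair before storing it avoids counting the same undirected link twice.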

For this network analysis, we needed to split the notebook into two files (otherwise the size authorized by GitHub would be exceeded). In the first part, you will find how to create the relations with an SQLite database. The second part contains the network analysis with the networkx library.
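
As a rough sketch of the first part, the relations between individuals can be produced by joining an enrolment table with itself in SQLite. The table, its columns and the sample rows are assumptions for illustration, and a plain dictionary stands in for the graph so that the example has no external dependency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of study periods.
conn.execute(
    "CREATE TABLE study (person TEXT, institution TEXT, "
    "start_year INTEGER, end_year INTEGER)"
)
conn.executemany(
    "INSERT INTO study VALUES (?, ?, ?, ?)",
    [
        ("Alice", "University X", 1950, 1954),
        ("Bob", "University X", 1953, 1957),
        ("Carol", "University X", 1960, 1964),
        ("Alice", "University Y", 1955, 1958),
        ("Carol", "University Y", 1957, 1959),
    ],
)

# Self-join: same institution, overlapping years, each pair listed once.
EDGE_QUERY = """
SELECT a.person, b.person, a.institution
FROM study AS a
JOIN study AS b
  ON a.institution = b.institution
 AND a.person < b.person
 AND a.start_year <= b.end_year
 AND b.start_year <= a.end_year
ORDER BY a.person, b.person
"""
edges = conn.execute(EDGE_QUERY).fetchall()

# Build an undirected adjacency structure and compute the degree of each node.
adjacency = {}
for source, target, _institution in edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set()).add(source)
degree = {person: len(neighbours) for person, neighbours in adjacency.items()}
```

With networkx, the (person, person) pairs from such a query can be passed to `Graph.add_edges_from` to obtain the graph used in the second part.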