
Infrastructure and Integration

The VaVeL infrastructure deployed in Warsaw consists of three systems: (i) a development and scientific environment installed at Warsaw University of Technology (WUT), (ii) a testbed maintained by Orange Polska, and (iii) a production-class system, used for final system verification, installed and maintained by the City of Warsaw (CoW). The following figure shows the VaVeL development and integration process, i.e. how project products are designed, developed, tested and deployed. System modules, e.g. those responsible for processing and storing vehicle data, are developed and unit-tested by programmers from WUT. The components are then integrated and tested (interface-oriented integration tests) by system engineers at Orange Polska. After stability tests, the components are installed in the production environment, where CoW experts test them for usability and performance.

The Hadoop-based cluster named ‘bajorko’, installed at Orange, is available to all consortium members. This system processes large data sets, e.g. bus locations (from 2 to 4.5 GB daily), tram locations (1 to 2.5 GB daily), timetables, and large sets of text information from the Public Transport Authority: web page, RSS, Twitter account, and the non-emergency city reporting system 19115.

Within the ‘bajorko’ cluster we store and process the data at several levels simultaneously: raw, cleaned, deduplicated, and combined with other data sources (e.g. vehicle location data combined with timetables). All data sets are available:

  • as plain CSV files in distributed (HDFS) storage
  • in Apache Hive distributed data warehouse
  • as real-time Apache Kafka streams

Overall, the cluster processes up to 8 GB of data per day.
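
The processing levels mentioned above (raw, cleaned, deduplicated, combined with timetables) can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the column names, the sample records, the `TIMETABLE` mapping and all function names are hypothetical, and the real data sets in the cluster are far larger and are handled with Hadoop tooling rather than in-memory lists.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw feed; column names and values are illustrative only.
RAW_CSV = """vehicle_id,line,timestamp,lat,lon
1001,523,2017-03-01T08:00:05,52.2297,21.0122
1001,523,2017-03-01T08:00:05,52.2297,21.0122
1002,17,2017-03-01T08:00:07,,21.0300
1003,17,2017-03-01T08:00:09,52.2400,21.0200
"""

# Hypothetical timetable: line number -> scheduled time at a reference stop.
TIMETABLE = {"523": "08:00:00", "17": "08:00:30"}


def read_raw(text):
    """Level 1: raw records exactly as received."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Level 2: drop records with missing coordinates and parse the fields."""
    cleaned = []
    for row in rows:
        if not row["lat"] or not row["lon"]:
            continue
        cleaned.append({
            "vehicle_id": row["vehicle_id"],
            "line": row["line"],
            "timestamp": datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%S"),
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
        })
    return cleaned


def deduplicate(rows):
    """Level 3: keep the first record per (vehicle, timestamp) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["vehicle_id"], row["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def combine(rows, timetable):
    """Level 4: attach the deviation from the scheduled time, in seconds."""
    combined = []
    for row in rows:
        scheduled = timetable.get(row["line"])
        if scheduled is None:
            continue  # no timetable entry for this line
        scheduled_dt = datetime.combine(
            row["timestamp"].date(),
            datetime.strptime(scheduled, "%H:%M:%S").time(),
        )
        delay_s = (row["timestamp"] - scheduled_dt).total_seconds()
        combined.append({**row, "delay_s": delay_s})
    return combined


if __name__ == "__main__":
    for record in combine(deduplicate(clean(read_raw(RAW_CSV))), TIMETABLE):
        print(record["vehicle_id"], record["line"], record["delay_s"])
```

Keeping each level as a separate step mirrors the idea that every intermediate data set stays available on its own, so a consumer can pick the raw feed or the combined one as needed.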

Among the many data sets describing various aspects of city life, one is of particular interest: location statistics derived from the PLMN (Public Land Mobile Network) within the boundaries of the city of Warsaw. The mobile subscriber location statistics contain information on the number of active terminals (mobile phones, internet modems, etc.), i.e. those using voice or xMS communication. To preserve the privacy of network users, the data are anonymized and aggregated to the level of the network cell, so that no individual can be identified from the data. Estimating transport demand in a given area and time, or the number of people currently in a given area, are just two examples of how these data can be used. By its nature, the quality of the data depends on cell density and radio network technology.
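
The idea of aggregating to the level of the network cell can be sketched as follows. This is only an illustration under stated assumptions, not the operator's actual anonymization procedure: the event format, the identifiers, the one-hour time window and the function name `aggregate_by_cell` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical activity events: (terminal_id, cell_id, ISO timestamp).
EVENTS = [
    ("t1", "cell-A", "2017-03-01T08:05:00"),
    ("t1", "cell-A", "2017-03-01T08:40:00"),
    ("t2", "cell-A", "2017-03-01T08:15:00"),
    ("t3", "cell-B", "2017-03-01T08:20:00"),
    ("t2", "cell-B", "2017-03-01T09:10:00"),
]


def aggregate_by_cell(events):
    """Return {(cell_id, hour_start): number of distinct active terminals}.

    Terminal identifiers are used only inside this function to avoid
    counting the same terminal twice; the result contains counts only.
    The one-hour window is an assumption made for this sketch.
    """
    terminals = defaultdict(set)
    for terminal_id, cell_id, timestamp in events:
        hour_start = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S").replace(
            minute=0, second=0
        )
        terminals[(cell_id, hour_start)].add(terminal_id)
    return {key: len(ids) for key, ids in terminals.items()}


if __name__ == "__main__":
    for (cell_id, hour_start), count in sorted(aggregate_by_cell(EVENTS).items()):
        print(cell_id, hour_start.isoformat(), count)
```

Because each output row covers a whole cell, the spatial resolution of such statistics is bounded by the cell size, which is one way to see why data quality depends on cell density.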