
Video Analysis

Traffic Density Estimation

Dublin is a very busy city that suffers from severe traffic congestion. Currently DCC has dedicated operators who manually find and monitor traffic incidents and anomalies. To improve the reliability of human visual control over a large number of cameras, it is highly desirable to automatically direct operator attention to particularly dangerous situations. Therefore, within the VaVeL project we developed a method and system that automatically detects high traffic load from CCTV footage analysis and can generate alerts to the Traffic Management Center in cases of incidents or congestion.
Designing such a system is not a trivial task due to various challenges and system constraints. Computationally intensive methods are generally not preferred because of high hardware costs and limited server space on-site. Camera vibration (due to wind, passing traffic, etc.) and possible manual changes to the camera home position make image registration necessary (see Figure 1). The very low frame rate (one image per ten seconds) makes alignment of images far from straightforward. The low resolution of the images (720x576) is another limiting factor that makes explicit car counting unreliable. Other factors, such as significantly different camera fields of view and environmental conditions (rain, obscured cameras, changes in day-night lighting), make building a reliable system challenging. Last but not least, car lights and shadows make the problem even more complex.

Figure 1: Misalignment between consecutive frames


To cope with these challenges we had to develop a specific method for traffic density estimation that works with the images received from DCC cameras. We intentionally do not explicitly count the number of cars, which is essentially a more complex task. Instead, we estimate traffic density as an occupancy measure of the road. Such a road occupancy measure allows detection of traffic jams and anomalous traffic situations under very difficult conditions and with low processing power. Fig. 2 shows an example of the traffic density estimated for a particular camera over 1.5 days, with the largest peak in the morning around 8:30.

Figure 2: Traffic density estimated over 1.5 days for the lane shown on the right. Traffic density reaches its maximum in the morning around 8:30-8:50, which corresponds to the image on the right.


Our approach is based on classifying pixels as belonging either to moving objects (foreground) or to the static scene (background). To overcome misalignments between subsequent frames, image registration is performed. The traffic density measure is then defined as the ratio of the number of detected foreground pixels in a particular lane to the total number of pixels comprising the lane. Since areas closer to the camera occupy more pixels, we weight each pixel's contribution such that pixels corresponding to closer objects get lower weights. To make these computations possible, a mask of the traffic lanes must be created for each camera. A few examples of detected foreground pixels (marked in magenta) and estimated densities (in red) for each lane are shown in Fig. 3.

Figure 3: Automatic traffic density estimation for two DCC cameras. Estimated levels of traffic density are shown in red. Pixels classified as belonging to moving objects are shown in magenta.
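As a minimal illustration, the occupancy measure described above could be computed as in the following Python/NumPy sketch. The function name and the weight map are hypothetical; the weights would come from the calibration step described in the list below.

```python
import numpy as np

def lane_density(foreground_mask, lane_mask, weights):
    """Weighted road-occupancy measure for a single lane.

    foreground_mask : bool array, True where a pixel was classified as moving.
    lane_mask       : bool array, True for pixels belonging to the lane.
    weights         : float array; pixels near the camera (which cover more
                      image area per metre of road) receive lower weights.
    """
    lane_weight = weights[lane_mask].sum()
    occupied_weight = weights[lane_mask & foreground_mask].sum()
    return occupied_weight / lane_weight
```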

Below we summarize the sequential steps in more detail; a condensed code sketch of steps 2-4 follows the list.

  1. Calibration
    a. Definition of street segments (lanes) for which traffic density should be estimated. Lanes are defined by manually creating binary masks for each camera.
    b. Calculation of the scaling factor used to weight pixels corresponding to close and distant object locations. This is done by calculating the ratio between the area covered by a car nearest to the camera and the area covered by a car at the far end of the street segment.
  2. Image registration
    Each received image is registered to the previous image using an image registration method based on Harris corner matching under an affine transformation constraint.
  3. Detection of foreground pixels
    A foreground pixel, i.e. a pixel belonging to a moving object, has varying intensity over subsequent frames. We identify foreground pixels as pixels whose absolute difference in gray value across a few subsequent frames is large enough. Using absolute differences between the current frame and several adjacent frames makes the detection more robust to changing light conditions.
  4. Refinement of falsely detected pixels
    To avoid false detections due to car lights and shadows, which may occupy large regions, we ignore detected foreground pixels if the standard deviation of gray values in the region around the pixel is small enough. We also discard foreground pixels that are too bright and therefore likely to belong to car lights.
  5. Estimation of traffic density
    The traffic density measure is defined as the ratio of the number of detected foreground pixels in a particular lane to the number of pixels comprising the lane. Since areas closer to the camera occupy more pixels, we weight each pixel's contribution such that pixels corresponding to closer objects get lower weights.
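The following Python/OpenCV sketch condenses steps 2-4 under stated assumptions: the thresholds, window size, and the use of optical flow to match Harris corners are illustrative choices, not the exact parameters of the deployed system.

```python
import cv2
import numpy as np

def register(prev_gray, cur_gray):
    """Align the current frame to the previous one (step 2)."""
    # Harris corners detected in the previous frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True, k=0.04)
    # Match the corners into the current frame.
    matched, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good_prev = pts[status.ravel() == 1]
    good_cur = matched[status.ravel() == 1]
    # Affine transformation estimated from the corner matches (RANSAC).
    A, _ = cv2.estimateAffine2D(good_cur, good_prev, method=cv2.RANSAC)
    h, w = prev_gray.shape
    return cv2.warpAffine(cur_gray, A, (w, h))

def foreground_pixels(cur, adjacent, diff_thresh=25, std_thresh=8.0,
                      bright_thresh=240, win=7):
    """Detect moving-object pixels (step 3) and refine them (step 4)."""
    # A pixel is foreground if it differs strongly from all adjacent (registered) frames.
    diffs = np.stack([cv2.absdiff(cur, a) for a in adjacent])
    moving = np.all(diffs > diff_thresh, axis=0)
    # Local standard deviation: shadow and light regions are smooth, so they score low.
    g = cur.astype(np.float32)
    mean = cv2.blur(g, (win, win))
    local_std = np.sqrt(np.maximum(cv2.blur(g * g, (win, win)) - mean * mean, 0))
    moving &= local_std > std_thresh   # discard flat regions (shadows)
    moving &= cur < bright_thresh      # discard very bright pixels (car lights)
    return moving
```

The resulting foreground mask feeds the weighted occupancy measure sketched above (step 5).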

In order to evaluate the performance of our method we conducted an experiment based on 4 cameras recording images over four consecutive days. Ground truth traffic densities were estimated manually. The accuracy of lane density detection reached 98.6%. In this experiment we excluded frames that were heavily distorted by various factors, e.g. when the lens was occluded by raindrops.
The majority of failure cases were due to car shadows, car lights, and poor image quality caused by environmental conditions. In the future we plan to reduce the sensitivity to car shadows and lights and to avoid the manual preparation of lane masks, e.g. by explicitly detecting and counting cars.

 

Emergency Vehicle Detection

We are investigating different approaches that can provide feasible solutions for the detection of emergency vehicles, such as fire trucks, police cars, or ambulances. Detection and localization of emergency vehicles may allow us to promptly direct the attention of operators to probable locations of traffic incidents.


Currently we are testing an appearance-based approach that relies on visual features such as car color, texture, and shape, which may allow distinguishing emergency vehicles from other cars. This is a very challenging problem due to changing weather conditions, the low quality of the DCC images, varying camera perspectives, and insufficient resolution. Cars in 720x576 DCC images are smaller than 60x60 pixels and are frequently obscured by various objects (see the example in Figure 1). Another severe problem when using learning-based methods is the absence of examples of emergency vehicles captured by DCC cameras. Moreover, the actual number of emergency vehicles that can potentially be found in the DCC data streams may be very small. Further note that the appearance-based approach is applicable under daylight conditions only.

Figure 1: DCC camera image and a cropped patch that contains a vehicle.


To this end, we use deep neural networks that are capable of transferring knowledge learned from different but related data. Since we cannot even test performance on DCC images, because we have no labeled emergency vehicles there, we use emergency vehicles captured by street cameras installed in London, whose images are publicly available (see an example in Figure 2).

Figure 2: London camera image and a cropped patch that contains an emergency vehicle.


Below we describe our approach and report our preliminary results. Our system can be divided into two parts: the localization of moving objects (background subtraction, or change detection) and the subsequent classification of the localized objects. For the first part, instead of measuring the change in gray values, we measure the change in the spatial variation of gray values. This reduces false detections in large smooth areas when illumination varies. The change in spatial variation of gray levels is computed as the standard deviation within a sliding window. The pixel-wise minimum over a few previous frames is then taken as a background reference image; even if only a single frame in the sequence contains parts of an empty road, those parts will be treated as background. The background reference is then subtracted from the current frame. Pixel differences above a pre-defined threshold are binarized, and bounding boxes around the binary connected components are determined.
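The localization stage described above could be sketched as follows in Python/OpenCV; the window size, threshold, and minimum component area are illustrative assumptions.

```python
import cv2
import numpy as np

def local_std(gray, win=7):
    """Spatial variation of gray values: standard deviation in a sliding window."""
    g = gray.astype(np.float32)
    mean = cv2.blur(g, (win, win))
    return np.sqrt(np.maximum(cv2.blur(g * g, (win, win)) - mean * mean, 0))

def moving_object_boxes(cur_gray, prev_grays, diff_thresh=10.0, min_area=50):
    """Localize moving objects against a background built from previous frames."""
    cur_var = local_std(cur_gray)
    # Pixel-wise minimum over the previous frames' variation maps: if even one
    # frame showed an empty road at a pixel, that pixel is treated as background.
    background = np.min(np.stack([local_std(p) for p in prev_grays]), axis=0)
    change = cv2.subtract(cur_var, background)            # subtract background reference
    binary = (change > diff_thresh).astype(np.uint8)      # threshold and binarize
    # Bounding boxes around the binary connected components.
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):                                 # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes
```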


We also tested a few alternative approaches that produce similar performance; the first stage does not appear to be a critical problem. In contrast, the second stage presents a significantly more difficult problem.


To cope with the absence of examples of emergency vehicles captured by DCC cameras, we use a pre-trained neural network and fine-tune it on publicly available images of non-emergency and emergency vehicles found with the Google search engine. Note that the pre-trained network was trained on more than a million images to classify 1000 categories, among which were fire trucks, police vans, and ambulances. DCC images, on the other hand, are of essentially lower quality, were taken from an essentially different perspective, and have lower resolution. Since DCC images are of significantly lower resolution, we subsample the images before performing fine-tuning.
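The report does not state which framework or backbone was used; the following PyTorch/torchvision sketch is only one plausible way to set up such fine-tuning, with the network choice, input sizes, and hyperparameters as assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pre-trained backbone; the final layer is replaced by a two-class
# head (emergency vs. non-emergency). Assumed backbone: ResNet-18.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)

# Training patches are first subsampled to roughly the size of vehicles in
# DCC footage and then resized back to the backbone's input size, so the
# network learns from low-resolution appearance.
train_tf = transforms.Compose([
    transforms.Resize(60),            # mimic ~60x60-pixel vehicle patches
    transforms.Resize(224),           # back to the backbone's expected input size
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def fine_tune_epoch(loader):
    """One pass over a loader yielding (image, label) pairs with labels 0/1."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```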
For performance evaluation we use test images captured by London street cameras, which are similar to Dublin city images. By experimenting with different pre-trained networks and hyperparameters used during fine-tuning, we were able to achieve a binary classification accuracy (emergency versus non-emergency vehicles) better than 80%. Figure 3 shows the learning curve of the best performing network.

Figure 3: Neural network learning curve over 800 epochs of fine-tuning. The curve shows emergency vehicle classification accuracy computed on the test dataset with London images.


In the future, in addition to the appearance-based approach, we may consider an alternative approach that detects the blinking blue lights on tracked vehicles.