About Data Cleaning

Data cleaning aims to resolve load imbalances, such as negative loads, that arise when the number of boardings (Ons) does not match the number of alightings (Offs). It makes minor adjustments to the data at the points that show the highest passenger activity.

These points are where APC equipment most commonly fails and are therefore typically the sources of count errors.
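
To make the idea of a negative load concrete, the following is a minimal sketch that computes the running load from Ons and Offs and flags the stops where it drops below zero. The function names and the sample counts are hypothetical and are not taken from any APC product; the sketch only detects the problem and does not attempt to correct it.

```python
from itertools import accumulate


def load_profile(ons, offs, start_load=0):
    """Return the passenger load on the vehicle after each stop."""
    if len(ons) != len(offs):
        raise ValueError("ons and offs must have the same length")
    net = [on - off for on, off in zip(ons, offs)]
    # accumulate(..., initial=...) yields the starting load first, so drop it.
    return list(accumulate(net, initial=start_load))[1:]


def negative_load_stops(ons, offs):
    """Return the indices of stops where the running load is below zero."""
    return [i for i, load in enumerate(load_profile(ons, offs)) if load < 0]


# Hypothetical counts for a five-stop block (10 Ons, 11 Offs).
ons = [5, 3, 0, 2, 0]
offs = [0, 2, 7, 1, 1]

print(load_profile(ons, offs))         # [5, 6, -1, 0, -1]
print(negative_load_stops(ons, offs))  # [2, 4]
```

In this made-up example the load goes negative at the busiest alighting stop and the block ends with a load of -1, which is the kind of imbalance data cleaning is meant to resolve.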

The segment of a block between zero load points is called a sub-block. A sub-block may be the entire block, or a single block may consist of several sub-blocks. For example, sub-blocks could exist within a block between deadheads or layovers, or one sub-block could span the entire block from pull-out to pull-in.
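
As an illustration, a block could be split into sub-blocks as sketched below. This assumes each stop record carries a flag marking it as a zero load point; the StopEvent type, the is_zlp field, and the function name are hypothetical, and a real system may identify zero load points differently.

```python
from dataclasses import dataclass


@dataclass
class StopEvent:
    stop_id: str
    ons: int
    offs: int
    is_zlp: bool = False  # hypothetical flag: True at a layover, deadhead, pull-in, etc.


def split_into_sub_blocks(events):
    """Split a block's stop events into sub-blocks, each ending at a zero load point."""
    sub_blocks, current = [], []
    for event in events:
        current.append(event)
        if event.is_zlp:
            sub_blocks.append(current)
            current = []
    if current:  # trailing events that were not closed by a flagged zero load point
        sub_blocks.append(current)
    return sub_blocks


# Hypothetical block with a layover at stop C and pull-in after stop E.
events = [
    StopEvent("A", 4, 0),
    StopEvent("B", 1, 2),
    StopEvent("C", 0, 3, is_zlp=True),
    StopEvent("D", 2, 0),
    StopEvent("E", 0, 2, is_zlp=True),
]

for sub_block in split_into_sub_blocks(events):
    print([e.stop_id for e in sub_block])
# ['A', 'B', 'C']
# ['D', 'E']
```

If no intermediate stop is flagged, the function returns a single sub-block that covers the entire block.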

By definition, the last stop of a block should be a zero load point (ZLP), where the only person on the vehicle is the driver. The load should therefore be balanced at the end of a block.

Unfortunately, the same rule does not hold at the trip level. Although NTD reporting is required at the trip level, it is common for a passenger to board near the end of a trip and remain on board for the return trip. Thus, for a data cleaning routine to be statistically valid for NTD reporting, it must consider ZLPs at the block level so that trip-level reporting remains consistent.
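
A small worked example may help show why balance holds at the block level but not at the trip level. The records, trip names, and function name below are hypothetical. One passenger boards near the end of the outbound trip and alights on the inbound trip.

```python
from collections import defaultdict

# Hypothetical stop records in stop order: (trip_id, ons, offs).
records = [
    ("outbound", 4, 0),
    ("outbound", 2, 3),
    ("outbound", 1, 3),  # one passenger boards here and stays on for the return trip
    ("inbound", 3, 0),
    ("inbound", 0, 2),
    ("inbound", 0, 2),   # last stop of the block: the vehicle is empty
]


def net_by_trip(records):
    """Return Ons minus Offs for each trip."""
    totals = defaultdict(int)
    for trip_id, ons, offs in records:
        totals[trip_id] += ons - offs
    return dict(totals)


per_trip = net_by_trip(records)
print(per_trip)                # {'outbound': 1, 'inbound': -1}
print(sum(per_trip.values()))  # 0
```

Neither trip balances on its own, yet the block as a whole does. Forcing each trip to zero would remove a real passenger from the data.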

Once the data is balanced at the block or sub-block level, the balanced data can be ascribed to each trip within the block and used for NTD reporting.
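
One way to picture this step is sketched below: already-balanced stop records are rolled up into per-trip totals, with the load carried across trip boundaries. The record layout, field names, and function name are assumptions made for illustration, and the sketch assumes that the records are in stop order and that each trip's records are contiguous.

```python
def trip_summaries(records):
    """Summarise balanced stop records per trip, carrying the load between trips.

    records: iterable of (trip_id, ons, offs) in stop order (hypothetical layout).
    """
    summaries = {}
    load = 0
    for trip_id, ons, offs in records:
        # The first record of a trip fixes its starting load.
        summary = summaries.setdefault(
            trip_id, {"ons": 0, "offs": 0, "start_load": load, "end_load": load}
        )
        load += ons - offs
        summary["ons"] += ons
        summary["offs"] += offs
        summary["end_load"] = load
    return summaries


# Hypothetical balanced block made up of two trips.
balanced = [
    ("outbound", 4, 0),
    ("outbound", 2, 3),
    ("outbound", 1, 3),
    ("inbound", 3, 0),
    ("inbound", 0, 2),
    ("inbound", 0, 2),
]

for trip_id, summary in trip_summaries(balanced).items():
    print(trip_id, summary)
# outbound {'ons': 7, 'offs': 6, 'start_load': 0, 'end_load': 1}
# inbound {'ons': 3, 'offs': 4, 'start_load': 1, 'end_load': 0}
```

Each trip keeps its own boardings and alightings, and the passenger who rides through appears as the inbound trip's starting load rather than as a count error.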

The data cleaning algorithm is applied to the data between ZLPs. Examples of ZLPs include the pull-out from a garage, the pull-in to a garage, and the start of an interline deadhead.