About Checking and Reconciling Imbalanced Data
APC Gateway allows for the correction of small imbalances in the data after the sub-blocks are checked.
Data cleaning starts at the block level. The item cleaned can be an entire block or a portion of a block delimited by Zero Load Points on TripTimes (set in a field in the TripTimes table). If a Zero Load Point is set, the point is treated as an "end point" and all riders are expected to leave the bus.
For clients requiring NTD counts (Rid/Import/Calculate NTD Counts property), data cleaning at the trip level is performed on each block that has been successfully cleaned.
Each block, sub block, and trip is checked individually.
- All board and alight counts are summed to get the counted Ons and counted Offs, respectively.
An item is assumed to have bad/imbalanced data that cannot be "repaired", and is rejected, when both of the following are true: |counted Ons - counted Offs| > 5, and |counted Ons - counted Offs| / counted Ons >= the maximum imbalance percentage allowed (set in the Rid/Import/Cleanse/Max Imbalance Percentage property). If the item is not rejected but the counted Ons do not equal the counted Offs, the algorithm attempts to cleanse the data.
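The rejection rule above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name `classify_item` and its return labels are hypothetical, and it assumes the Max Imbalance Percentage property is expressed as a percent value (for example, 10 for 10%).

```python
def classify_item(counted_ons: int, counted_offs: int, max_imbalance_pct: float) -> str:
    """Classify one block, sub block, or trip as 'balanced', 'cleanse', or 'reject'.

    Hypothetical helper; max_imbalance_pct is assumed to be a percent (10 means 10%).
    """
    diff = abs(counted_ons - counted_offs)
    if diff == 0:
        return "balanced"
    # Relative imbalance as a percentage of counted Ons.
    # Assumption: zero counted Ons with nonzero Offs counts as unbounded imbalance.
    pct = diff / counted_ons * 100.0 if counted_ons > 0 else float("inf")
    # Both conditions must hold for the item to be rejected.
    if diff > 5 and pct >= max_imbalance_pct:
        return "reject"
    return "cleanse"
```

Because both conditions must hold, an item with 20 counted Ons and 24 counted Offs is still cleansed at a 10% limit: the relative imbalance is 20%, but the absolute difference (4) does not exceed 5.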
Cumulative (cum) Ons and Offs are adjusted separately to force their final values to equal the target total. The default method of scaling cumulative Ons and Offs to meet the target total is linear. If, however, there is evidence that errors in counts are more or less proportional to the size of the count, scaling can be non-linear, that is, use a power other than 1.
- The algorithm sets the total target:
For the first round of cleaning, if the Rid/Import/Cleanse/Load Balance Method property is not Average, the target is either the smaller or the larger of counted Ons and counted Offs, depending on the property value (MIN or MAX). The same logic is applied if the Rid/Import/Cleanse/Apply Min or Max Balance to Whole Blocks Only property is true. Otherwise, the target is the mean of counted Ons and counted Offs. If the mean is not an integer, the algorithm divides it by 2, rounds the result, and multiplies it by 2, which avoids a systematic round-up or round-down bias.
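As a sketch of the target selection just described, the hypothetical function `target_total` below takes the balance method as a string. It does not model the Apply Min or Max Balance to Whole Blocks Only property. The divide-by-2, round, multiply-by-2 step sends a half-integer mean to the nearest even integer, so 10.5 becomes 10 and 11.5 becomes 12.

```python
def target_total(counted_ons: int, counted_offs: int, method: str = "AVERAGE") -> int:
    """Return the target total for one item (hypothetical helper)."""
    method = method.upper()
    if method == "MIN":
        return min(counted_ons, counted_offs)
    if method == "MAX":
        return max(counted_ons, counted_offs)
    total = counted_ons + counted_offs
    if total % 2 == 0:
        # The mean is already an integer.
        return total // 2
    mean = total / 2  # ends in .5
    # mean / 2 ends in .25 or .75, so round() never hits a tie here.
    return round(mean / 2) * 2
```

Half of the non-integer means round down and half round up, which is how this step avoids a consistent bias in either direction.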
- The algorithm then sets the On and Off factors to be applied to the counted Ons and Offs to achieve the target value:
- On factor (f) = Target total ons / Total counted ons
- Off factor (f) = Target total offs / Total counted offs
- Scaled cum ons (for each Stop) = on factor (f) * counted cum ons
- Scaled cum offs (for each Stop) = off factor (f) * counted cum offs
All scaled cumulative (cum) Ons and Offs are rounded to the nearest integer. The Ons and Offs by stop are then calculated as the difference between the cumulative value at a stop (i) and the cumulative value at the previous stop (i-1).
- Balanced ons = cum ons(i) – cum ons(i-1)
- Balanced offs = cum offs(i) – cum offs(i-1)
The total balanced Ons and Offs should be equal to the target value.
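The linear scaling, rounding, and differencing steps can be illustrated with a short sketch. The names `balance_counts` and `round_half_up` are hypothetical. The text says only "nearest integer", so rounding halves upward is an assumption. Only the default linear method is shown.

```python
import math
from itertools import accumulate


def round_half_up(x: float) -> int:
    # Assumption: ties round up; the source does not specify tie handling.
    return math.floor(x + 0.5)


def balance_counts(counts: list[int], target: int) -> list[int]:
    """Scale per-stop counts (Ons or Offs) linearly so they sum to target."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot scale an item with no counts")
    factor = target / total                                   # On or Off factor (f)
    cum = list(accumulate(counts))                            # counted cum values per stop
    scaled = [round_half_up(factor * c) for c in cum]         # scaled, rounded cum values
    # Balanced count at stop i = cum(i) - cum(i-1); the first stop keeps its cum value.
    return [scaled[0]] + [scaled[i] - scaled[i - 1] for i in range(1, len(scaled))]


ons = [5, 3, 2, 0]    # 10 counted Ons
offs = [0, 2, 3, 6]   # 11 counted Offs
balanced_ons = balance_counts(ons, 10)
balanced_offs = balance_counts(offs, 10)
```

Rounding the cumulative values rather than the per-stop values keeps the rounding error from accumulating: the last cumulative value scales to the target, so the balanced counts sum to the target.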
- The algorithm checks the resulting distribution of Ons and Offs for negative departing and through loads. A Load Violation Point is anywhere there is a negative departing load or where the through load is less than -1.
If |most negative departing load| > |most negative through load|, the load violation point is placed at the point with the most negative departing load.
Otherwise, the load violation point is placed at the point with the most negative through load. If there is a tie between points, the earliest in the trip is chosen.
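The selection rule can be sketched as below. This section does not define how departing and through loads are computed, so the hypothetical function `find_load_violation` takes them as precomputed per-stop lists. It treats only departing loads below 0 and through loads below -1 as candidates, which is one interpretation of the text.

```python
def find_load_violation(departing_loads: list[int], through_loads: list[int]):
    """Return (kind, stop_index) for the load violation point, or None if there is none."""
    # Candidates are (load, index) pairs that break the thresholds given in the text.
    dep = [(load, i) for i, load in enumerate(departing_loads) if load < 0]
    thr = [(load, i) for i, load in enumerate(through_loads) if load < -1]
    if not dep and not thr:
        return None
    # min() on (load, index) picks the most negative load; ties go to the earliest stop.
    worst_dep = min(dep) if dep else None
    worst_thr = min(thr) if thr else None
    if worst_thr is None:
        return ("departing", worst_dep[1])
    if worst_dep is not None and abs(worst_dep[0]) > abs(worst_thr[0]):
        return ("departing", worst_dep[1])
    return ("through", worst_thr[1])
```

For example, departing loads `[4, -1, 0]` with through loads `[0, -4, -4]` yield `("through", 1)`: the through load is more negative, and the earlier of the two tied stops is chosen.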
If a load violation point exists, the algorithm splits the sub block in question into two sub blocks:
- One ending at the split point and including the Offs at that point.
- One beginning at the split point and including the Ons at that point.
If any portion of a block fails to be cleaned, the entire block is rejected.
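A minimal sketch of the split, assuming a sub block is represented as two parallel per-stop lists of Ons and Offs (this representation and the name `split_sub_block` are illustrative assumptions). The split stop appears in both halves: its Offs go to the first half and its Ons to the second.

```python
def split_sub_block(ons: list[int], offs: list[int], split_index: int):
    """Split one sub block at split_index into two sub blocks of (ons, offs) lists."""
    # First half ends at the split stop and keeps that stop's Offs only.
    first = (ons[:split_index] + [0], offs[:split_index + 1])
    # Second half begins at the split stop and keeps that stop's Ons only.
    second = (ons[split_index:], [0] + offs[split_index + 1:])
    return first, second


first, second = split_sub_block([5, 3, 2, 0], [0, 2, 3, 5], split_index=2)
```

No counts are lost or duplicated by the split, so each half can be checked and balanced on its own in the next round.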
- The algorithm sets the total target: