Converting

We provide convenience functions to read a bb_binary Repository into NumPy arrays and Pandas DataFrames, and to create a Repository from existing data via Pandas DataFrames.

You might need them if you want to

  • create bb_binary Repositories from existing data (like ground truth data)
  • take a quick glance at a subset of the data (one that fits into memory)
  • experiment with new features

Warning

The convenience functions are not designed for performance. When working with huge datasets, work with the Repository directly instead.

Convert bb_binary to NumPy Array

We provide a convenience function to generate NumPy arrays, which can then be turned into Pandas DataFrames. Here is an example of how to read all frames and their detection data into a NumPy array:

import numpy as np
import pandas as pd
from bb_binary import Repository, convert_frame_to_numpy

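repo = Repository("some/path/to/a/repo")

arr = None
# iter_frames yields each Frame together with its FrameContainer
for frame, fc in repo.iter_frames():
    tmp = convert_frame_to_numpy(frame)
    # append the rows of this frame to the rows collected so far
    arr = tmp if arr is None else np.hstack((arr, tmp))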
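Because np.hstack copies all previously collected rows on every call, the loop above does more copying the larger the repository gets. A common NumPy alternative is to collect the per-frame arrays in a list and concatenate them once at the end. The sketch below shows this pattern on synthetic data; fake_frame_array, DTYPE and the field layout are hypothetical stand-ins for the output of convert_frame_to_numpy, not part of bb_binary.

```python
import numpy as np

# Hypothetical stand-in for the structured array returned per frame.
# The field names are illustrative only.
DTYPE = np.dtype([('frameId', np.uint64), ('xpos', np.float32), ('ypos', np.float32)])


def fake_frame_array(frame_id, n_detections):
    """Build a synthetic per-frame array with one row per detection."""
    arr = np.zeros(n_detections, dtype=DTYPE)
    arr['frameId'] = frame_id
    arr['xpos'] = np.arange(n_detections)
    arr['ypos'] = np.arange(n_detections) * 2
    return arr


# Collect the per-frame arrays first (frame 2 has no detections) ...
parts = [fake_frame_array(frame_id, n) for frame_id, n in [(1, 3), (2, 0), (3, 2)]]

# ... then concatenate once instead of calling np.hstack inside the loop.
# np.concatenate raises ValueError on an empty list, so guard that case.
arr = np.concatenate(parts) if parts else None
```

With the real functions, the list would be filled with convert_frame_to_numpy(frame) inside the repo.iter_frames() loop.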

Sometimes you also need fields from the FrameContainer. You can add them with the add_cols argument, which also accepts any other single value, or a sequence with one entry per row:

arr = None
for frame, fc in repo.iter_frames():
    tmp = convert_frame_to_numpy(frame, add_cols={'camId': fc.camId})
    arr = tmp if arr is None else np.hstack((arr, tmp))
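According to the function documentation, each add_cols value is either a single value or a sequence of the correct length. The following sketch illustrates those semantics in plain NumPy: a single value is broadcast to every row and stored as a new field. The helper add_columns is hypothetical and is not the bb_binary implementation.

```python
import numpy as np


def add_columns(arr, add_cols):
    """Hypothetical helper: return a copy of arr with one new field per add_cols entry.

    A single value is repeated for every row; a sequence must have
    exactly one entry per row.
    """
    columns = {}
    for name, value in add_cols.items():
        column = np.asarray(value)
        if column.ndim == 0:
            # broadcast a single value (e.g. a camera id) to all rows
            column = np.full(len(arr), column)
        elif len(column) != len(arr):
            raise ValueError(f"{name!r} has {len(column)} entries, expected {len(arr)}")
        columns[name] = column

    # new dtype = existing fields followed by the added ones
    fields = [(name, arr.dtype[name]) for name in arr.dtype.names]
    fields += [(name, column.dtype) for name, column in columns.items()]
    out = np.empty(len(arr), dtype=fields)
    for name in arr.dtype.names:
        out[name] = arr[name]
    for name, column in columns.items():
        out[name] = column
    return out


detections = np.zeros(3, dtype=[('xpos', np.float32), ('ypos', np.float32)])
with_cam = add_columns(detections, {'camId': 2, 'score': [0.1, 0.5, 0.9]})
```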

You can also restrict the output to a subset of fields with the keys argument. In that case you have to include detectionsUnion among the Frame keys if you want to extract detections:

arr = None
frame_keys = ('frameId', 'frameIdx', 'timedelta', 'timestamp', 'dataSourceIdx')
detection_keys = ('idx', 'xpos', 'ypos')
keys = frame_keys + detection_keys
for frame, fc in repo.iter_frames():
    tmp = convert_frame_to_numpy(frame, keys=keys + ('detectionsUnion',))
    arr = tmp if arr is None else np.hstack((arr, tmp))
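If an array has already been converted with all fields, a similar subset can also be taken afterwards with NumPy multi-field indexing, and numpy.lib.recfunctions.repack_fields turns the resulting view into a compact copy. This is a generic NumPy technique rather than a bb_binary feature; the array and field names below are illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Synthetic "fully converted" array with more fields than we need.
full = np.zeros(4, dtype=[('frameId', np.uint64), ('timestamp', np.float64),
                          ('idx', np.uint16), ('xpos', np.float32), ('ypos', np.float32)])
full['idx'] = np.arange(4)

keys = ('frameId', 'idx', 'xpos')
# Multi-field indexing returns a view that keeps the original record layout ...
subset_view = full[list(keys)]
# ... and repack_fields copies it into a compact array with only these fields.
subset = rfn.repack_fields(subset_view)
```

Selecting fields at conversion time via keys avoids materializing the unused fields in the first place, which matters more for large repositories.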

Convert bb_binary to Pandas DataFrame

Usually you can create a Pandas DataFrame directly from the NumPy array:

data = pd.DataFrame(arr)

If the array contains standard pipeline output with DetectionDP, you have to convert list-like fields separately, because Pandas does not handle list-valued fields in structured arrays well:

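list_like_fields = {'decodedId', 'descriptor'}
# keep the original field order for the scalar columns
scalar_fields = [name for name in arr.dtype.names if name not in list_like_fields]
data = pd.DataFrame(arr[scalar_fields])
# store each list-like field as a column of Python lists
for field in list_like_fields:
    data[field] = pd.Series([list(value) for value in arr[field]])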
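Instead of hard-coding the list-like field names, they can also be detected from the dtype: a field whose dtype has a non-empty shape is a sub-array. The sketch below assumes that the list-like fields are stored as such fixed-size sub-arrays; the helper structured_to_dataframe and the synthetic array are hypothetical.

```python
import numpy as np
import pandas as pd


def structured_to_dataframe(arr):
    """Hypothetical helper: DataFrame from a structured array, storing
    sub-array (list-like) fields as columns of Python lists."""
    list_like = [name for name in arr.dtype.names if arr.dtype[name].shape != ()]
    scalar = [name for name in arr.dtype.names if name not in list_like]
    data = pd.DataFrame({name: arr[name] for name in scalar})
    for name in list_like:
        data[name] = pd.Series([value.tolist() for value in arr[name]], index=data.index)
    # restore the original column order
    return data[list(arr.dtype.names)]


arr = np.zeros(2, dtype=[('frameId', np.uint64), ('decodedId', np.float32, (3,)),
                         ('xpos', np.float32)])
arr['decodedId'][1] = [0.5, 0.25, 1.0]
df = structured_to_dataframe(arr)
```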

Convert a Pandas DataFrame to bb_binary

If you have data from other sources, such as ground truth data, or need to generate a Repository for testing or feature evaluation, you can use this conversion function. The column names of the Pandas DataFrame are matched to the Frame and Detection field names. You have to specify the detectionsUnion type (e.g. detectionsTruth) and the camera id, because each FrameContainer belongs to a single camera.

The frame_offset is used to generate unique Frame ids:

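from bb_binary import Repository, build_frame_container_from_df

repo = Repository("some/path/to/a/repo")
# df is a Pandas DataFrame whose column names match Frame and Detection fields
cam_ids = (0, 2)
offset = 0
for cid in cam_ids:
    # pass the returned offset on so that frame ids stay unique
    fc, offset = build_frame_container_from_df(df, 'detectionsTruth', cid, frame_offset=offset)
    repo.add(fc)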
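To illustrate why the offset is threaded through the loop, here is a pure-pandas sketch of the idea: each distinct timestamp becomes one frame, frame ids are numbered starting at frame_offset, and the returned value is where the next call continues. The helper assign_frame_ids, the grouping by a timestamp column and the exact value of the returned offset (old offset plus number of frames) are assumptions for illustration, not bb_binary's implementation.

```python
import pandas as pd


def assign_frame_ids(dfr, frame_offset=0):
    """Hypothetical helper: one frame per distinct timestamp.

    Returns a Series of frame ids aligned with dfr and the offset
    to use for the next call.
    """
    codes, uniques = pd.factorize(dfr['timestamp'], sort=True)
    frame_ids = pd.Series(codes + frame_offset, index=dfr.index, name='frameId')
    return frame_ids, frame_offset + len(uniques)


cam0 = pd.DataFrame({'timestamp': [10.0, 10.0, 11.0], 'xpos': [1, 2, 3]})
cam2 = pd.DataFrame({'timestamp': [10.0, 12.0], 'xpos': [4, 5]})

offset = 0
ids0, offset = assign_frame_ids(cam0, offset)  # frames 0 and 1, offset becomes 2
ids2, offset = assign_frame_ids(cam2, offset)  # frames 2 and 3, offset becomes 4
```

Without chaining the offset, both cameras would start numbering at 0 and their frame ids would collide.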

Function Documentation

build_frame_container(from_ts, to_ts, cam_id, hive_id=None, transformation_matrix=None, data_source_fname=None, video_preview_fname=None)

Builds a FrameContainer

Parameters:
  • from_ts (int or float) – Timestamp of the first frame
  • to_ts (int or float) – Timestamp of the last frame
  • cam_id (int) – id of camera
Keyword Arguments:
 
  • hive_id (Optional int) – id of the hive
  • transformation_matrix (Optional iterable with floats) – Transformation matrix for coordinates
  • data_source_fname (Optional str or list of str) – Filename(s) of the data source(s).
  • video_preview_fname (Optional str or list of str) – Filename(s) of preview videos. Must align with data_source_fname!

build_frame_container_from_df(dfr, union_type, cam_id, frame_offset=0)

Builds a FrameContainer from a Pandas DataFrame.

Operates differently from build_frame_container() because it will be used in a different context where we have access to more data.

Column names are matched to Frame and Detection* attributes. Set additional FrameContainer attributes like hiveId in the return value.

Parameters:
  • dfr (pd.DataFrame) – Pandas dataframe with detection data
  • union_type (str) – the type of detections e.g. detectionsTruth
  • cam_id (int) – id of camera, also used as FrameContainer id
Keyword Arguments:
  • frame_offset (int) – offset for unique frame ids

Returns:

tuple containing:

  • frame container (FrameContainer): converted data from dfr
  • new offset (int): number of frames (could be used as frame_offset)

Return type:

tuple

convert_frame_to_numpy(frame, keys=None, add_cols=None)

Returns the frame data and detections of the frame as a structured NumPy array.

Note

In the array, the frame id field is named frameId instead of id!

Parameters:

  • frame (Frame) – data structure with frame data from capnp.

Keyword Arguments:
 
  • keys (Optional iterable of str) – only these keys are converted to the np array.
  • add_cols (Optional dict) – additional columns for the np array, use either a single value or a sequence of correct length.
Returns:

a structured numpy array with frame and detection data.

Return type:

numpy array (np.ndarray)