Converting

We provide convenience functions to read a bb_binary Repository into NumPy arrays and Pandas DataFrames, and to create a Repository from existing data via Pandas DataFrames.

You might need them if you want to

create bb_binary Repositories from existing data (like ground truth data)

take a quick glance at a subset of the data (one that fits into memory)

experiment with new features

Warning

The convenience functions are not designed for performance. When working with huge datasets, it is recommended to use the Repository directly.

Convert bb_binary to NumPy Array

To generate NumPy arrays or Pandas DataFrames we provide a convenience function. Here is an example of how to read all frames and their detection data into a NumPy array:

import numpy as np
import pandas as pd
from bb_binary import Repository, convert_frame_to_numpy

repo = Repository("some/path/to/a/repo")

arr = None
for frame, fc in repo.iter_frames():
    tmp = convert_frame_to_numpy(frame)
    arr = tmp if arr is None else np.hstack((arr, tmp))
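
Calling np.hstack inside the loop copies the growing array on every iteration. A minimal sketch of an alternative is to collect the per-frame arrays in a list and join them once. The helper fake_frame_array and its field names are hypothetical stand-ins for convert_frame_to_numpy(frame), used only so that the example runs without a repository:

```python
import numpy as np

# Hypothetical stand-in for convert_frame_to_numpy(frame): one structured
# array per frame with one row per detection. Field names are illustrative.
dtype = np.dtype([('frameId', np.uint64), ('xpos', np.float64), ('ypos', np.float64)])


def fake_frame_array(frame_id, n_detections):
    arr = np.zeros(n_detections, dtype=dtype)
    arr['frameId'] = frame_id
    arr['xpos'] = np.arange(n_detections)
    arr['ypos'] = np.arange(n_detections) * 2.0
    return arr


# Collect the per-frame arrays first, then join them in a single call.
parts = [fake_frame_array(fid, n) for fid, n in [(1, 3), (2, 0), (3, 2)]]
arr = np.concatenate(parts)
```

With a real repository, the list would be filled from repo.iter_frames() instead. For one-dimensional arrays of the same dtype, np.concatenate produces the same result as the repeated np.hstack calls.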

Sometimes we also need fields from the FrameContainer. You can add those fields using the add_cols argument. It accepts either a single value, which is applied to every row, or a sequence of the correct length:

arr = None
for frame, fc in repo.iter_frames():
    tmp = convert_frame_to_numpy(frame, add_cols={'camId': fc.camId})
    arr = tmp if arr is None else np.hstack((arr, tmp))
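
To illustrate the "single value or sequence of correct length" rule, here is a rough sketch of appending a column to a structured array with NumPy's recfunctions. It is not the bb_binary implementation; the helper with_column and the field names are assumptions made for this example:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative structured array with three detections of one frame.
base = np.zeros(3, dtype=[('frameId', np.uint64), ('xpos', np.float64)])
base['frameId'] = 7


def with_column(arr, name, value):
    """Append `value` as a new field; scalars are repeated for every row."""
    column = np.asarray(value)
    if column.ndim == 0:
        column = np.full(len(arr), value)
    if len(column) != len(arr):
        raise ValueError("sequence length must match the number of rows")
    return rfn.append_fields(arr, name, column, usemask=False)


with_cam = with_column(base, 'camId', 2)                   # single value
with_score = with_column(base, 'score', [0.1, 0.2, 0.3])   # one value per row
```

A sequence with the wrong number of entries is rejected here, because every row needs exactly one value.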

It is also possible to restrict the output to a set of fields. When you use the keys argument and want to extract detections, you must also include detectionsUnion as a Frame key:

arr = None
frame_keys = ('frameId', 'frameIdx', 'timedelta', 'timestamp', 'dataSourceIdx')
detection_keys = ('idx', 'xpos', 'ypos')
keys = frame_keys + detection_keys
for frame, fc in repo.iter_frames():
    tmp = convert_frame_to_numpy(frame, keys=keys + ('detectionsUnion',))
    arr = tmp if arr is None else np.hstack((arr, tmp))

Convert bb_binary to Pandas DataFrame

Usually you can create a Pandas DataFrame directly from a NumPy array:

data = pd.DataFrame(arr)

Assuming that we have standard pipeline output with DetectionDP, you have to convert list-like fields separately (because Pandas has problems with lists in fields):

list_like_fields = set(['decodedId', 'descriptor'])
data = pd.DataFrame(arr[list(set(arr.dtype.fields.keys()) - list_like_fields)])
for field in list_like_fields:
    data[field] = pd.Series([list(list_field) for list_field in arr[field]])
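
The same idea as a self-contained sketch with synthetic data. The dtype below, including the length-12 decodedId field, is an assumption chosen for illustration and not necessarily the layout of real pipeline output. Unlike the set-based selection, this variant keeps the scalar columns in their original field order:

```python
import numpy as np
import pandas as pd

# Synthetic structured array; `decodedId` holds a fixed-length list per row.
dtype = np.dtype([('frameId', np.uint64),
                  ('xpos', np.float64),
                  ('decodedId', np.uint8, (12,))])
arr = np.zeros(2, dtype=dtype)
arr['frameId'] = [1, 2]
arr['xpos'] = [10.5, 20.5]
arr['decodedId'][1] = 255

list_like_fields = ['decodedId']
scalar_fields = [name for name in arr.dtype.names if name not in list_like_fields]

# Scalar fields become ordinary columns.
data = pd.DataFrame({name: arr[name] for name in scalar_fields})
# List-like fields are stored as one Python list per row.
for field in list_like_fields:
    data[field] = pd.Series([row.tolist() for row in arr[field]])
```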

Convert a Pandas DataFrame to bb_binary

When you have data from other sources, such as ground truth data, or you need to generate a Repository for testing or feature evaluation, you might need this conversion function. All column names in the Pandas DataFrame are matched to field names. You have to specify the detectionsUnion type (for example detectionsTruth) and the camera id, because each FrameContainer is specific to one camera.

The frame_offset is used to generate unique Frame ids:

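from bb_binary import Repository, build_frame_container_from_df

# df is a Pandas DataFrame with detection data, repo an existing Repository.
cam_ids = (0, 2)
offset = 0
for cid in cam_ids:
    # The returned offset is passed on so that Frame ids stay unique.
    fc, offset = build_frame_container_from_df(df, 'detectionsTruth', cid, frame_offset=offset)
    repo.add(fc)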
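
If the source data mixes several cameras, one way to prepare the input is to split it per camera first. This is only a sketch: the camId column is a hypothetical helper column in the source data, and which columns build_frame_container_from_df actually requires is not covered here:

```python
import pandas as pd

# Synthetic ground-truth rows; `camId` is a hypothetical helper column.
df = pd.DataFrame({
    'camId': [0, 0, 2],
    'timestamp': [0.0, 0.5, 0.0],
    'xpos': [10, 12, 40],
    'ypos': [20, 22, 50],
})

# One DataFrame per camera, keyed by camera id, without the helper column.
per_camera = {int(cid): group.drop(columns='camId').reset_index(drop=True)
              for cid, group in df.groupby('camId')}

for cid, cam_df in per_camera.items():
    # With bb_binary installed, each cam_df could be passed on, for example:
    # fc, offset = build_frame_container_from_df(
    #     cam_df, 'detectionsTruth', cid, frame_offset=offset)
    print(cid, len(cam_df))
```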

Function Documentation

build_frame_container(from_ts, to_ts, cam_id, hive_id=None, transformation_matrix=None, data_source_fname=None, video_preview_fname=None)

Builds a FrameContainer.

Parameters:
	from_ts (int or float) – Timestamp of the first frame
	to_ts (int or float) – Timestamp of the last frame
	cam_id (int) – id of camera
Keyword Arguments:
	hive_id (Optional int) – id of the hive
	transformation_matrix (Optional iterable with floats) – Transformation matrix for coordinates
	data_source_fname (Optional str or list of str) – Filename(s) of the data source(s)
	video_preview_fname (Optional str or list of str) – Filename(s) of preview videos. Must align with data_source_fname!

build_frame_container_from_df(dfr, union_type, cam_id, frame_offset=0)

Builds a FrameContainer from a Pandas DataFrame.

Operates differently from build_frame_container() because it will be used in a different context where we have access to more data.

Column names are matched to Frame and Detection* attributes. Set additional FrameContainer attributes like hiveId in the return value.

Parameters:
	dfr (`pd.DataFrame`) – Pandas DataFrame with detection data
	union_type (str) – the type of detections, e.g. detectionsTruth
	cam_id (int) – id of camera, also used as `FrameContainer` id
Keyword Arguments:
	frame_offset (Optional int) – offset for unique frame ids
Returns:	tuple containing:
	frame container (`FrameContainer`) – converted data from dfr
	new offset (`int`) – number of frames (could be used as frame_offset)
Return type:	tuple

convert_frame_to_numpy(frame, keys=None, add_cols=None)

Returns the frame data and detections as a numpy array from the frame.

Note

The frame id is identified in the array as frameId instead of id!

Parameters:
	frame (`Frame`) – data structure with frame data from capnp
Keyword Arguments:
	keys (Optional iterable of str) – only these keys are converted to the NumPy array
	add_cols (Optional dict) – additional columns for the NumPy array; use either a single value or a sequence of the correct length
Returns:	a structured NumPy array with frame and detection data
Return type:	numpy array (np.ndarray)