puffbird.FrameEngine

class puffbird.FrameEngine(table, datacols=None, indexcols=None, inplace=False, handle_column_types=True, enforce_identifier_string=False, fastpath=False)

Class to handle and transform a pandas.DataFrame object.

Parameters

table (DataFrame): A table with singular Index columns, where each column corresponds to a specific data type. MultiIndex columns will be made singular with the to_flat_index method. It is recommended that all column and index names are identifier strings. Individual cells within datacols columns may contain arbitrary objects, but cells within indexcols columns must be hashable.
datacols (list-like, optional): The columns in table that are considered “data”, for example columns where each cell is a numpy.array object. If None, all columns are considered datacols columns, unless indexcols is specified. Defaults to None.
indexcols (list-like, optional): The columns in table that hold immutable or hashable types, e.g. strings or integers. These may correspond to “metadata” that describe or specify the datacols columns. If None, only the index of the table, which may be a MultiIndex, is considered to be the indexcols columns. If datacols is specified and indexcols is None, then the remaining columns are also added to the index of table. Defaults to None.
inplace (bool, optional): If True, do not copy the table object if possible. Defaults to False.
handle_column_types (bool, optional): If True, convert non-string column names to strings. Defaults to True.
enforce_identifier_string (bool, optional): If True, try to convert all column names to identifier strings and check that all columns are identifier strings. Enforcement only works if the column names are str, Number, or tuple objects. An error is raised if enforcement does not work. Defaults to False.
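
To make the naming requirements above more concrete, here is a minimal sketch in plain pandas (it does not call puffbird and is not the library's internal logic). It shows what to_flat_index does to MultiIndex columns and one way to test whether a name is an identifier string. The helper `is_identifier_string` and the underscore-joining of tuple names are assumptions made for illustration only; puffbird's own conversion rules may differ.

```python
import pandas as pd

# A small table with two-level MultiIndex columns.
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b")]),
)

# to_flat_index turns the MultiIndex into a singular Index of tuples.
flat_names = list(df.columns.to_flat_index())


def is_identifier_string(name):
    """Hypothetical helper: True if `name` is a str that is a valid identifier."""
    return isinstance(name, str) and name.isidentifier()


# Assumption: join tuple parts with "_" to obtain identifier strings.
joined_names = ["_".join(parts) for parts in flat_names]
checks = {name: is_identifier_string(name) for name in joined_names}
```

Tuples such as `("x", "a")` are hashable but are not identifier strings, which is why a conversion step of some kind is needed before enforcement can succeed.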

Notes

A table has singular Index columns, where each column corresponds to a specific data type. These types of tables are often fetched from databases that use data models such as datajoint. The table often needs to be transformed so that computations such as groupby can be performed, or so that the data can be plotted easily with packages such as seaborn.

In the table, the columns and the index names are considered together and divided into datacols and indexcols. “Data columns” are usually columns that contain iterable Python objects, which need to be “exploded” in order to convert these columns into numeric or other immutable data types. This is why I call these types of tables “puffy” dataframes. “Index columns” usually contain other information, often considered “metadata”, that uniquely identifies each row.

Within a given column, every row is considered to have the same data type, so all rows can be “exploded” in the same way. Missing data (NaNs) are allowed.
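
As a rough illustration of what “exploding” a data column means, the sketch below uses only pandas' own DataFrame.explode on a list-valued column. It is an analogy under stated assumptions, not puffbird's implementation; the column name `a_pos` is hypothetical.

```python
import pandas as pd

# A "puffy" table: every cell of column "a" holds a list.
df = pd.DataFrame(
    {"a": [[1, 2, 3], [4, 5, 6, 7], [3, 4, 5]]},
    index=pd.Index([0, 1, 2], name="index_col_0"),
)

# explode gives each list element its own row and repeats the
# index value of the row it came from.
long_df = df.explode("a").reset_index()

# Record each element's position within its original list
# (hypothetical column name, chosen for this sketch).
long_df["a_pos"] = long_df.groupby("index_col_0").cumcount()

# After exploding, the column can be cast to a numeric dtype.
long_df["a"] = long_df["a"].astype(float)
```

The result has one row per list element, with the index column repeated so that each value can still be traced back to its original row.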

Examples

>>> import pandas as pd
>>> import puffbird as pb
>>> df = pd.DataFrame({
...     'a': [[1,2,3], [4,5,6,7], [3,4,5]],
...     'b': [{'c':['asdf'], 'd':['ret']}, {'d':['r']}, {'c':['ff']}],
... })
>>> df
              a                              b
0     [1, 2, 3]  {'c': ['asdf'], 'd': ['ret']}
1  [4, 5, 6, 7]                   {'d': ['r']}
2     [3, 4, 5]                  {'c': ['ff']}
>>> engine = pb.FrameEngine(df)

The FrameEngine instance has various methods that allow for quick manipulation of this “puffy” dataframe. For example, we can create a long dataframe using the to_long() method:

>>> engine.to_long()
    index_col_0 b_level0  b_level1     b  a_level0    a
           0        c         0  asdf         0  1.0
           0        c         0  asdf         1  2.0
           0        c         0  asdf         2  3.0
           0        d         0   ret         0  1.0
           0        d         0   ret         1  2.0
           0        d         0   ret         2  3.0
           1        d         0     r         0  4.0
           1        d         0     r         1  5.0
           1        d         0     r         2  6.0
           1        d         0     r         3  7.0
           2        c         0    ff         0  3.0
           2        c         0    ff         1  4.0
           2        c         0    ff         2  5.0

Attributes

`cols`	Tuple of “data columns” and “index columns” in the table.
`cols_rename`	Mapping of renamed “data columns” and “index columns” in table.
`datacols`	Tuple of the “data columns” in the table.
`datacols_rename`	Mapping of renamed “data columns” in table.
`indexcols`	Tuple of the “index columns” in the table.
`indexcols_rename`	Mapping of renamed “index columns” in table.
`table`	`DataFrame` passed during initialization.

Methods

`apply`(func, new_col_name, *args[, …])	Apply a function to each row in the table.
`col_apply`(func, col[, new_col_name, …])	Apply a function to a specific column in each row in the table.
`drop`(*cols[, skip, skip_index, skip_data])	Drop columns in place.
`expand_col`(col[, reset_index, dropna, …])	Expand a column that contains `DataFrame` or `Series` object types to create a single long-format `DataFrame`.
`multid_pivot`([values])	Pivot the table to create a multidimensional `xarray.DataArray` or `xarray.Dataset` object.
`rename`(**rename_kws)	Rename columns in place.
`to_long`(*cols[, iterable, max_depth, …])	Transform the “puffy” table into a long-format `DataFrame`.
`to_puffy`(*indexcols[, keep_missing_idcs, …])	Make the table “puffier” by aggregating across unique sets of “index columns”.
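
The idea of making a table “puffier” by aggregating across unique sets of index columns can be sketched with plain pandas. This is only an analogy for the behaviour summarised for `to_puffy`, not a call to that method or a description of how it works internally; the column names `subject` and `value` are hypothetical.

```python
import pandas as pd

# A long-format table: one numeric value per row.
long_df = pd.DataFrame(
    {
        "subject": ["s1", "s1", "s1", "s2", "s2"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Collect all values that share the same "index column" entry into a list,
# so each unique subject ends up with a single list-valued cell.
puffy_df = long_df.groupby("subject")["value"].agg(list).reset_index()
```

Here `puffy_df` has one row per unique `subject`, and its `value` column holds lists, which is the kind of “puffy” layout that the methods above are meant to manipulate.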
