puffbird.FrameEngine
class puffbird.FrameEngine(table, datacols=None, indexcols=None, inplace=False, handle_column_types=True, enforce_identifier_string=False, fastpath=False)

Class to handle and transform a pandas.DataFrame object.

Parameters
- table : DataFrame
  A table with singular Index columns, where each column corresponds to a specific data type. MultiIndex columns will be made singular with the to_flat_index method. It is recommended that all column and index names be identifier strings. Individual cells within datacols columns may contain arbitrary objects, but cells within indexcols columns must be hashable.
- datacols : list-like, optional
  The columns in table that are considered "data", for example columns where each cell is a numpy.array object. If None, all columns are considered datacols columns, unless indexcols is specified. Defaults to None.
- indexcols : list-like, optional
  The columns in table that contain immutable or hashable types, e.g. strings or integers. These may correspond to "metadata" that describe or specify the datacols columns. If None, only the index of table, which may be a MultiIndex, is considered to contain indexcols columns. If datacols is specified and indexcols is None, the remaining columns are also added to the index of table. Defaults to None.
- inplace : bool, optional
  If possible, do not copy the table object. Defaults to False.
- handle_column_types : bool, optional
  If True, convert non-string column names to strings. Defaults to True.
- enforce_identifier_string : bool, optional
  If True, try to convert all column names to identifier strings and check that every column name is an identifier string. Enforcement only works if the column names are str, Number, or tuple objects. Raise an error if enforcement fails. Defaults to False.
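The column bookkeeping described by datacols, indexcols, and enforce_identifier_string can be sketched in plain Python. This is a minimal sketch under stated assumptions, not puffbird's implementation: `partition_columns` and `to_identifier` are hypothetical helpers, existing index levels are assumed to stay index columns when only indexcols is given, both lists are taken as-is when both are given, and the renaming rule in `to_identifier` (join tuple parts with `_`, replace non-word characters with `_`, prefix a leading digit with `_`) is an assumed convention.

```python
import re
from numbers import Number
from typing import Hashable, Iterable, Optional, Sequence, Tuple


def partition_columns(
    columns: Sequence[str],
    index_names: Sequence[str],
    datacols: Optional[Iterable[str]] = None,
    indexcols: Optional[Iterable[str]] = None,
) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Hypothetical helper: split column and index names into (datacols, indexcols)."""
    columns, index_names = list(columns), list(index_names)
    if datacols is None and indexcols is None:
        # Default: every column is data; only the index levels are index columns.
        return tuple(columns), tuple(index_names)
    if datacols is None:
        # Only indexcols given: all other columns are data columns
        # (assumption: existing index levels stay index columns).
        idx = list(indexcols)
        data = [c for c in columns if c not in idx]
        return tuple(data), tuple(index_names + [c for c in idx if c not in index_names])
    data = list(datacols)
    if indexcols is None:
        # Only datacols given: the remaining columns join the index.
        rest = [c for c in columns if c not in data]
        return tuple(data), tuple(index_names + rest)
    # Both given: taken as-is (assumption).
    return tuple(data), tuple(indexcols)


def to_identifier(name: Hashable) -> str:
    """Hypothetical helper: coerce a str, Number, or tuple label to an identifier."""
    if isinstance(name, tuple):
        # e.g. a flattened MultiIndex label ('a', 0) becomes 'a_0'
        text = "_".join(str(part) for part in name)
    elif isinstance(name, (str, Number)):
        text = str(name)
    else:
        raise TypeError(f"cannot convert {type(name).__name__} label to an identifier")
    text = re.sub(r"\W", "_", text)  # replace spaces, dots, dashes, ...
    if not text or text[0].isdigit():
        text = "_" + text  # identifiers cannot start with a digit
    if not text.isidentifier():
        raise ValueError(f"could not make {name!r} an identifier string")
    return text
```

Raising TypeError for unsupported label types mirrors the documented behavior that enforcement only works for str, Number, or tuple labels and throws an error otherwise.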
Notes
A table has singular Index columns, where each column corresponds to a specific data type. Such tables are often fetched from databases that use data models such as datajoint. The table often needs to be transformed so that computations such as groupby can be performed, or so that the data can be plotted easily with packages such as seaborn. In the table, the columns and the index names are considered together and divided into datacols and indexcols. "Data columns" usually contain Python objects that are iterable and need to be "exploded" in order to convert these columns into numeric or other immutable data types. This is why I call these types of tables "puffy" dataframes. "Index columns" usually contain other information, often considered "metadata", that uniquely identifies each row. All cells in a given column are assumed to have the same data type and can thus be "exploded" the same way. Missing data (NaNs) are allowed.

Examples
>>> import pandas as pd
>>> import puffbird as pb
>>> df = pd.DataFrame({
...     'a': [[1,2,3], [4,5,6,7], [3,4,5]],
...     'b': [{'c':['asdf'], 'd':['ret']}, {'d':['r']}, {'c':['ff']}],
... })
>>> df
              a                              b
0     [1, 2, 3]  {'c': ['asdf'], 'd': ['ret']}
1  [4, 5, 6, 7]                   {'d': ['r']}
2     [3, 4, 5]                   {'c': ['ff']}
>>> engine = pb.FrameEngine(df)
The FrameEngine instance has various methods that allow for quick manipulation of this "puffy" dataframe. For example, we can create a long dataframe using the to_long() method:

>>> engine.to_long()
    index_col_0  b_level0  b_level1     b  a_level0    a
0             0         c         0  asdf         0  1.0
1             0         c         0  asdf         1  2.0
2             0         c         0  asdf         2  3.0
3             0         d         0   ret         0  1.0
4             0         d         0   ret         1  2.0
5             0         d         0   ret         2  3.0
6             1         d         0     r         0  4.0
7             1         d         0     r         1  5.0
8             1         d         0     r         2  6.0
9             1         d         0     r         3  7.0
10            2         c         0    ff         0  3.0
11            2         c         0    ff         1  4.0
12            2         c         0    ff         2  5.0
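The shape of this output can be reproduced with a small plain-Python sketch: each cell is flattened into (path, leaf) pairs, where dict keys and list positions form the path, and the flattened cells within one row are combined with a Cartesian product. This is an assumed model inferred from the example output, not puffbird's actual algorithm; `flatten` and `to_long` here are hypothetical helpers, the name `index_col_0` is copied from the output above, and the `iterable` and `max_depth` options of the real method are not modeled.

```python
from itertools import product
from typing import Any, Dict, Iterator, List, Sequence, Tuple


def flatten(obj: Any) -> Iterator[Tuple[Tuple[Any, ...], Any]]:
    """Yield (path, leaf) pairs; dict keys and list positions form the path."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, (list, tuple)):
        items = enumerate(obj)
    else:
        # Strings, numbers, etc. are treated as leaves.
        yield (), obj
        return
    for key, value in items:
        for path, leaf in flatten(value):
            yield (key,) + path, leaf


def to_long(rows: Sequence[Dict[str, Any]], order: Sequence[str]) -> List[Dict[str, Any]]:
    """Explode each row into the Cartesian product of its flattened cells."""
    records = []
    for row_id, row in enumerate(rows):
        cells = [list(flatten(row[col])) for col in order]
        # An empty container yields no pairs, so that row produces no records.
        for combo in product(*cells):
            record: Dict[str, Any] = {"index_col_0": row_id}
            for col, (path, leaf) in zip(order, combo):
                for level, key in enumerate(path):
                    record[f"{col}_level{level}"] = key
                record[col] = leaf
            records.append(record)
    return records


# The same data as the doctest above, as plain dicts.
rows = [
    {"a": [1, 2, 3], "b": {"c": ["asdf"], "d": ["ret"]}},
    {"a": [4, 5, 6, 7], "b": {"d": ["r"]}},
    {"a": [3, 4, 5], "b": {"c": ["ff"]}},
]
for record in to_long(rows, order=["b", "a"])[:4]:
    print(record)
```

Passing order=["b", "a"] matches the column and row order of the doctest output (13 records in total). The sketch keeps the values of a as integers; the floats (1.0, 2.0, ...) in the pandas output presumably come from dtype handling that this sketch does not model.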
Attributes
- Tuple of the "data columns" and "index columns" in the table.
- Mapping of renamed "data columns" and "index columns" in the table.
- Tuple of the "data columns" in the table.
- Mapping of renamed "data columns" in the table.
- Tuple of the "index columns" in the table.
- Mapping of renamed "index columns" in the table.
- The DataFrame passed during initialization.

Methods
- apply(func, new_col_name, *args[, …]): Apply a function to each row in the table.
- col_apply(func, col[, new_col_name, …]): Apply a function to a specific column in each row in the table.
- drop(*cols[, skip, skip_index, skip_data]): Drop columns in place.
- expand_col(col[, reset_index, dropna, …]): Expand a column that contains DataFrame or Series objects to create a single long-format DataFrame.
- multid_pivot([values]): Pivot the table to create a multidimensional xarray.DataArray or xarray.Dataset object.
- rename(**rename_kws): Rename columns in place.
- to_long(*cols[, iterable, max_depth, …]): Transform the "puffy" table into a long-format DataFrame.
- to_puffy(*indexcols[, keep_missing_idcs, …]): Make the table "puffier" by aggregating across unique sets of "index columns".
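As a rough illustration of what "aggregating across unique sets of index columns" can mean, the following plain-Python sketch groups long-format records by the given index columns and collects every remaining column into a list, roughly the inverse of the long-format transformation. It assumes the columns passed in are the ones kept as grouping keys; the helper `to_puffy` here and the sample records (`trial`, `t`, `x`) are hypothetical and not taken from puffbird.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Key = Tuple[Hashable, ...]


def to_puffy(records: Sequence[Dict[str, Any]], indexcols: Sequence[str]) -> Dict[Key, Dict[str, List[Any]]]:
    """Hypothetical helper: group records by indexcols and collect the rest into lists."""
    groups: Dict[Key, Dict[str, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        # Index-column values form the group key, so they must be hashable.
        key = tuple(record[col] for col in indexcols)
        for col, value in record.items():
            if col not in indexcols:
                groups[key][col].append(value)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {key: dict(cols) for key, cols in groups.items()}


records = [
    {"trial": 1, "t": 0, "x": 0.5},
    {"trial": 1, "t": 1, "x": 0.7},
    {"trial": 2, "t": 0, "x": 0.1},
]
print(to_puffy(records, ["trial"]))
# {(1,): {'t': [0, 1], 'x': [0.5, 0.7]}, (2,): {'t': [0], 'x': [0.1]}}
```

Using a tuple of index-column values as the key is why index columns must hold hashable values, as the Parameters section requires.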