py_research.db.base module
Base classes and types for relational database package.
- class Table(db, df, source_map, indexes=<factory>)
Bases: object
Table in a relational database.
- Parameters:
- source_map: str | dict[str, str]
Mapping to source tables of this table.
For single source tables, this is a string containing the name of the source table.
For multiple source tables, the dataframe has multi-level columns. source_map is then a mapping from the first level of these columns to the source tables.
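As a rough illustration of the two shapes source_map can take, the sketch below resolves the source table of a column using plain Python tuples in place of a dataframe's multi-level column labels. The helper `source_of` and the table names are hypothetical and not part of py_research.

```python
def source_of(column, source_map):
    """Return the source table name for one column label.

    `column` is a plain string for single-source tables, or a
    (first_level, name) tuple standing in for a multi-level column label.
    """
    if isinstance(source_map, str):
        # Single source: every column comes from the same table.
        return source_map
    # Multiple sources: the first level of the label selects the source table.
    first_level = column[0]
    return source_map[first_level]


# Single-source table: source_map is just the table name.
print(source_of("title", "projects"))  # projects

# Multi-source table: first column level -> source table (made-up names).
multi_map = {"project": "projects", "lead": "persons"}
print(source_of(("lead", "name"), multi_map))  # persons
```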
- to_excel(path)
Save this table to an Excel file.
- Parameters:
path (Path) –
- Return type:
None
- merge(right: SingleTable | None = None, link_to_right: str | tuple[str, str] = None, link_to_left: str | None = None, link_table: SingleTable | None = None, naming: Literal['source', 'path'] = 'source') Table
- merge(right: SingleTable = None, link_to_right: str | tuple[str, str] | None = None, link_to_left: str | None = None, link_table: SingleTable | None = None, naming: Literal['source', 'path'] = 'source') Table
- merge(right: SingleTable | None = None, link_to_right: str | tuple[str, str] | None = None, link_to_left: str | None = None, link_table: SingleTable = None, naming: Literal['source', 'path'] = 'source') Table
Merge this table with another, returning a new table.
- Parameters:
link_to_right – Name of column to use for linking from left to right table.
link_to_left – Name of column to use for linking from right to left table.
right – Other (right) table to merge with.
link_table – Link table (join table) to use for double merging.
naming – Naming strategy to use for naming the first level of merged columns. Use “path” if you merge multiple times from the same source table.
Note
At least one of link_to_right, right or link_table must be supplied.
- Returns:
New table containing the merged data. The returned table will have a multi-level column index, where the first level references the source table of each column via the source_map attribute.
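The advice on naming can be illustrated with a small sketch of why source-based labels collide when the same source table is merged in twice. This is not the library's implementation: the helper `first_level_label` is hypothetical, and the `link->table` label format is invented for illustration, since the actual format of "path" labels is not documented in this section.

```python
def first_level_label(source_table, link_column, naming="source"):
    """Hypothetical first-level column label for a merged-in table."""
    if naming == "source":
        # Label by the source table only.
        return source_table
    if naming == "path":
        # Invented format: the link column followed by the target table.
        return f"{link_column}->{source_table}"
    raise ValueError(f"unknown naming strategy: {naming!r}")


# Two merges into the same source table 'persons' via different link columns.
merges = [("persons", "assignee"), ("persons", "reviewer")]

by_source = [first_level_label(t, c, "source") for t, c in merges]
by_path = [first_level_label(t, c, "path") for t, c in merges]

print(by_source)  # ['persons', 'persons'] -> labels collide
print(by_path)    # ['assignee->persons', 'reviewer->persons'] -> labels stay distinct
```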
- flatten(sep='.', prefix_strategy='always')
Collapse the multi-level column labels of a multi-source table, returning a new dataframe.
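A minimal sketch of what collapsing two-level labels with a separator might look like, using tuples in place of real column labels. It assumes that prefix_strategy='always' means the first level is always prepended to the column name; other strategies are not described here and are left out. The function `flatten_labels` is hypothetical.

```python
def flatten_labels(columns, sep=".", prefix_strategy="always"):
    """Join multi-level column labels into single strings.

    Only the 'always' strategy is sketched, under the assumption that it
    prefixes every column name with its first-level label.
    """
    if prefix_strategy != "always":
        raise NotImplementedError("only 'always' is sketched here")
    return [sep.join(str(level) for level in col) for col in columns]


columns = [("projects", "title"), ("persons", "name")]
print(flatten_labels(columns))           # ['projects.title', 'persons.name']
print(flatten_labels(columns, sep="_"))  # ['projects_title', 'persons_name']
```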
- class SingleTable(name, db, df)
Bases: Table
Relational database table with a single source table.
- filter(filter_series)
Return a filtered version of this table.
- Parameters:
filter_series (Series) –
- Return type:
- class SourceTable(name, db)
Bases: SingleTable
Original table in a relational database.
- class DB(table_dfs=<factory>, relations=<factory>, join_tables=<factory>, schema=None, updates=<factory>, backend=None, _copied=False)
Bases: object
Relational database consisting of multiple named tables.
- Parameters:
- backend: Path | None = None
File backend of this database, i.e. the path it was loaded from and is saved to by default.
- extend(other, conflict_policy='raise')
Extend this database with data from another, returning a new database.
- Parameters:
other (DB | dict[str, DataFrame] | Table) – Other database, dataframe dict or table to extend with.
conflict_policy (Literal['raise', 'ignore', 'override'] | dict[str, ~typing.Literal['raise', 'ignore', 'override'] | dict[str, ~typing.Literal['raise', 'ignore', 'override']]]) – Policy to use for resolving conflicts. Can be a global setting, per-table via supplying a dict with table names as keys, or per-column via supplying a dict of dicts.
- Returns:
New database containing the extended data.
- Return type:
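The three shapes of conflict_policy (global, per-table, per-column) can be sketched as a lookup. This is an illustration under stated assumptions rather than the library's logic: the helpers `resolve_policy` and `merge_value` are hypothetical, tables or columns missing from the dict are assumed to fall back to 'raise', 'ignore' is assumed to keep the existing value, and 'override' is assumed to take the incoming one.

```python
def resolve_policy(conflict_policy, table, column):
    """Look up the policy that applies to one column of one table."""
    if isinstance(conflict_policy, str):
        # Global setting: one policy for everything.
        return conflict_policy
    # Assumption: tables missing from the dict fall back to 'raise'.
    table_policy = conflict_policy.get(table, "raise")
    if isinstance(table_policy, str):
        # Per-table setting.
        return table_policy
    # Per-column setting, with the same assumed fallback.
    return table_policy.get(column, "raise")


def merge_value(old, new, policy):
    """Combine two cell values under a policy (assumed semantics)."""
    if old == new:
        return old  # no conflict
    if policy == "override":
        return new  # assumed: incoming value wins
    if policy == "ignore":
        return old  # assumed: existing value is kept
    raise ValueError(f"conflicting values: {old!r} vs {new!r}")


policy = {"persons": "override", "projects": {"title": "ignore"}}
print(resolve_policy(policy, "persons", "name"))     # override
print(resolve_policy(policy, "projects", "title"))   # ignore
print(resolve_policy(policy, "projects", "budget"))  # raise
print(merge_value("Old", "New", "override"))         # New
```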
- trim(centers=None, circuit_breakers=None)
Return new database minus orphan data (relative to given centers).
- filter(filters, extra_cb=None)
Return a new database containing only data related to rows matched by filters.
- Parameters:
- Returns:
New database containing the filtered data. The returned database will only contain the filtered tables and all tables that have (indirect) references to them.
- Return type:
Note
This is equivalent to trimming the database with the filtered tables as centers and the filtered tables and extra_cb as circuit breakers.
- to_graph(nodes)
Export links between select database objects in a graph format.
For example, for use with Gephi.
- get(name)
- Parameters:
name (str) –
- Return type:
SingleTable | None