sempy_labs.lakehouse package

Module contents

sempy_labs.lakehouse.create_shortcut_onelake(table_name: str, source_lakehouse: str, source_workspace: str | UUID, destination_lakehouse: str, destination_workspace: str | UUID | None = None, shortcut_name: str | None = None)

Creates a shortcut to a delta table in OneLake.

This is a wrapper function for the following API: OneLake Shortcuts - Create Shortcut.

Parameters:
  • table_name (str) – The table name for which a shortcut will be created.

  • source_lakehouse (str) – The Fabric lakehouse in which the table resides.

  • source_workspace (str | uuid.UUID) – The name or ID of the Fabric workspace in which the source lakehouse exists.

  • destination_lakehouse (str) – The Fabric lakehouse in which the shortcut will be created.

  • destination_workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the shortcut will be created. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

  • shortcut_name (str, default=None) – The name of the shortcut ‘table’ to be created. This defaults to the ‘table_name’ parameter value.
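
A minimal sketch of a call, assuming the code runs in a Fabric notebook where sempy_labs is installed. The table, lakehouse, and workspace names are hypothetical placeholders, and create_sales_shortcut is an illustrative helper, not part of the library. The import sits inside the helper so the snippet can still be loaded outside Fabric.

```python
# Hypothetical names; replace them with items that exist in your tenant.
shortcut_args = {
    "table_name": "sales",
    "source_lakehouse": "SourceLakehouse",
    "source_workspace": "Source Workspace",
    "destination_lakehouse": "TargetLakehouse",
    # destination_workspace is omitted, so its documented default (None) applies.
    "shortcut_name": "sales_shortcut",
}


def create_sales_shortcut(args: dict) -> None:
    """Illustrative helper: create the shortcut described by args."""
    # Imported lazily; sempy_labs is assumed to be available only in Fabric.
    from sempy_labs.lakehouse import create_shortcut_onelake

    create_shortcut_onelake(**args)
```

Calling create_sales_shortcut(shortcut_args) in a notebook would request the shortcut. Dropping the "shortcut_name" key would leave the shortcut named after table_name, as described above.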

sempy_labs.lakehouse.delete_shortcut(shortcut_name: str, lakehouse: str | None = None, workspace: str | UUID | None = None)

Deletes a shortcut.

This is a wrapper function for the following API: OneLake Shortcuts - Delete Shortcut.

Parameters:
  • shortcut_name (str) – The name of the shortcut.

  • lakehouse (str, default=None) – The Fabric lakehouse name in which the shortcut resides. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the lakehouse resides. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

sempy_labs.lakehouse.get_lakehouse_columns(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → DataFrame

Shows the tables and columns of a lakehouse and their respective properties.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

Shows the tables/columns within a lakehouse and their properties.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.get_lakehouse_tables(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, extended: bool = False, count_rows: bool = False, export: bool = False) → DataFrame

Shows the tables of a lakehouse and their respective properties. Option to include additional properties relevant to Direct Lake guardrails.

This is a wrapper function for the following API: Tables - List Tables, with extended capabilities added.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

  • extended (bool, default=False) – Obtains additional columns relevant to the size of each table.

  • count_rows (bool, default=False) – Obtains a row count for each lakehouse table.

  • export (bool, default=False) – Exports the resulting dataframe to a delta table in the lakehouse.

Returns:

Shows the tables within a lakehouse and their properties.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.lakehouse_attached() → bool

Identifies if a lakehouse is attached to the notebook.

Returns:

Returns True if a lakehouse is attached to the notebook.

Return type:

bool
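
Several functions in this module default lakehouse=None to the lakehouse attached to the notebook, so it can be useful to check for an attachment before relying on that default. The sketch below shows one way to do this. tables_if_attached is a hypothetical helper, not part of the library, and it takes the two library functions as arguments so it can be exercised outside Fabric with stand-ins.

```python
from typing import Any, Callable, Optional


def tables_if_attached(
    is_attached: Callable[[], bool],
    list_tables: Callable[..., Any],
    **kwargs: Any,
) -> Optional[Any]:
    """Call list_tables(**kwargs) only when is_attached() returns True."""
    if not is_attached():
        # No attached lakehouse: skip the call instead of relying on the default.
        return None
    return list_tables(**kwargs)
```

In a Fabric notebook this could be called as tables_if_attached(lakehouse_attached, get_lakehouse_tables, extended=True), with both functions imported from sempy_labs.lakehouse.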

sempy_labs.lakehouse.optimize_lakehouse_tables(tables: str | List[str] | None = None, lakehouse: str | None = None, workspace: str | UUID | None = None)

Runs the OPTIMIZE function over the specified lakehouse tables.

Parameters:
  • tables (str | List[str], default=None) – The table(s) to optimize. Defaults to None which resolves to optimizing all tables within the lakehouse.

  • lakehouse (str, default=None) – The Fabric lakehouse. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

sempy_labs.lakehouse.reset_shortcut_cache(workspace: str | UUID | None = None)

Deletes any cached files that were stored while reading from shortcuts.

This is a wrapper function for the following API: OneLake Shortcuts - Reset Shortcut Cache.

Parameters:
  • workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

sempy_labs.lakehouse.run_table_maintenance(table_name: str, optimize: bool = False, v_order: bool = False, vacuum: bool = False, retention_period: str | None = None, schema: str | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)

Runs table maintenance operations on the specified table within the lakehouse.

This is a wrapper function for the following API: Background Jobs - Run On Demand Table Maintenance.

Parameters:
  • table_name (str) – Name of the delta table on which to run maintenance operations.

  • optimize (bool, default=False) – If True, the OPTIMIZE function will be run on the table.

  • v_order (bool, default=False) – If True, v-order will be enabled for the table.

  • vacuum (bool, default=False) – If True, the VACUUM function will be run on the table.

  • retention_period (str, default=None) – If specified, the retention period for the vacuum operation. Must be in the ‘d:hh:mm:ss’ format.

  • schema (str, default=None) – The schema of the table within the lakehouse.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
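
The retention_period parameter expects a string in the 'd:hh:mm:ss' format. The sketch below builds such a string from a datetime.timedelta. format_retention and vacuum_with_retention are hypothetical helpers, not part of the library, and the snippet assumes that days are written without padding while hours, minutes, and seconds are zero-padded to two digits.

```python
from datetime import timedelta


def format_retention(period: timedelta) -> str:
    """Render a non-negative timedelta as 'd:hh:mm:ss' (fractional seconds are dropped)."""
    total_seconds = int(period.total_seconds())
    if total_seconds < 0:
        raise ValueError("retention period must not be negative")
    days, remainder = divmod(total_seconds, 86_400)
    hours, remainder = divmod(remainder, 3_600)
    minutes, seconds = divmod(remainder, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}"


def vacuum_with_retention(table_name: str, period: timedelta) -> None:
    """Illustrative helper: run VACUUM on one table with an explicit retention period."""
    # Imported lazily; sempy_labs is assumed to be available only in Fabric.
    from sempy_labs.lakehouse import run_table_maintenance

    run_table_maintenance(
        table_name=table_name,
        vacuum=True,
        retention_period=format_retention(period),
    )
```

For example, format_retention(timedelta(days=7)) yields "7:00:00:00".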

sempy_labs.lakehouse.vacuum_lakehouse_tables(tables: str | List[str] | None = None, lakehouse: str | None = None, workspace: str | UUID | None = None, retain_n_hours: int | None = None)

Runs the VACUUM function over the specified lakehouse tables.

Parameters:
  • tables (str | List[str], default=None) – The table(s) to vacuum. Defaults to None which resolves to vacuuming all tables within the lakehouse.

  • lakehouse (str, default=None) – The Fabric lakehouse. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

  • retain_n_hours (int, default=None) – The number of hours to retain historical versions of Delta table files. Files older than this retention period will be deleted during the vacuum operation. If not specified, the default retention period configured for the Delta table will be used. The default retention period is 168 hours (7 days) unless manually configured via table properties.
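
Because retain_n_hours is expressed in hours, a retention window stated in days has to be converted first. The sketch below shows that arithmetic and a guarded call. retention_hours and vacuum_tables are hypothetical helpers, not part of the library.

```python
from typing import List, Optional, Union

HOURS_PER_DAY = 24


def retention_hours(days: int) -> int:
    """Convert a retention window in whole days to hours."""
    if days < 0:
        raise ValueError("days must not be negative")
    return days * HOURS_PER_DAY


def vacuum_tables(tables: Optional[Union[str, List[str]]], days: int) -> None:
    """Illustrative helper: vacuum the given tables, keeping `days` days of history."""
    # Imported lazily; sempy_labs is assumed to be available only in Fabric.
    from sempy_labs.lakehouse import vacuum_lakehouse_tables

    vacuum_lakehouse_tables(tables=tables, retain_n_hours=retention_hours(days))
```

A window of 7 days corresponds to retention_hours(7) == 168, which matches the default retention period described above.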