sempy_labs.lakehouse package

Module contents

sempy_labs.lakehouse.create_materialized_lake_view(name: str, query: str, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, replace: bool = False, partition_columns: List[str] = None, test_run: bool = False) → bool | None

Creates a materialized lake view within a lakehouse.

Requirements: This function must be executed in a PySpark notebook.

Parameters:
  • name (str) – The name of the materialized lake view (not including the lakehouse or workspace names). This must be in schema_name.view_name format.

  • query (str) – The SQL query that defines the materialized lake view. The query must be a valid SQL query that can be executed in the context of the lakehouse.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • replace (bool, default=False) – If True, it will replace the existing materialized lake view if one exists.

  • partition_columns (List[str], default=None) – The columns to partition the materialized lake view by.

  • test_run (bool, default=False) – If True, the function will indicate whether the materialized lake view would be created successfully without actually creating it. This is useful for validating the input parameters and the SQL query before executing the creation of the materialized lake view.
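
As a rough sketch of how the parameters above fit together, the helper below checks the schema_name.view_name format and assembles keyword arguments for a dry run first. The helper build_mlv_kwargs, the identifier pattern, and the sample view and query are hypothetical and not part of sempy_labs. The actual call is left in comments because it needs a Fabric PySpark notebook.

```python
import re

# Hypothetical helper (not part of sempy_labs). The identifier pattern is an
# assumption; the service may accept other characters in names.
_NAME_RE = re.compile(r"^[A-Za-z_]\w*\.[A-Za-z_]\w*$")


def build_mlv_kwargs(name, query, partition_columns=None, test_run=True):
    """Validate the 'schema_name.view_name' format and build call arguments."""
    if not _NAME_RE.match(name):
        raise ValueError(f"expected 'schema_name.view_name', got {name!r}")
    kwargs = {"name": name, "query": query, "test_run": test_run}
    if partition_columns:
        kwargs["partition_columns"] = list(partition_columns)
    return kwargs


# Sample values are made up for illustration.
kwargs = build_mlv_kwargs(
    "silver.daily_sales",
    "SELECT order_date, SUM(amount) AS total FROM bronze.sales GROUP BY order_date",
    partition_columns=["order_date"],
)

# In a Fabric PySpark notebook, one might then run:
#   from sempy_labs.lakehouse import create_materialized_lake_view
#   create_materialized_lake_view(**kwargs)                         # dry run
#   create_materialized_lake_view(**{**kwargs, "test_run": False})  # create
```

Running with test_run=True first keeps a malformed query from leaving a partially configured view behind.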

sempy_labs.lakehouse.create_schema(name: str, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)

Creates a schema in a Fabric lakehouse.

Parameters:
  • name (str) – The name of the schema.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

sempy_labs.lakehouse.create_shortcut_onelake(table_name: str, source_workspace: str | UUID, destination_lakehouse: str | UUID | None = None, destination_workspace: str | UUID | None = None, shortcut_name: str | None = None, source_item: str | UUID = None, source_item_type: str = 'Lakehouse', source_path: str = 'Tables', destination_path: str = 'Tables', shortcut_conflict_policy: str | None = None, **kwargs)

Creates a shortcut to a delta table in OneLake.

This is a wrapper function for the following API: OneLake Shortcuts - Create Shortcut.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • table_name (str) – The table name for which a shortcut will be created.

  • source_workspace (str | uuid.UUID) – The name or ID of the Fabric workspace in which the source data store exists.

  • destination_lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse in which the shortcut will be created. Defaults to None which resolves to the lakehouse attached to the notebook.

  • destination_workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the shortcut will be created. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • shortcut_name (str, default=None) – The name of the shortcut ‘table’ to be created. This defaults to the ‘table_name’ parameter value.

  • source_item (str | uuid.UUID, default=None) – The source Fabric data store item in which the table resides. Can be either the Name or ID of the item.

  • source_item_type (str, default="Lakehouse") – The source Fabric data store item type. Options are ‘Lakehouse’, ‘Warehouse’, ‘MirroredDatabase’, ‘SQLDatabase’, and ‘KQLDatabase’.

  • source_path (str, default="Tables") – A string representing the full path to the table/file in the source lakehouse, including either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName.

  • destination_path (str, default="Tables") – A string representing the full path where the shortcut is created, including either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName.

  • shortcut_conflict_policy (str, default=None) – When provided, it defines the action to take when a shortcut with the same name and path already exists. The default action is ‘Abort’. Additional ShortcutConflictPolicy types may be added over time.
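
The constraints listed above (allowed source item types, paths rooted at "Files" or "Tables", and the shortcut name defaulting to the table name) can be checked before calling the API. The following is a minimal sketch; build_shortcut_kwargs and all sample values are hypothetical, and the real call is shown only as a comment.

```python
# Options taken from the source_item_type description above.
SOURCE_ITEM_TYPES = {
    "Lakehouse", "Warehouse", "MirroredDatabase", "SQLDatabase", "KQLDatabase",
}


def build_shortcut_kwargs(table_name, source_workspace, source_item,
                          source_item_type="Lakehouse", shortcut_name=None,
                          source_path="Tables", destination_path="Tables"):
    """Hypothetical pre-flight check; not part of sempy_labs."""
    if source_item_type not in SOURCE_ITEM_TYPES:
        raise ValueError(f"unsupported source_item_type: {source_item_type!r}")
    for label, path in (("source_path", source_path),
                        ("destination_path", destination_path)):
        # Both paths must start with either "Files" or "Tables".
        if path.split("/", 1)[0] not in ("Files", "Tables"):
            raise ValueError(f"{label} must start with 'Files' or 'Tables'")
    return {
        "table_name": table_name,
        "source_workspace": source_workspace,
        "source_item": source_item,
        "source_item_type": source_item_type,
        # Mirrors the documented default so the final name is visible up front.
        "shortcut_name": shortcut_name or table_name,
        "source_path": source_path,
        "destination_path": destination_path,
    }


# Sample names are made up for illustration.
shortcut_kwargs = build_shortcut_kwargs(
    "DimCustomer", "Sales Workspace", "SalesLakehouse",
    destination_path="Tables/Shared",
)

# In a Fabric notebook, one might then run:
#   from sempy_labs.lakehouse import create_shortcut_onelake
#   create_shortcut_onelake(**shortcut_kwargs)
```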

sempy_labs.lakehouse.delete_lakehouse(lakehouse: str | UUID, workspace: str | UUID | None = None) → None

Deletes a lakehouse.

This is a wrapper function for the following API: Items - Delete Lakehouse.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • lakehouse (str | uuid.UUID) – The name or ID of the lakehouse to delete.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

sempy_labs.lakehouse.delete_shortcut(shortcut_name: str, shortcut_path: str = 'Tables', lakehouse: str | None = None, workspace: str | UUID | None = None)

Deletes a shortcut.

This is a wrapper function for the following API: OneLake Shortcuts - Delete Shortcut.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • shortcut_name (str) – The name of the shortcut.

  • shortcut_path (str, default="Tables") – The path of the shortcut to be deleted. Must start with either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name in which the shortcut resides. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the lakehouse resides. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

sempy_labs.lakehouse.get_lakehouse_columns(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → DataFrame

Shows the tables and columns of a lakehouse and their respective properties. This function can be executed in either a PySpark or pure Python notebook. Note that data types may show differently when using PySpark vs pure Python.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

Shows the tables/columns within a lakehouse and their properties.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.get_lakehouse_tables(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, extended: bool = False, count_rows: bool = False, export: bool = False, exclude_shortcuts: bool = False) → DataFrame

Shows the tables of a lakehouse and their respective properties. Option to include additional properties relevant to Direct Lake guardrails.

This function can be executed in either a PySpark or pure Python notebook.

This is a wrapper function for the following API: Tables - List Tables plus extended capabilities. However, the above-mentioned API will not support Lakehouse schemas (Preview) until the feature reaches GA (General Availability). This function also supports schema-enabled lakehouses.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • extended (bool, default=False) – Obtains additional columns relevant to the size of each table.

  • count_rows (bool, default=False) – Obtains a row count for each lakehouse table.

  • export (bool, default=False) – Exports the resulting dataframe to a delta table in the lakehouse.

  • exclude_shortcuts (bool, default=False) – If True, excludes shortcuts.

Returns:

Shows the tables within a lakehouse and their properties.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.is_schema_enabled(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → bool

Indicates whether a lakehouse has schemas enabled.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

Indicates whether the lakehouse has schemas enabled.

Return type:

bool

sempy_labs.lakehouse.is_v_ordered(table_name: str, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, schema: str | None = None) → bool

Checks if a delta table is v-ordered.

Parameters:
  • table_name (str) – The name of the table to check.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • schema (str, optional) – The schema of the table to check. If not provided, the default schema is used.

Returns:

True if the table is v-ordered, False otherwise.

Return type:

bool

sempy_labs.lakehouse.lakehouse_attached() → bool

Identifies if a lakehouse is attached to the notebook.

Returns:

Returns True if a lakehouse is attached to the notebook.

Return type:

bool
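
Many functions in this module fall back to the attached lakehouse when lakehouse is None, so a check like this one can be used to fail early with a clear message. The sketch below is illustrative only: require_lakehouse is a hypothetical helper, and the attachment check is passed in as a callable so the example runs outside a Fabric notebook.

```python
def require_lakehouse(lakehouse, is_attached):
    """Return the lakehouse argument to pass on, or fail early.

    `is_attached` is any zero-argument callable returning a bool. In a Fabric
    notebook it could be sempy_labs.lakehouse.lakehouse_attached.
    """
    if lakehouse is None and not is_attached():
        raise ValueError("No lakehouse is attached; pass a lakehouse name or ID.")
    return lakehouse


# Stand-ins for the notebook state, so the sketch runs anywhere.
explicit = require_lakehouse("SalesLakehouse", lambda: False)
implicit = require_lakehouse(None, lambda: True)
```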

sempy_labs.lakehouse.list_blobs(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, container: str | None = None) → DataFrame

Returns a list of blobs for a given lakehouse.

This function leverages the following API: List Blobs.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • container (str, default=None) – The container name to list blobs from. Valid values are “Tables” or “Files”. Defaults to None which lists all blobs in the lakehouse.

Returns:

A pandas dataframe showing a list of blobs in the lakehouse.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.list_lakehouses(workspace: str | UUID | None = None) → DataFrame

Shows the lakehouses within a workspace.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

A pandas dataframe showing the lakehouses within a workspace.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.list_livy_sessions(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → DataFrame

Shows a list of Livy sessions for the specified lakehouse.

This is a wrapper function for the following API: Livy Sessions - List Livy Sessions.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

A pandas dataframe showing a list of Livy sessions for the specified lakehouse.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.list_schemas(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → DataFrame

Lists the schemas within a Fabric lakehouse.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

Shows the schemas within a lakehouse.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.list_shortcuts(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, path: str | None = None) → DataFrame

Shows all shortcuts which exist in a Fabric lakehouse and their properties.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the lakehouse resides. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • path (str, default=None) – The path within the lakehouse in which to look for shortcuts. If provided, must start with either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName. Defaults to None which returns all shortcuts in the given lakehouse.

Returns:

A pandas dataframe showing all the shortcuts which exist in the specified lakehouse.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.load_table(table_name: str, file_path: str, mode: Literal['Overwrite', 'Append'], lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)

Loads a table into a lakehouse. Currently only files are supported, not folders.

This is a wrapper function for the following API: Tables - Load Table.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • table_name (str) – The name of the table to load.

  • file_path (str) – The path to the data to load.

  • mode (Literal["Overwrite", "Append"]) – The mode to use when loading the data. “Overwrite” will overwrite the existing data. “Append” will append the data to the existing data.

  • lakehouse (str | uuid.UUID, default=None) – The name or ID of the lakehouse to load the table into. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
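
Because mode accepts only two literals and folders are not supported, a small pre-flight check can catch mistakes before the API call. This is a sketch under assumptions: build_load_table_kwargs is hypothetical, the sample path is made up, and treating a path without a file extension as a folder is only a heuristic.

```python
import posixpath

# The two literals accepted by the mode parameter.
VALID_MODES = ("Overwrite", "Append")


def build_load_table_kwargs(table_name, file_path, mode):
    """Hypothetical pre-flight check; not part of sempy_labs."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Heuristic (an assumption): a path with no extension is likely a folder,
    # and only single files are currently supported.
    if not posixpath.splitext(file_path)[1]:
        raise ValueError(f"{file_path!r} looks like a folder, not a file")
    return {"table_name": table_name, "file_path": file_path, "mode": mode}


load_kwargs = build_load_table_kwargs("sales", "Files/raw/sales.csv", "Overwrite")

# In a Fabric notebook, one might then run:
#   from sempy_labs.lakehouse import load_table
#   load_table(**load_kwargs)
```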

sempy_labs.lakehouse.optimize_lakehouse_tables(tables: str | List[str] | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)

Runs the OPTIMIZE function over the specified lakehouse tables.

Parameters:
  • tables (str | List[str], default=None) – The table(s) to optimize. If the tables have a schema, use the ‘schema.table’ format. Defaults to None which resolves to optimizing all tables within the lakehouse.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
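
The tables argument accepts a single name, a list of names, or None for all tables. The sketch below shows one way to normalize caller input into that shape before passing it on. The helper normalize_tables and the sample table names are hypothetical, and rejecting names with more than one dot is an assumption about the ‘schema.table’ format.

```python
def normalize_tables(tables):
    """Return None (all tables) or a de-duplicated list of table names."""
    if tables is None:
        return None
    if isinstance(tables, str):
        tables = [tables]
    cleaned = []
    for name in tables:
        name = name.strip()
        # Assumption: a name is either 'table' or 'schema.table'.
        if not name or name.count(".") > 1:
            raise ValueError(f"invalid table name: {name!r}")
        if name not in cleaned:
            cleaned.append(name)
    return cleaned


single = normalize_tables("dbo.sales")
several = normalize_tables(["dbo.sales", " dbo.customers ", "dbo.sales"])

# In a Fabric notebook, one might then run:
#   from sempy_labs.lakehouse import optimize_lakehouse_tables
#   optimize_lakehouse_tables(tables=several)
```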

sempy_labs.lakehouse.recover_lakehouse_object(file_path: str, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)

Recovers an object (i.e. table, file, folder) in a lakehouse from a deleted state. Only soft-deleted objects (deleted less than 7 days ago) can be recovered.

Parameters:
  • file_path (str) – The file path of the object to restore. For example: “Tables/my_delta_table”.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
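
The 7-day soft-delete window can be checked up front when the deletion time is known. The following is a minimal sketch: is_within_soft_delete_window is a hypothetical helper, and the deletion timestamp is assumed to be supplied by the caller as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

# Window stated in the description above.
SOFT_DELETE_WINDOW = timedelta(days=7)


def is_within_soft_delete_window(deleted_at, now=None):
    """True if the object was deleted less than 7 days before `now`."""
    now = now or datetime.now(timezone.utc)
    age = now - deleted_at
    return timedelta(0) <= age < SOFT_DELETE_WINDOW


# Fixed timestamps so the result does not depend on the current time.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
recent = is_within_soft_delete_window(datetime(2025, 1, 5, tzinfo=timezone.utc), now)
expired = is_within_soft_delete_window(datetime(2025, 1, 1, tzinfo=timezone.utc), now)

# In a Fabric notebook, one might then run:
#   from sempy_labs.lakehouse import recover_lakehouse_object
#   if recent:
#       recover_lakehouse_object(file_path="Tables/my_delta_table")
```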

sempy_labs.lakehouse.refresh_materialized_lake_views(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → DataFrame

Runs an on-demand Refresh MaterializedLakeViews job instance.

This is a wrapper function for the following API: Background Jobs - Run On Demand Refresh Materialized Lake Views.

Parameters:
  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

A DataFrame containing the job instance details of the refresh materialized lake views operation.

Return type:

pandas.DataFrame

sempy_labs.lakehouse.reset_shortcut_cache(workspace: str | UUID | None = None)

Deletes any cached files that were stored while reading from shortcuts.

This is a wrapper function for the following API: OneLake Shortcuts - Reset Shortcut Cache.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

sempy_labs.lakehouse.run_table_maintenance(table_name: str, optimize: bool = False, v_order: bool = False, z_order: str | List[str] | None = None, purge_deletion_vectors: bool = False, vacuum: bool = False, retention_period: str | None = None, schema: str | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → DataFrame

Runs table maintenance operations on the specified table within the lakehouse.

This is a wrapper function for the following API: Background Jobs - Run On Demand Table Maintenance.

Parameters:
  • table_name (str) – Name of the delta table on which to run maintenance operations.

  • optimize (bool, default=False) – If True, the OPTIMIZE function will be run on the table.

  • v_order (bool, default=False) – If True, v-order will be enabled for the table.

  • z_order (str | List[str], default=None) – If specified, the Z-Order optimization will be applied on the table using the provided column(s). Accepts a single column name or a list of column names.

  • purge_deletion_vectors (bool, default=False) – If True, physically removes data marked for deletion by deletion vectors and rewrites the affected parquet files.

  • vacuum (bool, default=False) – If True, the VACUUM function will be run on the table.

  • retention_period (str, default=None) – If specified, the retention period for the vacuum operation. Must be in the ‘d:hh:mm:ss’ format.

  • schema (str, default=None) – The schema of the table within the lakehouse.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

A DataFrame containing the job instance details of the table maintenance operation.

Return type:

pandas.DataFrame
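
The ‘d:hh:mm:ss’ string expected by retention_period is easy to get wrong by hand, so it can be derived from a timedelta instead. This is a sketch only: format_retention_period is a hypothetical helper, and the table, schema, and column names are made up.

```python
from datetime import timedelta


def format_retention_period(period):
    """Format a timedelta as 'd:hh:mm:ss' (fractions of a second are dropped)."""
    total = int(period.total_seconds())
    if total < 0:
        raise ValueError("retention period cannot be negative")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}"


maintenance_kwargs = {
    "table_name": "sales",        # hypothetical table
    "schema": "dbo",              # hypothetical schema
    "optimize": True,
    "v_order": True,
    "z_order": ["order_date"],    # hypothetical column
    "vacuum": True,
    "retention_period": format_retention_period(timedelta(days=7)),
}

# In a Fabric notebook, one might then run:
#   from sempy_labs.lakehouse import run_table_maintenance
#   job = run_table_maintenance(**maintenance_kwargs)
```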

sempy_labs.lakehouse.schema_exists(schema: str, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) → bool

Indicates whether the specified schema exists within a Fabric lakehouse.

Parameters:
  • schema (str) – The name of the schema.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

Indicates whether the specified schema exists within the lakehouse.

Return type:

bool

sempy_labs.lakehouse.update_lakehouse(name: str | None = None, description: str | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)

Updates a lakehouse.

This is a wrapper function for the following API: Items - Update Lakehouse.

Service Principal Authentication is supported (see here for examples).

Parameters:
  • name (str, default=None) – The new name of the lakehouse. Defaults to None which does not update the name.

  • description (str, default=None) – The new description of the lakehouse. Defaults to None which does not update the description.

  • lakehouse (str | uuid.UUID, default=None) – The name or ID of the lakehouse to update. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

sempy_labs.lakehouse.vacuum_lakehouse_tables(tables: str | List[str] | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, retain_n_hours: int | None = None)

Runs the VACUUM function over the specified lakehouse tables.

Parameters:
  • tables (str | List[str], default=None) – The table(s) to vacuum. If the tables have a schema, use the ‘schema.table’ format. Defaults to None which resolves to vacuuming all tables in the lakehouse.

  • lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.

  • workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

  • retain_n_hours (int, default=None) – The number of hours to retain historical versions of Delta table files. Files older than this retention period will be deleted during the vacuum operation. If not specified, the default retention period configured for the Delta table will be used. The default retention period is 168 hours (7 days) unless manually configured via table properties.
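
Since retain_n_hours is expressed in hours while retention is often discussed in days, a small conversion helper can make the intent explicit. This is a sketch under assumptions: retention_hours is hypothetical, and refusing values below the 168-hour default unless explicitly allowed is a design choice of this example, not documented behavior of vacuum_lakehouse_tables.

```python
# Default retention stated in the parameter description above (7 days).
DEFAULT_RETENTION_HOURS = 168


def retention_hours(days, allow_below_default=False):
    """Convert days to whole hours, guarding against short retention by default."""
    hours = int(days * 24)
    if hours < 0:
        raise ValueError("retention cannot be negative")
    if hours < DEFAULT_RETENTION_HOURS and not allow_below_default:
        raise ValueError(
            f"{hours} hours is below the {DEFAULT_RETENTION_HOURS}-hour default; "
            "pass allow_below_default=True to accept it"
        )
    return hours


two_weeks = retention_hours(14)

# In a Fabric notebook, one might then run:
#   from sempy_labs.lakehouse import vacuum_lakehouse_tables
#   vacuum_lakehouse_tables(tables="dbo.sales", retain_n_hours=two_weeks)
```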