sempy_labs.lakehouse package
Module contents
- sempy_labs.lakehouse.create_shortcut_onelake(table_name: str, source_lakehouse: str | UUID, source_workspace: str | UUID, destination_lakehouse: str | UUID | None = None, destination_workspace: str | UUID | None = None, shortcut_name: str | None = None, source_path: str = 'Tables', destination_path: str = 'Tables', shortcut_conflict_policy: str | None = None)
Creates a shortcut to a delta table in OneLake.
This is a wrapper function for the following API: OneLake Shortcuts - Create Shortcut.
Service Principal Authentication is supported (see here for examples).
- Parameters:
table_name (str) – The table name for which a shortcut will be created.
source_lakehouse (str | uuid.UUID) – The Fabric lakehouse in which the table resides.
source_workspace (str | uuid.UUID) – The name or ID of the Fabric workspace in which the source lakehouse exists.
destination_lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse in which the shortcut will be created. Defaults to None which resolves to the lakehouse attached to the notebook.
destination_workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the shortcut will be created. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
shortcut_name (str, default=None) – The name of the shortcut ‘table’ to be created. This defaults to the ‘table_name’ parameter value.
source_path (str, default="Tables") – A string representing the full path to the table/file in the source lakehouse, including either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName.
destination_path (str, default="Tables") – A string representing the full path where the shortcut is created, including either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName.
shortcut_conflict_policy (str, default=None) – When provided, it defines the action to take when a shortcut with the same name and path already exists. The default action is ‘Abort’. Additional ShortcutConflictPolicy types may be added over time.
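A minimal sketch of a call, assuming a Fabric notebook in which `sempy_labs` is installed. The lakehouse, workspace, and table names are placeholders, and `shortcut_kwargs` and `create_sales_shortcut` are hypothetical helpers introduced only for illustration; they are not part of the library.

```python
from typing import Optional


def shortcut_kwargs(table_name: str, source_lakehouse: str, source_workspace: str,
                    shortcut_name: Optional[str] = None) -> dict:
    """Collect keyword arguments for create_shortcut_onelake (hypothetical helper)."""
    kwargs = {
        "table_name": table_name,
        "source_lakehouse": source_lakehouse,
        "source_workspace": source_workspace,
        # Documented defaults, spelled out for readability.
        "source_path": "Tables",
        "destination_path": "Tables",
    }
    if shortcut_name is not None:
        # When omitted, the shortcut name defaults to table_name.
        kwargs["shortcut_name"] = shortcut_name
    return kwargs


def create_sales_shortcut() -> None:
    # Imported lazily: sempy_labs is assumed to be available only inside Fabric.
    import sempy_labs.lakehouse as lake

    lake.create_shortcut_onelake(**shortcut_kwargs(
        table_name="sales",                  # placeholder table name
        source_lakehouse="SourceLakehouse",  # placeholder lakehouse name
        source_workspace="SourceWorkspace",  # placeholder workspace name
    ))
```

Because destination_lakehouse and destination_workspace are left out, the shortcut would be created in the lakehouse attached to the notebook.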
- sempy_labs.lakehouse.delete_shortcut(shortcut_name: str, shortcut_path: str = 'Tables', lakehouse: str | None = None, workspace: str | UUID | None = None)
Deletes a shortcut.
This is a wrapper function for the following API: OneLake Shortcuts - Delete Shortcut.
Service Principal Authentication is supported (see here for examples).
- Parameters:
shortcut_name (str) – The name of the shortcut.
shortcut_path (str, default="Tables") – The path of the shortcut to be deleted. Must start with either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName.
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name in which the shortcut resides. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which the lakehouse resides. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
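The path rule above can be checked before calling the function. This is only a sketch: `is_shortcut_path` and `remove_shortcut` are hypothetical helpers, `lake` stands for the imported `sempy_labs.lakehouse` module, and treating the check as case-sensitive is an assumption. The library may perform its own validation.

```python
def is_shortcut_path(path: str) -> bool:
    """True if the first path segment is 'Files' or 'Tables' (case-sensitive by assumption)."""
    root = path.split("/", 1)[0]
    return root in ("Files", "Tables")


def remove_shortcut(lake, shortcut_name: str, shortcut_path: str = "Tables") -> None:
    """Validate the path locally, then delegate to delete_shortcut."""
    if not is_shortcut_path(shortcut_path):
        raise ValueError(f"shortcut_path must start with 'Files' or 'Tables': {shortcut_path!r}")
    # lakehouse and workspace are omitted, so they resolve to the notebook's defaults.
    lake.delete_shortcut(shortcut_name=shortcut_name, shortcut_path=shortcut_path)
```

In a Fabric notebook this might be invoked as `remove_shortcut(sempy_labs.lakehouse, "sales", "Tables/FolderName")`, where the shortcut name and folder are placeholders.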
- sempy_labs.lakehouse.get_lakehouse_columns(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None) DataFrame
Shows the tables and columns of a lakehouse and their respective properties. This function can be executed in either a PySpark or pure Python notebook. Note that data types may show differently when using PySpark vs pure Python.
Service Principal Authentication is supported (see here for examples).
- Parameters:
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
- Returns:
A pandas dataframe showing the tables and columns within a lakehouse and their properties.
- Return type:
pandas.DataFrame
- sempy_labs.lakehouse.get_lakehouse_tables(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, extended: bool = False, count_rows: bool = False, export: bool = False) DataFrame
Shows the tables of a lakehouse and their respective properties. Option to include additional properties relevant to Direct Lake guardrails.
This function can be executed in either a PySpark or pure Python notebook.
This is a wrapper function for the following API: Tables - List Tables plus extended capabilities.
Service Principal Authentication is supported (see here for examples).
- Parameters:
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
extended (bool, default=False) – Obtains additional columns relevant to the size of each table.
count_rows (bool, default=False) – Obtains a row count for each lakehouse table.
export (bool, default=False) – Exports the resulting dataframe to a delta table in the lakehouse.
- Returns:
A pandas dataframe showing the tables within a lakehouse and their properties.
- Return type:
pandas.DataFrame
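A short sketch of requesting the extended properties. `tables_with_sizes` is a hypothetical wrapper, and `lake` stands for the imported `sempy_labs.lakehouse` module, passed in as an argument so the sketch can be read and exercised outside Fabric.

```python
def tables_with_sizes(lake, lakehouse=None, workspace=None):
    """Call get_lakehouse_tables with the size-related columns switched on."""
    return lake.get_lakehouse_tables(
        lakehouse=lakehouse,  # None resolves to the attached lakehouse
        workspace=workspace,  # None resolves to the lakehouse's or notebook's workspace
        extended=True,        # adds the columns relevant to table size
        count_rows=False,     # row counting is left off in this sketch
    )
```

In a Fabric notebook, `tables_with_sizes(sempy_labs.lakehouse)` would return the dataframe for the attached lakehouse.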
- sempy_labs.lakehouse.lakehouse_attached() bool
Identifies if a lakehouse is attached to the notebook.
- Returns:
Returns True if a lakehouse is attached to the notebook.
- Return type:
bool
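One possible use of this check is to fail early when a function would otherwise rely on the attached lakehouse. The helper `pick_lakehouse` below is hypothetical and only sketches that guard; its `attached` argument is meant to receive the result of `lakehouse_attached()`.

```python
from typing import Optional


def pick_lakehouse(attached: bool, explicit: Optional[str] = None) -> Optional[str]:
    """Return the value to pass as `lakehouse`, or raise if nothing can resolve it."""
    if explicit is not None:
        return explicit
    if attached:
        # None lets the library resolve to the lakehouse attached to the notebook.
        return None
    raise ValueError("No lakehouse is attached; pass a lakehouse name or ID explicitly.")
```

In a Fabric notebook the call might look like `pick_lakehouse(sempy_labs.lakehouse.lakehouse_attached())`.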
- sempy_labs.lakehouse.list_blobs(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, container: str | None = None) DataFrame
Returns a list of blobs for a given lakehouse.
This function leverages the following API: List Blobs.
- Parameters:
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
container (str, default=None) – The container name to list blobs from. Valid values are “Tables” or “Files”. Defaults to None which lists all blobs in the lakehouse.
- Returns:
A pandas dataframe showing a list of blobs in the lakehouse.
- Return type:
pandas.DataFrame
- sempy_labs.lakehouse.list_shortcuts(lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, path: str | None = None) DataFrame
Shows all shortcuts which exist in a Fabric lakehouse and their properties.
- Parameters:
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The name or ID of the Fabric workspace in which lakehouse resides. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
path (str, default=None) – The path within the lakehouse in which to look for shortcuts. If provided, must start with either “Files” or “Tables”. Examples: Tables/FolderName/SubFolderName; Files/FolderName/SubFolderName. Defaults to None which returns all shortcuts in the given lakehouse.
- Returns:
A pandas dataframe showing all the shortcuts which exist in the specified lakehouse.
- Return type:
pandas.DataFrame
- sempy_labs.lakehouse.optimize_lakehouse_tables(tables: str | List[str] | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)
Runs the OPTIMIZE function over the specified lakehouse tables.
- Parameters:
tables (str | List[str], default=None) – The table(s) to optimize. Defaults to None which resolves to optimizing all tables within the lakehouse.
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
- sempy_labs.lakehouse.recover_lakehouse_object(file_path: str, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)
Recovers an object (i.e. table, file, folder) in a lakehouse from a deleted state. Only soft-deleted objects can be recovered (deleted for less than 7 days).
- Parameters:
file_path (str) – The file path of the object to restore. For example: “Tables/my_delta_table”.
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
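The 7-day window can be expressed as a simple date comparison. This is a sketch only: `still_recoverable` is a hypothetical helper, and how the deletion time of an object is obtained is outside its scope.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Soft-delete window stated in the description above.
SOFT_DELETE_WINDOW = timedelta(days=7)


def still_recoverable(deleted_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if the object was deleted less than 7 days before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - deleted_at < SOFT_DELETE_WINDOW
```

If the check passes, a call such as `recover_lakehouse_object(file_path="Tables/my_delta_table")` would be the next step inside a Fabric notebook.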
- sempy_labs.lakehouse.reset_shortcut_cache(workspace: str | UUID | None = None)
Deletes any cached files that were stored while reading from shortcuts.
This is a wrapper function for the following API: OneLake Shortcuts - Reset Shortcut Cache.
- sempy_labs.lakehouse.run_table_maintenance(table_name: str, optimize: bool = False, v_order: bool = False, vacuum: bool = False, retention_period: str | None = None, schema: str | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None)
Runs table maintenance operations on the specified table within the lakehouse.
This is a wrapper function for the following API: Background Jobs - Run On Demand Table Maintenance.
- Parameters:
table_name (str) – Name of the delta table on which to run maintenance operations.
optimize (bool, default=False) – If True, the OPTIMIZE function will be run on the table.
v_order (bool, default=False) – If True, v-order will be enabled for the table.
vacuum (bool, default=False) – If True, the VACUUM function will be run on the table.
retention_period (str, default=None) – If specified, the retention period for the vacuum operation. Must be in the ‘d:hh:mm:ss’ format.
schema (str, default=None) – The schema of the tables within the lakehouse.
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
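The ‘d:hh:mm:ss’ format for retention_period can be produced from a `timedelta`. The helpers below are hypothetical, `lake` stands for the imported `sempy_labs.lakehouse` module, and leaving the day component unpadded is an assumption based on the single ‘d’ in the documented format.

```python
from datetime import timedelta


def to_retention_period(delta: timedelta) -> str:
    """Format a non-negative timedelta as 'd:hh:mm:ss' (sub-second precision is dropped)."""
    total = int(delta.total_seconds())
    if total < 0:
        raise ValueError("retention period must not be negative")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}"


def vacuum_table(lake, table_name: str, keep: timedelta) -> None:
    """Run only the VACUUM step on one table with an explicit retention period."""
    lake.run_table_maintenance(
        table_name=table_name,
        vacuum=True,
        retention_period=to_retention_period(keep),
    )
```

For example, `to_retention_period(timedelta(days=7))` yields `"7:00:00:00"`.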
- sempy_labs.lakehouse.vacuum_lakehouse_tables(tables: str | List[str] | None = None, lakehouse: str | UUID | None = None, workspace: str | UUID | None = None, retain_n_hours: int | None = None)
Runs the VACUUM function over the specified lakehouse tables.
- Parameters:
tables (str | List[str], default=None) – The table(s) to vacuum. If no tables are specified, all tables in the lakehouse will be vacuumed.
lakehouse (str | uuid.UUID, default=None) – The Fabric lakehouse name or ID. Defaults to None which resolves to the lakehouse attached to the notebook.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID used by the lakehouse. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
retain_n_hours (int, default=None) – The number of hours to retain historical versions of Delta table files. Files older than this retention period will be deleted during the vacuum operation. If not specified, the default retention period configured for the Delta table will be used. The default retention period is 168 hours (7 days) unless manually configured via table properties.
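Since retain_n_hours is expressed in hours, a retention policy stated in days needs converting first. A minimal sketch, with `days_to_hours` and `vacuum_keep_days` as hypothetical helpers and `lake` standing for the imported `sempy_labs.lakehouse` module:

```python
# 7 days, matching the default retention period described above.
DEFAULT_RETENTION_HOURS = 168


def days_to_hours(days: int) -> int:
    """Convert whole days to hours for retain_n_hours."""
    return days * 24


def vacuum_keep_days(lake, tables, days: int) -> None:
    """Vacuum the given table(s), keeping `days` days of history."""
    lake.vacuum_lakehouse_tables(tables=tables, retain_n_hours=days_to_hours(days))
```

Passing `tables=None` would apply the same retention to every table in the lakehouse.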