sempy_labs.mirrored_azure_databricks_catalog package

Module contents

sempy_labs.mirrored_azure_databricks_catalog.create_mirrored_azure_databricks_catalog(name: str, catalog_name: str, databricks_workspace_connection_id: UUID, auto_sync: bool | None = None, mirroring_mode: Literal['Full', 'Partial'] = 'Full', storage_connection_id: UUID | None = None, mirror_configuration: dict | None = None, description: str | None = None, workspace: str | UUID | None = None) → UUID

Creates a mirrored Azure Databricks Catalog within a specified workspace.

This is a wrapper function for the following API: Items - Create Mirrored Azure Databricks Catalog.

Parameters:

name (str) – The display name of the mirrored Azure Databricks Catalog.
catalog_name (str) – The name of the Azure Databricks catalog.
databricks_workspace_connection_id (uuid.UUID) – The ID of the Azure Databricks workspace connection.
auto_sync (bool, default=None) – Enables or disables automatic synchronization for the catalog. Defaults to None, which means autoSync is disabled.
mirroring_mode (Literal["Full", "Partial"], default="Full") – The mirroring mode for the catalog. Can be either “Full” or “Partial”.
storage_connection_id (uuid.UUID, default=None) – The ID of the storage connection. This is required when mirroring_mode is set to “Full”.
mirror_configuration (dict, default=None) – The mirror configuration for the catalog. This is required when mirroring_mode is set to “Partial”. See here for examples.
description (str, default=None) – The description of the mirrored Azure Databricks Catalog.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

The ID of the created mirrored Azure Databricks Catalog.

Return type:

uuid.UUID
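
The mode-dependent requirements documented above (storage_connection_id for “Full”, mirror_configuration for “Partial”) can be checked before the API is called. The following is a minimal sketch, not part of sempy_labs: build_create_kwargs is a hypothetical helper, and the commented call at the end assumes a Fabric notebook where sempy_labs is installed and the placeholder IDs are replaced with real ones.

```python
from typing import Literal, Optional
from uuid import UUID


def build_create_kwargs(
    name: str,
    catalog_name: str,
    databricks_workspace_connection_id: UUID,
    mirroring_mode: Literal["Full", "Partial"] = "Full",
    storage_connection_id: Optional[UUID] = None,
    mirror_configuration: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: enforce the documented mode-dependent requirements
    and return keyword arguments for create_mirrored_azure_databricks_catalog."""
    if mirroring_mode not in ("Full", "Partial"):
        raise ValueError("mirroring_mode must be 'Full' or 'Partial'")
    if mirroring_mode == "Full" and storage_connection_id is None:
        raise ValueError(
            "storage_connection_id is required when mirroring_mode is 'Full'"
        )
    if mirroring_mode == "Partial" and mirror_configuration is None:
        raise ValueError(
            "mirror_configuration is required when mirroring_mode is 'Partial'"
        )

    kwargs = {
        "name": name,
        "catalog_name": catalog_name,
        "databricks_workspace_connection_id": databricks_workspace_connection_id,
        "mirroring_mode": mirroring_mode,
    }
    # Only pass optional arguments that were actually supplied.
    if storage_connection_id is not None:
        kwargs["storage_connection_id"] = storage_connection_id
    if mirror_configuration is not None:
        kwargs["mirror_configuration"] = mirror_configuration
    return kwargs


# In a Fabric notebook (connection_id and storage_id are placeholders you supply):
# from sempy_labs.mirrored_azure_databricks_catalog import (
#     create_mirrored_azure_databricks_catalog,
# )
# kwargs = build_create_kwargs(
#     "My Mirror", "main", connection_id, storage_connection_id=storage_id
# )
# new_id = create_mirrored_azure_databricks_catalog(**kwargs)
```

Validating locally fails fast with a clear message instead of waiting for the service to reject the request.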

sempy_labs.mirrored_azure_databricks_catalog.delete_mirrored_azure_databricks_catalog(mirrored_azure_databricks_catalog: str | UUID, workspace: str | UUID | None = None)

Deletes a mirrored Azure Databricks Catalog.

This is a wrapper function for the following API: Items - Delete Mirrored Azure Databricks Catalog.

Parameters:

mirrored_azure_databricks_catalog (str | uuid.UUID) – The name or ID of the mirrored Azure Databricks catalog to be deleted.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

sempy_labs.mirrored_azure_databricks_catalog.discover_catalogs(databricks_workspace_connection_id: UUID, workspace: str | UUID | None = None, max_results: int | None = None) → DataFrame

Returns a list of catalogs from Unity Catalog.

This is a wrapper function for the following API: Databricks Metadata Discovery - Discover Catalogs.

Parameters:

databricks_workspace_connection_id (uuid.UUID) – The ID of the Databricks workspace connection.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
max_results (int, default=None) – The maximum number of results to return. If not specified, all results are returned.

Returns:

A pandas dataframe showing a list of catalogs from Unity Catalog.

Return type:

pandas.DataFrame

sempy_labs.mirrored_azure_databricks_catalog.discover_schemas(catalog: str, databricks_workspace_connection_id: UUID, workspace: str | UUID | None = None, max_results: int | None = None) → DataFrame

Returns a list of schemas in the given catalog from Unity Catalog.

This is a wrapper function for the following API: Databricks Metadata Discovery - Discover Schemas.

Parameters:

catalog (str) – The name of the catalog.
databricks_workspace_connection_id (uuid.UUID) – The ID of the Databricks workspace connection.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
max_results (int, default=None) – The maximum number of results to return. If not specified, all results are returned.

Returns:

A pandas dataframe showing a list of schemas in the given catalog from Unity Catalog.

Return type:

pandas.DataFrame

sempy_labs.mirrored_azure_databricks_catalog.discover_tables(catalog: str, schema: str, databricks_workspace_connection_id: UUID, workspace: str | UUID | None = None, max_results: int | None = None) → DataFrame

Returns a list of tables in the given schema from Unity Catalog.

This is a wrapper function for the following API: Databricks Metadata Discovery - Discover Tables.

Parameters:

catalog (str) – The name of the catalog.
schema (str) – The name of the schema.
databricks_workspace_connection_id (uuid.UUID) – The ID of the Databricks workspace connection.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
max_results (int, default=None) – The maximum number of results to return. If not specified, all results are returned.

Returns:

A pandas dataframe showing a list of tables in the given schema from Unity Catalog.

Return type:

pandas.DataFrame
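
The three discovery functions above form a hierarchy: catalogs contain schemas, and schemas contain tables. The following is a minimal sketch of walking that hierarchy. walk_unity_catalog is a hypothetical helper, and the column names "Catalog Name", "Schema Name", and "Table Name" are assumptions about the returned dataframes; check the actual columns and pass different names if needed. The discovery functions are passed in as arguments so the helper can be exercised without a Fabric connection.

```python
from typing import Callable, Iterator, Optional, Tuple, Union
from uuid import UUID


def walk_unity_catalog(
    connection_id: UUID,
    discover_catalogs: Callable,
    discover_schemas: Callable,
    discover_tables: Callable,
    workspace: Optional[Union[str, UUID]] = None,
    catalog_col: str = "Catalog Name",  # assumed column name
    schema_col: str = "Schema Name",  # assumed column name
    table_col: str = "Table Name",  # assumed column name
) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (catalog, schema, table) for every table
    reachable through the given Databricks workspace connection."""
    catalogs = discover_catalogs(
        databricks_workspace_connection_id=connection_id, workspace=workspace
    )
    for catalog in catalogs[catalog_col]:
        schemas = discover_schemas(
            catalog=catalog,
            databricks_workspace_connection_id=connection_id,
            workspace=workspace,
        )
        for schema in schemas[schema_col]:
            tables = discover_tables(
                catalog=catalog,
                schema=schema,
                databricks_workspace_connection_id=connection_id,
                workspace=workspace,
            )
            for table in tables[table_col]:
                yield catalog, schema, table


# In a Fabric notebook (connection_id is a placeholder you supply):
# from sempy_labs.mirrored_azure_databricks_catalog import (
#     discover_catalogs, discover_schemas, discover_tables,
# )
# for row in walk_unity_catalog(
#     connection_id, discover_catalogs, discover_schemas, discover_tables
# ):
#     print(row)
```

Each level issues one request per parent object, so a large metastore can produce many calls; restricting the walk to a single catalog or schema is cheaper when only part of the hierarchy is needed.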

sempy_labs.mirrored_azure_databricks_catalog.get_mirrored_azure_databricks_catalog(mirrored_azure_databricks_catalog: str | UUID, workspace: str | UUID | None = None, return_dataframe: bool = True) → DataFrame | dict

sempy_labs.mirrored_azure_databricks_catalog.list_mirrored_azure_databricks_catalogs(workspace: str | UUID | None = None) → DataFrame

sempy_labs.mirrored_azure_databricks_catalog.refresh_catalog_metadata(mirrored_azure_databricks_catalog: str | UUID, workspace: str | UUID | None = None)

Refreshes the Databricks catalog metadata in a mirrored Azure Databricks Catalog item.

This is a wrapper function for the following API: Refresh Metadata - Items RefreshCatalogMetadata.

Parameters:

mirrored_azure_databricks_catalog (str | uuid.UUID) – The name or ID of the mirrored Azure Databricks catalog.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.
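
When several mirrored catalogs need a metadata refresh, one failure need not stop the rest. The following is a minimal sketch under that assumption: refresh_many is a hypothetical helper, not part of sempy_labs, and the refresh function is passed in as an argument so the loop can be tested without a Fabric connection.

```python
from typing import Callable, Dict, Iterable, Optional, Union
from uuid import UUID


def refresh_many(
    catalogs: Iterable[Union[str, UUID]],
    refresh_fn: Callable,
    workspace: Optional[Union[str, UUID]] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: refresh each catalog in turn and return a mapping
    from catalog name or ID to None on success, or the error message on failure."""
    results: Dict[str, Optional[str]] = {}
    for item in catalogs:
        try:
            refresh_fn(mirrored_azure_databricks_catalog=item, workspace=workspace)
            results[str(item)] = None
        except Exception as exc:  # record the failure and continue with the rest
            results[str(item)] = str(exc)
    return results


# In a Fabric notebook (catalog names are placeholders):
# from sempy_labs.mirrored_azure_databricks_catalog import refresh_catalog_metadata
# outcome = refresh_many(["Mirror A", "Mirror B"], refresh_catalog_metadata)
# failed = {name: err for name, err in outcome.items() if err is not None}
```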

Updates the definition of a mirrored Azure Databricks Catalog within a specified workspace.

This is a wrapper function for the following API: Items - Update Mirrored Azure Databricks Catalog.

Parameters:

mirrored_azure_databricks_catalog (str | uuid.UUID) – The name or ID of the mirrored Azure Databricks catalog to be updated.
name (str) – The display name of the mirrored Azure Databricks Catalog.
auto_sync (bool, default=None) – Enables or disables automatic synchronization for the catalog. Defaults to None, which means autoSync is disabled.
mirroring_mode (Literal["Full", "Partial"], default=None) – The mirroring mode for the catalog. Can be either “Full” or “Partial”. If None (the default), the existing mirroring mode is left unchanged.
storage_connection_id (uuid.UUID, default=None) – The storage connection id. This is required when mirroring_mode is set to “Full”.
description (str, default=None) – The description of the mirrored Azure Databricks Catalog.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None which resolves to the workspace of the attached lakehouse or if no lakehouse attached, resolves to the workspace of the notebook.

Returns:

The updated mirrored Azure Databricks Catalog item definition.

Return type:

dict