sempy_labs.mirrored_azure_databricks_catalog package

Module contents

sempy_labs.mirrored_azure_databricks_catalog.create_mirrored_azure_databricks_catalog(name: str, catalog_name: str, databricks_workspace_connection_id: UUID, auto_sync: bool | None = None, mirroring_mode: Literal['Full', 'Partial'] = 'Full', storage_connection_id: UUID | None = None, mirror_configuration: dict | None = None, description: str | None = None, workspace: str | UUID | None = None) → UUID

Creates a mirrored Azure Databricks Catalog within a specified workspace.

This is a wrapper function for the following API: Items - Create Mirrored Azure Databricks Catalog.

Parameters:
  • name (str) – The display name of the mirrored Azure Databricks Catalog.

  • catalog_name (str) – The Azure Databricks catalog name.

  • databricks_workspace_connection_id (uuid.UUID) – The Azure Databricks workspace connection ID.

  • auto_sync (bool, default=None) – Enable or disable automatic synchronization for the catalog. Defaults to None, which means autoSync will be disabled.

  • mirroring_mode (Literal["Full", "Partial"], default="Full") – The mirroring mode for the catalog. Can be either “Full” or “Partial”.

  • storage_connection_id (uuid.UUID, default=None) – The storage connection ID. This is required when mirroring_mode is set to “Full”.

  • mirror_configuration (dict, default=None) – The mirror configuration for the catalog. This is required when mirroring_mode is set to “Partial”. See here for examples.

  • description (str, default=None) – The description of the mirrored Azure Databricks Catalog.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

The ID of the created mirrored Azure Databricks Catalog.

Return type:

uuid.UUID
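
The mode-specific requirements listed above can be checked locally before the API is called. The following is a minimal sketch, not part of the library: build_create_kwargs and create_catalog are hypothetical helpers introduced only for illustration, and the contents of mirror_configuration are not shown here.

```python
from typing import Any, Dict, Optional
from uuid import UUID


def build_create_kwargs(
    name: str,
    catalog_name: str,
    databricks_workspace_connection_id: UUID,
    mirroring_mode: str = "Full",
    storage_connection_id: Optional[UUID] = None,
    mirror_configuration: Optional[dict] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: enforce the documented per-mode requirements."""
    if mirroring_mode not in ("Full", "Partial"):
        raise ValueError("mirroring_mode must be 'Full' or 'Partial'")
    if mirroring_mode == "Full" and storage_connection_id is None:
        raise ValueError("storage_connection_id is required in 'Full' mode")
    if mirroring_mode == "Partial" and mirror_configuration is None:
        raise ValueError("mirror_configuration is required in 'Partial' mode")

    kwargs: Dict[str, Any] = {
        "name": name,
        "catalog_name": catalog_name,
        "databricks_workspace_connection_id": databricks_workspace_connection_id,
        "mirroring_mode": mirroring_mode,
    }
    # Only forward the optional arguments that were actually supplied.
    if storage_connection_id is not None:
        kwargs["storage_connection_id"] = storage_connection_id
    if mirror_configuration is not None:
        kwargs["mirror_configuration"] = mirror_configuration
    return kwargs


def create_catalog(**kwargs: Any) -> UUID:
    """Hypothetical wrapper: forward validated arguments to the library."""
    # Imported lazily so the validation helper works without sempy_labs.
    from sempy_labs.mirrored_azure_databricks_catalog import (
        create_mirrored_azure_databricks_catalog,
    )

    return create_mirrored_azure_databricks_catalog(**kwargs)
```

In a Fabric notebook, the result of build_create_kwargs(...) would be passed on as create_catalog(**kwargs); the connection IDs must come from existing connections in the tenant.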

sempy_labs.mirrored_azure_databricks_catalog.delete_mirrored_azure_databricks_catalog(mirrored_azure_databricks_catalog: str | UUID, workspace: str | UUID | None = None)

Deletes a mirrored Azure Databricks Catalog.

This is a wrapper function for the following API: Items - Delete Mirrored Azure Databricks Catalog.

Parameters:
  • mirrored_azure_databricks_catalog (str | uuid.UUID) – The name or ID of the mirrored Azure Databricks catalog to be deleted.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

sempy_labs.mirrored_azure_databricks_catalog.discover_catalogs(databricks_workspace_connection_id: UUID, workspace: str | UUID | None = None, max_results: int | None = None) → DataFrame

Returns a list of catalogs from Unity Catalog.

This is a wrapper function for the following API: Databricks Metadata Discovery - Discover Catalogs.

Parameters:
  • databricks_workspace_connection_id (uuid.UUID) – The ID of the Databricks workspace connection.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

  • max_results (int, default=None) – The maximum number of results to return. If not specified, all results are returned.

Returns:

A pandas dataframe showing a list of catalogs from Unity Catalog.

Return type:

pandas.DataFrame

sempy_labs.mirrored_azure_databricks_catalog.discover_schemas(catalog: str, databricks_workspace_connection_id: UUID, workspace: str | UUID | None = None, max_results: int | None = None) → DataFrame

Returns a list of schemas in the given catalog from Unity Catalog.

This is a wrapper function for the following API: Databricks Metadata Discovery - Discover Schemas.

Parameters:
  • catalog (str) – The name of the catalog.

  • databricks_workspace_connection_id (uuid.UUID) – The ID of the Databricks workspace connection.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

  • max_results (int, default=None) – The maximum number of results to return. If not specified, all results are returned.

Returns:

A pandas dataframe showing a list of schemas in the given catalog from Unity Catalog.

Return type:

pandas.DataFrame

sempy_labs.mirrored_azure_databricks_catalog.discover_tables(catalog: str, schema: str, databricks_workspace_connection_id: UUID, workspace: str | UUID | None = None, max_results: int | None = None) → DataFrame

Returns a list of tables in the given schema from Unity Catalog.

This is a wrapper function for the following API: Databricks Metadata Discovery - Discover Tables.

Parameters:
  • catalog (str) – The name of the catalog.

  • schema (str) – The name of the schema.

  • databricks_workspace_connection_id (uuid.UUID) – The ID of the Databricks workspace connection.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

  • max_results (int, default=None) – The maximum number of results to return. If not specified, all results are returned.

Returns:

A pandas dataframe showing a list of tables in the given schema from Unity Catalog.

Return type:

pandas.DataFrame
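
The three discovery functions can be chained to walk Unity Catalog from catalogs down to tables. The following is a rough sketch under stated assumptions: walk_unity_catalog is a hypothetical helper, and the column names "Catalog Name", "Schema Name" and "Table Name" are guesses at the layout of the returned dataframes, so they are exposed as parameters and should be checked against the actual output.

```python
from typing import Any, Callable, Iterator, Optional, Tuple, Union
from uuid import UUID


def walk_unity_catalog(
    connection_id: UUID,
    discover_catalogs: Callable[..., Any],
    discover_schemas: Callable[..., Any],
    discover_tables: Callable[..., Any],
    catalog_column: str = "Catalog Name",  # assumed column name
    schema_column: str = "Schema Name",  # assumed column name
    table_column: str = "Table Name",  # assumed column name
    workspace: Optional[Union[str, UUID]] = None,
) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (catalog, schema, table) name triples."""
    catalogs = discover_catalogs(
        databricks_workspace_connection_id=connection_id, workspace=workspace
    )
    for catalog in catalogs[catalog_column]:
        schemas = discover_schemas(
            catalog=catalog,
            databricks_workspace_connection_id=connection_id,
            workspace=workspace,
        )
        for schema in schemas[schema_column]:
            tables = discover_tables(
                catalog=catalog,
                schema=schema,
                databricks_workspace_connection_id=connection_id,
                workspace=workspace,
            )
            for table in tables[table_column]:
                yield catalog, schema, table
```

The discovery functions are passed in as arguments so the helper can be exercised with stand-ins; in a notebook the three callables would be the discover_catalogs, discover_schemas and discover_tables functions documented above.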

sempy_labs.mirrored_azure_databricks_catalog.get_mirrored_azure_databricks_catalog(mirrored_azure_databricks_catalog: str | UUID, workspace: str | UUID | None = None, return_dataframe: bool = True) → DataFrame | dict

sempy_labs.mirrored_azure_databricks_catalog.list_mirrored_azure_databricks_catalogs(workspace: str | UUID | None = None) → DataFrame

sempy_labs.mirrored_azure_databricks_catalog.refresh_catalog_metadata(mirrored_azure_databricks_catalog: str | UUID, workspace: str | UUID | None = None)

Refreshes the Databricks catalog metadata in a mirrored Azure Databricks Catalog item.

This is a wrapper function for the following API: Refresh Metadata - Items RefreshCatalogMetadata.

Parameters:
  • mirrored_azure_databricks_catalog (str | uuid.UUID) – The name or ID of the mirrored Azure Databricks catalog.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
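
When several mirrored catalogs need a metadata refresh, the call can be wrapped so that one failure does not stop the rest. This is only a sketch: refresh_many is a hypothetical helper, and the refresh callable is passed in rather than imported so the example does not depend on the library being installed.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Union
from uuid import UUID

CatalogRef = Union[str, UUID]


def refresh_many(
    catalogs: Iterable[CatalogRef],
    refresh: Callable[..., Any],
    workspace: Optional[CatalogRef] = None,
) -> Dict[CatalogRef, Optional[Exception]]:
    """Hypothetical helper: refresh each catalog and record any error raised."""
    results: Dict[CatalogRef, Optional[Exception]] = {}
    for catalog in catalogs:
        try:
            refresh(mirrored_azure_databricks_catalog=catalog, workspace=workspace)
            results[catalog] = None  # None marks a call that did not raise
        except Exception as exc:  # keep going and report the failure
            results[catalog] = exc
    return results
```

In a notebook, refresh would be the refresh_catalog_metadata function documented above, for example refresh_many(["CatalogA", "CatalogB"], refresh_catalog_metadata), where the catalog names are placeholders.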

sempy_labs.mirrored_azure_databricks_catalog.update_mirrored_azure_databricks_catalog(mirrored_azure_databricks_catalog: str | UUID, name: str | None = None, auto_sync: bool | None = None, mirroring_mode: Literal['Full', 'Partial'] | None = None, storage_connection_id: UUID | None = None, description: str | None = None, workspace: str | UUID | None = None) → dict

Updates the definition of a mirrored Azure Databricks Catalog within a specified workspace.

This is a wrapper function for the following API: Items - Update Mirrored Azure Databricks Catalog.

Parameters:
  • mirrored_azure_databricks_catalog (str | uuid.UUID) – The name or ID of the mirrored Azure Databricks catalog to be updated.

  • name (str, default=None) – The display name of the mirrored Azure Databricks Catalog.

  • auto_sync (bool, default=None) – Enable or disable automatic synchronization for the catalog. Defaults to None, which means autoSync will be disabled.

  • mirroring_mode (Literal["Full", "Partial"], default=None) – The mirroring mode for the catalog. Can be either “Full” or “Partial”. If None (the default), the existing mirroring mode is left unchanged.

  • storage_connection_id (uuid.UUID, default=None) – The storage connection ID. This is required when mirroring_mode is set to “Full”.

  • description (str, default=None) – The description of the mirrored Azure Databricks Catalog.

  • workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

The updated mirrored Azure Databricks Catalog item definition.

Return type:

dict