sempy_labs.dataflow package

Module contents

sempy_labs.dataflow.assign_workspace_to_dataflow_storage(dataflow_storage_account: str, workspace: str | UUID | None = None)

Assigns a dataflow storage account to a workspace.

This is a wrapper function for the following API: Dataflow Storage Accounts - Groups AssignToDataflowStorage.

Parameters:

dataflow_storage_account (str) – The name of the dataflow storage account.
workspace (str | uuid.UUID, default=None) – The name or ID of the workspace. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
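
A minimal usage sketch, assuming the function accepts the keyword arguments shown in the signature above. The wrapper name assign_storage and its assign_fn hook are hypothetical; they exist only so the call can be exercised outside a Fabric notebook. The account and workspace names in the commented call are placeholders.

```python
def assign_storage(account_name, workspace=None, assign_fn=None):
    """Assign a dataflow storage account to a workspace.

    assign_fn is a hypothetical hook for testing. When it is omitted, the
    real sempy_labs function is imported lazily, which requires an
    environment where sempy_labs is installed (for example a Fabric notebook).
    """
    if assign_fn is None:
        from sempy_labs.dataflow import (
            assign_workspace_to_dataflow_storage as assign_fn,
        )
    # workspace may be a name, a UUID, or None (left for the library to resolve).
    assign_fn(dataflow_storage_account=account_name, workspace=workspace)


# Example call inside a Fabric notebook (placeholder names):
# assign_storage("MyStorageAccount", workspace="Sales Workspace")
```

Leaving workspace as None defers to the default resolution described in the parameter list.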

sempy_labs.dataflow.discover_dataflow_parameters(dataflow: str | UUID, workspace: str | UUID) → DataFrame

Retrieves all parameters defined in the specified Dataflow.

This is a wrapper function for the following API: Items - Discover Dataflow Parameters.

Service Principal Authentication is supported (see here for examples).

Parameters:

dataflow (str | uuid.UUID) – Name or ID of the dataflow.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

A pandas dataframe showing all parameters defined in the specified Dataflow.

Return type:

pandas.DataFrame

sempy_labs.dataflow.execute_query(dataflow: str | UUID, query_name: str, custom_mashup_document: str | None = None, workspace: str | UUID | None = None) → DataFrame

Executes a query against a dataflow and returns the result.

This is a wrapper function for the following API: Query Execution - Execute Query.

Service Principal Authentication is supported (see here for examples).

Parameters:

dataflow (str | uuid.UUID) – Name or ID of the dataflow.
query_name (str) – The name of the query to execute from the dataflow (or from the custom mashup document if provided).
custom_mashup_document (str, default=None) – Optional custom mashup document to override the dataflow’s default mashup.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

A pandas dataframe showing the results of the query execution.

Return type:

pandas.DataFrame
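
The sketch below illustrates how query_name and custom_mashup_document relate: the query name must exist in whichever mashup is being executed. It assumes the custom mashup document is a Power Query (M) section document, which this page does not confirm. The helpers build_mashup and run_custom_query and the execute_fn hook are hypothetical, and query_name is assumed to be a simple identifier that needs no M quoting.

```python
def build_mashup(query_name, expression):
    """Return a one-query M section document (assumed format)."""
    return f"section Section1;\nshared {query_name} = {expression};"


def run_custom_query(dataflow, query_name, expression, workspace=None, execute_fn=None):
    """Execute a single ad hoc expression against a dataflow.

    execute_fn is a hypothetical hook for testing. When it is omitted,
    sempy_labs.dataflow.execute_query is imported lazily.
    """
    if execute_fn is None:
        from sempy_labs.dataflow import execute_query as execute_fn
    return execute_fn(
        dataflow=dataflow,
        # The name must match a query defined in the custom mashup document.
        query_name=query_name,
        custom_mashup_document=build_mashup(query_name, expression),
        workspace=workspace,
    )


# Example call inside a Fabric notebook (placeholder names):
# df = run_custom_query("My Dataflow", "Answer", "1 + 1")
```

Omitting custom_mashup_document in a direct call to execute_query runs the named query from the dataflow's own mashup instead.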

sempy_labs.dataflow.get_dataflow_definition(dataflow: str | UUID, workspace: str | UUID | None = None, decode: bool = True) → dict

Obtains the definition of a dataflow. This supports Gen1, Gen2 and Gen2 CI/CD dataflows.

This is a wrapper function for the following API: Dataflows - Get Dataflow.

Parameters:

dataflow (str | uuid.UUID) – The name or ID of the dataflow.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
decode (bool, default=True) – If True, decodes the dataflow definition file.

Returns:

The dataflow definition.

Return type:

dict
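
To illustrate what decoding a definition file involves, the sketch below base64-decodes the parts of an undecoded definition. The dictionary shape (definition, parts, path, payload) is an assumption modelled on Fabric item definitions; the dict actually returned by get_dataflow_definition may be shaped differently, particularly for Gen1 dataflows. The helper decode_parts and the sample data are hypothetical.

```python
import base64


def decode_parts(definition):
    """Return {path: decoded text} for every base64 part in a definition."""
    parts = definition.get("definition", {}).get("parts", [])
    return {
        part["path"]: base64.b64decode(part["payload"]).decode("utf-8")
        for part in parts
    }


# Hypothetical undecoded definition, built here so the example is self-contained.
sample = {
    "definition": {
        "parts": [
            {
                "path": "mashup.pq",
                "payload": base64.b64encode(b"section Section1;").decode("ascii"),
                "payloadType": "InlineBase64",
            }
        ]
    }
}

decoded = decode_parts(sample)
print(decoded["mashup.pq"])
```

With decode=True (the default) the function performs the decoding itself, so a helper like this would only matter when decode=False is passed.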

sempy_labs.dataflow.list_dataflow_storage_accounts() → DataFrame

Shows the accessible dataflow storage accounts.

This is a wrapper function for the following API: Dataflow Storage Accounts - Get Dataflow Storage Accounts.

Returns:

A pandas dataframe showing the accessible dataflow storage accounts.

Return type:

pandas.DataFrame

sempy_labs.dataflow.list_dataflows(workspace: str | UUID | None = None)

Shows a list of all dataflows which exist within a workspace.

This is a wrapper function for the following API: Items - List Dataflows.

Service Principal Authentication is supported (see here for examples).

Parameters:

workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

A pandas dataframe showing the dataflows which exist within a workspace.

Return type:

pandas.DataFrame

sempy_labs.dataflow.list_upstream_dataflows(dataflow: str | UUID, workspace: str | UUID | None = None) → DataFrame

Shows a list of upstream dataflows for the specified dataflow.

This is a wrapper function for the following API: Dataflows - Get Upstream Dataflows In Group.

Service Principal Authentication is supported (see here for examples).

Parameters:

dataflow (str | uuid.UUID) – Name or ID of the dataflow.
workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.

Returns:

A pandas dataframe showing a list of upstream dataflows for the specified dataflow.

Return type:

pandas.DataFrame

Creates a Dataflow Gen2 CI/CD item based on the mashup definition from an existing Gen1/Gen2 dataflow. After running this function, update the connections in the dataflow to ensure the data can be properly refreshed.

Parameters:

dataflow (str | uuid.UUID) – The name or ID of the dataflow.
workspace (str | uuid.UUID, default=None) – The workspace name or ID. Defaults to None, which resolves to the workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.
new_dataflow_name (str, default=None) – Name of the new dataflow.
new_dataflow_workspace (str | uuid.UUID, default=None) – The Fabric workspace name or ID of the dataflow to be created. Defaults to None, which resolves to the existing workspace of the attached lakehouse or, if no lakehouse is attached, to the workspace of the notebook.