Forecasting Cost

A very important part of adopting any solution is understanding what it will cost. Whilst there will be some residual Azure costs, these will be negligible compared to MGDC costs. A pipeline has been written to extract the total object count of each dataset required for the Oversharing scenario: Sites, Permissions and SPOGroups.

Note

This readme assumes that you have followed Jose's blog and created the resources required for MGDC

Prereqs

Currently, the forecasting pipeline requires a Spark pool to run the notebook that pulls the total object counts out of the job metadata, which MGDC writes to the storage account after a successful execution. A future aspiration is to have a different process extract this information and serve it up in some form of web interface, but for now you need a Spark pool. Sorry! If you already have a Spark pool in your Synapse workspace, great, you can use it.

Note

Instructions for setting up a Spark pool are further down in this readme.

MGDC App datasets

If you followed Jose's blog you should have an MGDC app that has permission to extract the Sites dataset.

For oversharing we need the following three datasets:

  • Sites
  • Permission
  • SPOGroups

Please navigate to your MGDC app in the Azure portal and validate that the app has permission to extract these datasets. Remember that if you make any changes you will need to re-approve the app in the Microsoft 365 admin center (MAC) using another global admin account.

  1. Navigate to the MGDC app in the Azure portal and select your app

MGDC App Azure

  1. Validate the datasets under settings

MGDC Oversharing Datasets

  1. Navigate to MGDC apps in the Microsoft 365 admin center: Settings > Org settings > Security & privacy

MGDC MAC

  1. Approve (if required). Click and follow the approval flow.

MGDC Update

Spark Pool

You can create a Spark pool directly from your Synapse workspace resource in the Azure portal.

  1. Create Spark Pool

Create Spark Pool

  1. Give your Spark pool a name and a size. Small will be more than adequate for the forecast

Complete configuration details

  1. Review and Create > Create

Import the forecast Pipeline

Log in to your Synapse Studio and import the pipeline.

  1. Download the Sites_Permissions_SPGroups_Top1.zip from this repo.

Download Oversharing Forecast Pipeline

  1. Open up Synapse Studio. From the Home menu, navigate to Integrate

Integrate Menu

  1. Import the pipeline from the + button. Browse to the downloaded pipeline template.

Import Pipeline

  1. Select your Linked Services (created following Jose's blog) and click Open Pipeline. This will import 1 pipeline, 4 datasets and 1 notebook into your Synapse Studio.

Open Pipeline

  1. Before publishing, navigate to the Develop tab and open the Forecast notebook. Ensure that a Spark pool is selected in the Attach to dropdown

Open Pipeline

  1. Click Publish all > Publish

Publish

Forecast for Full Pull

Great, you are now ready to execute the pipeline to obtain a forecast. To obtain a full pull, set the start date and the end date to the same value.

  1. Navigate to the Integrate Menu and select the Sites_Permissions_SPGroups_Top1 pipeline. Click Add Trigger > Trigger now

Trigger Pipeline

  1. Populate full pull parameters and click OK

Note

MGDC can only go back 21 days. Please update the start and end date parameters so that they are no more than 21 days in the past.

Populate Full Pull Parameters

  1. Navigate to the Monitor tab to see the execution details. Wait for the pipeline to Complete. Typically this will be 25 minutes.

Monitor

  1. Once complete, we can check the details extracted in the notebook. If your pipeline failed, please check the Troubleshooting section

Pipeline Complete

  1. If the pipeline Succeeded, click on the pipeline run which will open the pipeline activity list. Hover over the notebook activity and click on the glasses icon. This will open the notebook snapshot

Pipeline Complete

  1. Scroll down to the bottom of the notebook until you see a table similar to the below

Forecast Results

We can now use the following formula to work out the MGDC cost of a full pull

$$ \text{Cost} = \frac{\text{Sites} + \text{Permissions} + \text{Groups}}{1000} \times 0.75 $$
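As a quick sanity check, the formula above can be sketched in a few lines of Python. The function name and the object counts below are made up for illustration; replace them with the totals from your own forecast table. The 0.75 per 1,000 objects rate is the one used in the formula above.

```python
# Rate per 1,000 extracted objects, taken from the formula above.
PRICE_PER_1000_OBJECTS = 0.75


def full_pull_cost(sites: int, permissions: int, groups: int) -> float:
    """Estimate the MGDC cost of a full pull from the three object totals."""
    total_objects = sites + permissions + groups
    return total_objects / 1000 * PRICE_PER_1000_OBJECTS


# Hypothetical counts, not real forecast output.
print(full_pull_cost(sites=2_000, permissions=50_000, groups=8_000))  # 45.0
```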

Forecast for Delta Pull

Great, you are now ready to execute the pipeline to obtain a delta forecast. To obtain a delta pull, provide different start and end dates.

  1. Navigate to the Integrate Menu and select the Sites_Permissions_SPGroups_Top1 pipeline. Click Add Trigger > Trigger now

Trigger Pipeline

  1. Populate parameters and click OK. Notice that the dates are 7 days apart. Adjust this window to match the expected cadence, e.g. if running every two weeks, get a delta forecast for a 14-day period.

Note

MGDC can only go back 21 days. Please update the start and end date parameters so that they are no more than 21 days in the past.

Populate Full Pull Parameters
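If you want to work out the dates before triggering the pipeline, the sketch below derives a start and end date from a chosen cadence and checks them against the 21-day limit. The helper name `pull_window` and the ISO `YYYY-MM-DD` output are assumptions made for illustration; use whatever date format the pipeline parameters in your workspace expect.

```python
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 21  # MGDC limit quoted in the note above


def pull_window(end: date, cadence_days: int, today: date) -> tuple[str, str]:
    """Return (start, end) as ISO date strings for a pull ending on `end`.

    cadence_days=0 gives identical dates (full pull); 7 gives a weekly delta.
    """
    if cadence_days < 0:
        raise ValueError("cadence_days must be zero or positive")
    if end > today:
        raise ValueError("end date cannot be in the future")
    start = end - timedelta(days=cadence_days)
    oldest_allowed = today - timedelta(days=MAX_LOOKBACK_DAYS)
    if start < oldest_allowed:
        raise ValueError(
            f"start date {start} is more than {MAX_LOOKBACK_DAYS} days before {today}"
        )
    return start.isoformat(), end.isoformat()


# Weekly delta ending 20 May 2024, evaluated as if today were 22 May 2024.
print(pull_window(date(2024, 5, 20), 7, today=date(2024, 5, 22)))
# ('2024-05-13', '2024-05-20')
```

Passing `today` explicitly keeps the check repeatable; in day-to-day use you would pass `date.today()`.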

  1. Navigate to the Monitor tab to see the execution details. Wait for the pipeline to Complete. Typically this will be 25 minutes.

Monitor

  1. Once complete, we can check the details extracted in the notebook. If your pipeline failed, please check the Troubleshooting section

Pipeline Complete

  1. If the pipeline Succeeded, click on the pipeline run which will open the pipeline activity list. Hover over the notebook activity and click on the glasses icon. This will open the notebook snapshot

Pipeline Complete

  1. Scroll down to the bottom of the notebook until you see a table similar to the below

Forecast Results

We can now use the following formula to work out the ongoing monthly MGDC cost. The example below assumes weekly MGDC delta snapshots, hence the factor of 4 runs per month. Remember that the delta may fluctuate each week based on user activity.

$$ \text{Monthly Cost} = \frac{\text{Sites} + \text{Permissions} + \text{Groups}}{1000} \times 4 \times 0.75 $$
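The monthly formula can be sketched the same way. The function name, the `runs_per_month` parameter and the delta counts below are illustrative assumptions; the default of 4 runs matches the weekly cadence in the formula above, and you would lower it (e.g. to 2) for a run every two weeks.

```python
# Rate per 1,000 extracted objects, taken from the formula above.
PRICE_PER_1000_OBJECTS = 0.75


def monthly_delta_cost(
    sites: int, permissions: int, groups: int, runs_per_month: int = 4
) -> float:
    """Estimate the monthly MGDC cost from the object totals of one delta pull."""
    delta_objects = sites + permissions + groups
    return delta_objects / 1000 * runs_per_month * PRICE_PER_1000_OBJECTS


# Hypothetical weekly delta counts, not real forecast output.
print(monthly_delta_cost(sites=100, permissions=3_000, groups=900))  # 12.0
```

Because the delta fluctuates with user activity, treat the result as a rough estimate rather than a fixed monthly figure.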