Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Audit Logging To SecreteManager.exe #4183

Open
wants to merge 51 commits into
base: main
Choose a base branch
from
Open

Conversation

davfost
Copy link
Contributor

@davfost davfost commented Sep 27, 2024

Add Audit Logging To SecreteManager.exe

  • Adds the OpenTelemetry.Audit.Geneva package and dependencies
  • Following logging recommendations from Geneva team
  • Added non-volatile logging operations to secret update and create operations
  • Adds new command line options for setting the target service ID to used when performing audit logging

Part Of GitHub Issue https://github.com/dotnet/arcade/issues/13217

// The hope is that app insights will also catch the base exception for debugging.
catch (Exception ex)
{
Console.WriteLine($"Failed to add audit log for secret update!: <{ex.Message}>");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the audit message always be safe for public view in a console?

Copy link
Contributor Author

@davfost davfost Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so as we are not passing any 'secret' data in the logs that can be exposed. The identity information is the information provided by the caller so they should already know it or have access to it. If there is a concern, we can remove the ex.Message component to be extra safe but it will make debugging harder if we ever have to investigate and issue where expected logs are missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the exception for now, if debugging becomes and issue we can come back to this.

<configuration>
<packageSources>
<clear />
<add key="AzureGenevaMonitoring" value="https://msblox.pkgs.visualstudio.com/_packaging/AzureGenevaMonitoring/nuget/v3/index.json" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What permissions are required to access this feed? Will there be issues with CI and individual users getting access?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently an issue with this feed and it blocked the PR build from completing. I have been discussing it with Matt Mitchell to see how it can be resolved. As you observed this feed requires external permissions that are not granted so the package restore is failing.

The basic issue is that the Audit team says we have to use the version of the package that comes from 'this' feed and not the public version of the package.

  • I need to investigate if the 'public' feed version of the package has the same class and methods defined. If so, we can do something where the 'internal feed' is different than the feed used in the 'public' processes.
  • I also have open questions to the Audit team to verify if the package from 'this' feed CAN be exposed publicly. If so we may be able to just manually move target versions into one of the feeds we manage directly.

Matt said these two options have been used in the past to deal with other packages that have similar issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good good, Matt should have good ideas here.

Just to say it explicitly, do not merge this PR until there's a good solution for public CI and for local dev.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I don't think we can merge (validly) until this is address as it build breaks at the NuGet restore step. So, it has to be fixed in some way in order to pass the PR validation build.

/// Provides the ServiceTreeId set with global options
/// The ID is a guid and is set to Guid.Empty if not set
/// </summary>
public Guid ServiceTreeId = Guid.Empty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a publicly-writable field❓ seems much more normal to only expose const fields and properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceTreeId is a property value that must be access outside of the class so it can't be private or hidden, and it is not a const because it is alterable via command option settings passed in by the caller.

I have altered the code to use an private class variable to data storage and exposed the property as a 'get' only to make it read only outside of option setting operations.

{
return new OptionSet()
{
{"servicetreeid=", "The service tree ID (must be a valid GUID id from aka.ms/servicetree)", id =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what enforces validity in Service Tree❓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing other than it being a 'Guid'. The audit logging process from does not verify that a ID is valid just that it is passed and the value meets the data type criteria.

If you pass an empty Guid or a valid Guid for a non-existent service tree ID it would still 'log' to Geneva successfully.

I don't think we have an additional obligation to make this process have to communicate with an external Service Tree API to ensure IDs are valid.

If the 'service' does not use a correct ID then it will still report to Geneva but Geneva will not close any SFI KPI work items associated with the service so in that sense you are incentivized to use a valid and correct ID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put another way, very few non-empty Guids are not valid service tree IDs. looks like those are ignored until the option is used. that is, this checks for Guid validity and not service tree validity

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think there is any reasonable way for this library to ensure it is a valid service tree ID. I think about this instead as being sort of "tagging" information or metadata that gets passed to Geneva, then Geneva happens to do something interesting with it.

Copy link
Member

@dougbu dougbu Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine w/ the level of validation implemented here. my main concern is the option's description implies something more. how about the following❓

Suggested change
{"servicetreeid=", "The service tree ID (must be a valid GUID id from aka.ms/servicetree)", id =>
{"servicetreeid=", "The service tree ID (must be a non-empty GUID and should be a valid id from https://aka.ms/servicetree)", id =>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you mean about the description for input but 'should be' implies that something else is OK to use here.

It is not. The value is only meaningful if it is a valid service tree ID and specifically the service tree ID for your service.

Any other Guid even one for a valid service tree ID for a service other than the one running the process would be invalid.

The value 'would' be accepted by Geneva and recorded but it is not correct and does not benefit the actual service that needs to report so it can close it's KPIs.

However, the way this option is being provided allows for service we do not own to also use this tool and generate a log that can go to Geneva.

We cannot predict what those possible IDs would be or how they will change in the future.

What we do know is that the ID is meant to be a Guid structure that comes from https://aka.ms/servicetree and relates to the specific service owned by the 'caller'.

I will change the comment to be something less directive so it does state must or should and instead just provides the link back to the location where valid IDs can be found.

davfost and others added 13 commits October 24, 2024 15:40
Co-authored-by: Doug Bunting <6431421+dougbu@users.noreply.github.com>
Co-authored-by: Doug Bunting <6431421+dougbu@users.noreply.github.com>
Co-authored-by: Doug Bunting <6431421+dougbu@users.noreply.github.com>
…AzureKeyVault.cs

Co-authored-by: Michael Stuckey <michael.stuckey@microsoft.com>
…AzureKeyVault.cs

Co-authored-by: Michael Stuckey <michael.stuckey@microsoft.com>
…ectBaseCommand.cs

Co-authored-by: Doug Bunting <6431421+dougbu@users.noreply.github.com>
…ectBaseCommand.cs

Co-authored-by: Doug Bunting <6431421+dougbu@users.noreply.github.com>
…ommnad option setting processes per PR comment.
Copy link
Member

@dougbu dougbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new comments contain a few more spelling errors

Comment on lines 34 to 35
ApplicationId = jsonToken?.Claims?.FirstOrDefault(claim => claim.Type == "oid")?.Value ?? "Claim Oid Not Found";
TenantId = jsonToken?.Claims?.FirstOrDefault(claim => claim.Type == "tenant_id")?.Value ?? "Claim tenant_id Not Found";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these expressions throw instead of setting the properties to invalid values❓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of reasons not to throw.

  • We want this logging operation to be non-volatile no matter what. It is better not to log anything than have logging or logging setup cause our service to fail.
  • The correctness of these value is not enforced by Geneva
    • They force us to pass a value but do not verify it is a correct or valid Id just that a value exists
  • Logs are still valuable even if we cannot get ever bit identifying data that would want so we would prefer to log something rather than nothing

Looking at this code again throw I think it should also be wrapped in a try catch block to ensure that even if the process of extracting clams value throws an exception the process does not cause the service to fail.

{
return new OptionSet()
{
{"servicetreeid=", "The service tree ID (must be a valid GUID id from aka.ms/servicetree)", id =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put another way, very few non-empty Guids are not valid service tree IDs. looks like those are ignored until the option is used. that is, this checks for Guid validity and not service tree validity


/// <summary>
/// Provides a non-volitie warning message if the ServiceTreeId option is set to a empty guid value and argments have been parsed
internal void ValidateServiceTreeIdOption()
Copy link
Member

@dougbu dougbu Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this doesn't actually validate the service tree ID. it only validates the value is a non-empty Guid. suggest using a different name or actually confirming the ID is valid

}

public override async Task RunAsync(CancellationToken cancellationToken)
{
try
{
// Provides a curtisy warning message if the ServiceTreeId option is set to a empty guid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment would not be necessary if the ValidateServiceTreeIdOption name were updated

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http;

namespace Microsoft.DncEng.SecretManager;

public class Program : DependencyInjectedConsoleApp
{
/// <summary>
/// Object stores global command setting as parsed from the command line at the main method
/// We mark this valud as protected so it can be accessed by processes that invoke the assembly outside of the command line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// We mark this valud as protected so it can be accessed by processes that invoke the assembly outside of the command line
/// We mark this value as protected so it can be accessed by processes that invoke the assembly outside of the command line

also, what invokes the assembly outside the command line❓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is my understanding that this code is deployed in a nuget package and used directly as an assembly.

This means the package can be consumed and used as an assembly in a project.

I don't know if it is or is not used that way, but I believe it is possible to be used that way.

If it is ever used that way, then you would need to set these 2 values directly. Assuming that they also leverage the protected method ConfigureServices to setup injection operations.

If that is considered an unsupported scenario, I would be fine with removing the comment and changing the protection levels to private.

/// Object stores global command setting as parsed from the command line at the main method
/// We mark this valud as protected so it can be accessed by processes that invoke the assembly outside of the command line
/// </summary>
protected static GlobalCommand _globalCommand = new GlobalCommand();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected static GlobalCommand _globalCommand = new GlobalCommand();
protected static readonly GlobalCommand _globalCommand = new GlobalCommand();

// We then parse the ProjectBaseCommand to ensure we collect the service tree id at the start of the progress
// so it can be used for dependency ingjection
// The global option setings are passed to all other command objects that inhearit from the ProjectBaseCommand
var projectBaseCommand = new ProjectBaseCommand(_globalCommand);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, looks very strange to inherit from GlobalCommand and pass a GlobalCommand instance into the constructor. also feels odd for regular commands such as ValidateCommand to inherit from GlobalCommand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but I am not familiar with Microsoft.DncEng.CommandLineLib options parsing logic and I was unable to find a different way to override global command or access them at this stage in the process.

Ideally I would be creating an override of the GlobalCommand object and adding the new ServiceTreeId property to it and then accessing the GlobalCommand inside of the other command types.

However, there seems to be some kind of high level injection override that prevents creating a custom 'GlobalCommand' object and forces the use the 'base' type.

The only examples I found for relaying 'global' data down to local commands required pre-parsing the global command in the 'main' method and passing the global setting into the other command types via constructor arguments.

Since I was not able to alter the override GlobalCommand object I instead created a base command type and then did the required pre-parsing of data to get the values needed before injection operations are carried out and passed the GlobalCommand data down to the other command types via the base command class I created.

If someone has an example of a better way to do this, I am more than a happy to make a change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not positive but looking at the code in dotnet/dnceng-shared, it feels like the global command isn't really intended to be extended. may make more sense for ProjectBaseCommand to inherit from Command.

note DefaultConsole.ShouldWrite(VerbosityLevel) exposes GlobalCommand.Verbosity indirectly and the --help command (all that GlobalCommand supports) is handled internally w/in the CommandLineLib package

Copy link
Contributor Author

@davfost davfost Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultConsole appears to be a Consol Writer tool that uses ICommandOptions from it's constructor to extract a 'GlobalCommand' object.

It does not seem correct to leverage a console accessor object as a wrapper to access another object type that happens to be exposed inside that class.

image

I don't need the consol object or the methods that DefaultConsole provides. I just need to access to global command options so the global setting can actually be applied in sub commands.

However, I will see if I can leverage the same ICommandOptions interface in the same way DefaultConsole does to access the GlobalComman object and avoid creating it and parsing it directly in the Main method.

Even if that does not work or allow for removing the global object parsing in Main, I can still make ProjectBaseCommand inherit from Command, it only inherits from GlobalCommand to reduce the amount of redundant property settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok here is the root issue that is resulting in the awkward code for the base command object.

  • We need the caller to pass in the Service Tree ID
    • This was a design choice as opposed to hard coding the service tree ID to a ID our team owns.
    • The idea here is that this code can be used by other services and the audit logging then belongs to another team not our team.
  • Ideally the Service Tree ID should be a new 'global command' option since it is for a global logging process not any one specific command.
    • The ID is also needed to instantiate the SecurityAuditLogger class as that is a basically the only really required values for Geneva audit logging. (Even though Geneva does not care if the ID is a real service tree ID or not. This value is the first 'scope' setting used to differentiate logs for search criteria)
  • However Microsoft.DncEng.CommandLineLib does not 'Parse' arguments for global command 'before' it call ConfigureServices to configure project level dependency injection instructions.
    • The command options get parsed when the call 'return new Program().RunAsync(args)' is executed in program.cs;
    • This is to late to get the Service Tree ID needed to setup the injection instruction for SecurityAuditLogger unless we were to modify Microsoft.DncEng.CommandLineLib to provide the SecurityAuditLogger object and add Service Tree ID to the GlobalCommand object. (This is both outside the scope of this project and it does would be problematic as Geneva Audit logging is very 'Service Specific' so it doe not lend itself to having a 'generic audit logger class that can be used across many service)
    • We could modify Microsoft.DncEng.CommandLineLib to have it's GlobalCommand object include the Service Tree Id property only, but that does not address the 'late parsing' issue for defined the projects SecurityAuditLogger injection process.
    • The late parsing is necessary as Microsoft.DncEng.CommandLineLib appears to allow you implement your own own Injection instructions for GlobalCommnd objects. Although it 'down casts' the object before parsing args, which likely explains why properties added to a custom GlobalCommand fail to be set even when you provide your a custom GlobalCommand implementation and are not shown in help output.

So, what things do I know definitely works?

  • We can create a static 'Command or GlobalCommand' object and parse it before calling Program().RunAsync(args) in the Main method to read the Service Tree ID early.
  • If we use a 'Command' object we get to see the Service Tree Id setting specified in 'help' output.
    • We do not see it if we add it to a 'GlobalCommand' only I bleive for the downcast issue I mentioned earlier.

**_This seems to mean we need to have the new property added to 'command' in order for them to be seen in any 'help' output, and we have to manually parse the options before calling new Program().RunAsync(args) in the main method in order to ensure SecurityAuditLogger is correctly setup.

This is because the Microsoft.DncEng.CommandLineLib injection instructions require the project level injection instructions to run first and then downcast any custom implementations of GlobalCommand objects before parsing args._**

I will discuss this with Michael to see if there is something I am missing here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an option which applies to most of the available commands (e.g. does the service tree id add anything useful to InfoCommand or TestCommand❓) doesn't feel "global" to me. it may look awkward to have the same option in multiple Command subclasses but not as awkward as the current code.

separately, the parse ordering points seem less relevant to me. why does the service tree id option need to be parsed before anything else❓ is some logging done prior to the start of the Command execution❓ the code in DependencyInjectedConsoleApp creates the ServiceCollection within its RunAsync(...) method

Copy link
Contributor Author

@davfost davfost Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem very concerned about this code and its implementation perhaps should have an offline conversation to address your concerns. We should also include Matt and Michael.

To be clear I have no passion one way or another about how the command parsing or injection logic is carried out as that has nothing directly to do with the objective of the SFI work.

The only reasons I am messing with these operations at all is because it was decided to pass in the Service Tree ID instead of hard coding it to one of our internal service tree id value since this process can be used by other teams for other services.

The ultimate goal is to be able to ensure any command that needs to perform audit logging can instantiate a logging object safely without producing mystery run time failure if other logging needs to be added in the future. Or worse having logging just fail silently.

I will look to setup a meeting between you me and Matt to review and discuss the specifics of the implementation.

Copy link
Contributor Author

@davfost davfost Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Service Tree ID need to be parsed from the command line in order to passed to the Audit Logger object (design requirement)
  • The Audit Logger Object is part of this project space is injection instructions are defined in the local ConfigureServices method
    -The instruction is used when the AzureKeyVault object in instantiated which happens dynamically.
  • The call to ConfigureServices happens before the Argument Parser logic actually parses arguments.
  • Based on testing the AzureKeyVault also gets instantiated before normal Argument parsing for command occur.

The reason service tree needs to be manually parsed is because the audit logger is instantiated by the injection logic before the 'normal' argument parsing occurs.

I have found no way to circumvent these order of operation issues so far.

Comment on lines 47 to 48
ApplicationId = jsonToken?.Claims?.FirstOrDefault(claim => claim.Type == "appid")?.Value ?? "Claim appid (Application Id) Not Found";
TenantId = jsonToken?.Claims?.FirstOrDefault(claim => claim.Type == "tid")?.Value ?? "Claim tid (Tenant Id) Not Found";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these expressions throw instead of setting the properties to invalid values❓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the other comment I mad this method non-volatile. We don't want this process to kill the service if it fails and we don't want to lose logging even these values are not found. The logs still have value even if not all identity values are found.

Copy link
Member

@dougbu dougbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, more spelling

/// <summary>
/// Provides read only access to the _ServiceTreeId
/// </summary>
public Guid ServiceTreeId { get { return _ServiceTreeId; } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest removing _ServiceTreeId and

Suggested change
public Guid ServiceTreeId { get { return _ServiceTreeId; } }
public Guid ServiceTreeId { get; private set }

{
private ILogger ControlPlaneLogger;

private bool SuppressAuditLogging = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming schemes seem out of whack but that may be true in general for this repo. suggest

Suggested change
private bool SuppressAuditLogging = false;
private bool _suppressAuditLogging = false;

}
});

ControlPlaneLogger = auditFactory.CreateControlPlaneLogger();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do this if _suppressAuditLogging is `true❓

{
try
{
LogSecretAction(OperationType.Update, operationName, credentialProvider, secretName, secretStoreType, secretLocation, result, resultMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not exit before the try block if _suppressAuditLogging is `true❓

var tenantId = credentialProvider.TenantId;
var auditRecord = new AuditRecord
{
OperationResultDescription = $"Action '{operationType}' For Secret '{secretName}' With Opeation '{operationName}' By User '{user}' On Source '{secretLocation}' Resulted In '{result}'.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OperationResultDescription = $"Action '{operationType}' For Secret '{secretName}' With Opeation '{operationName}' By User '{user}' On Source '{secretLocation}' Resulted In '{result}'.",
OperationResultDescription = $"Action '{operationType}' For Secret '{secretName}' With Operation '{operationName}' By User '{user}' On Source '{secretLocation}' Resulted In '{result}'.",

auditRecord.AddCallerIdentity(CallerIdentityType.ApplicationID, user);
auditRecord.AddCallerIdentity(CallerIdentityType.TenantId, tenantId);
// This value is basically a hard coded 'guess'
// The access level is defiend by permission setting of the 'user'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The access level is defiend by permission setting of the 'user'
// The access level is defined by permission setting of the 'user'


ControlPlaneLogger.LogAudit(auditRecord);
}
internal static string GetLocalIPAddress()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are this and GetOperationAccessLevel(...) given internal access❓

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants