Add telemetry blog post #395
base: gh-pages
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
--- | ||
layout: blog | ||
title: Automatic Bug Reporting | ||
author: Alexander Kyte | ||
tags: [logging] | ||
--- | ||
|
||
## Automatic Bug Reporting ## | ||
|
||
Software engineers often like to think of exceptional paths as being rarely taken. | ||
While this is hopefully the case on a customer’s machine, the engineer will see a | ||
program fail far more often than they will watch it succeed. Every engineer who has | ||
had to suffer bad tooling of one form or another becomes aware of this fact like a | ||
person with a broken foot becomes aware of how far away things are. | ||
|
||
The modern software lifecycle does not end when you ship a piece of code. Defects in | ||
The Modern software lifecycle... |
||
shipped software are reported, studied, reproduced, and then debugged. A significant | ||
portion of the time it takes to fix a bug can be spent in discovering that it exists. | ||
Often this discovery includes finding the ways that the customer’s environment differs | ||
from the developer’s testing environment. A back-and-forth conversation can give a | ||
developer a lot of information, but not every bug filer is | |
...conversation can give a developer a lot of information |
||
motivated enough to keep returning and responding. | ||
|
||
### Trust and Privacy ### | ||
|
||
A desire to automate the process of finding out what made a bit of code crash has led | |
to a lot of innovations in software engineering tooling. Unfortunately, many attempts to | ||
address this problem have completely lost the trust of the customer base. As soon as a | ||
brand is associated with “tracking,” people stop affording the company the benefit of the | ||
doubt. Telemetry mistakes seeming sinister (and sometimes correctly so) has led to the | |
I'd stay away from such politically charged tone here. |
||
passing of the General Data Protection Regulation (GDPR) in Europe. | |
define the acronym as |
||
|
||
With all of this in mind, a team attempting to solve the telemetry problem is faced with | ||
sweaty palms. The addition of integrated telemetry support to the Mono runtime is something | ||
that would have to balance a number of concerns. | ||
|
||
I believe that we did pretty well. Any constructive criticism through official channels is | ||
Avoid the self congratulatory tone. You can say the same without using it. |
||
appreciated. I will now describe the balance that we struck. | |
|
||
### Domain Details ### | ||
|
||
Any telemetry system can be separated into a number of components. | ||
|
||
1. That which collects information about runtime state during the crash | ||
2. That which moves it from the customer machine to the developer machine | ||
3. The manipulation and aggregation systems for the crashes. | ||
|
||
Now the concerns in the various parts are rather contradictory. Part 3 should be private, | ||
as information about bugs may pose security risks. Keeping Part 3 private often means that | |
Part 2 must also be proprietary and closed-source. I don’t think that many people disagree with | |
these two points. The information being sent (and how it was collected in Part 1) is the part that is | ||
subject to the most scrutiny. | ||
|
||
All of the source code that creates the files written during a crash to drive Part 2 | |
is open source and completely auditable. Folks can play along at home | |
(source is [here](https://github.com/mono/mono/blob/5672eba58212345b8e9722587533c325a0c5825d/mono/utils/mono-state.c)) | ||
as they continue to read, if they would like to confirm what I am saying. | ||
|
||
### Implementation ### | ||
|
||
Being privacy-preserving is more than just having our policies in open-source code; we must have | ||
This phrase is confusing. How about:
We don't need to drop the GDPR hammer all over the place, it distracts people, making them think it's something to pay attention to while reading your post. |
||
good policies. To abide by the GDPR we cannot collect any Personally Identifiable Information (PII). | |
use |
||
What is PII for a bug report? Most of the choices were very straightforward, but a few required | ||
You propose the question of what's PII, but don't answer it. The part about not collecting file paths and code in stack traces are awesome examples of doing so.
|
||
careful work to preserve our desired behavior. Beyond avoiding sending file paths from a user’s | ||
machine or data from their code, we also cannot capture their code in our stack trace. If a | ||
Visual Studio for Mac extension contains code that is not ours, we should not collect information | |
about their crashes. It is simple enough to only report the UUID, token, and managed IL offset | ||
(CIL metadata about the class) that all work together like a unique hash. Without the C# assembly | ||
file in question, you don’t know which hash goes to which file. It’s a primary key, but the | ||
people who have the mapping already have the private data. | ||
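
As a rough illustration of why these identifiers act like an opaque key, here is a minimal Python sketch. It is not the runtime's code: the `ManagedFrame` type, the `resolve` helper, and the method name `Example.Widget::Render` are hypothetical names made up for this example. Only the GUID, token, and IL offset values come from the sample frame in this post.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ManagedFrame:
    """Hypothetical record holding only the opaque identifiers for a frame."""
    guid: str       # identifies the assembly module
    token: str      # metadata token within that module
    il_offset: str  # offset into the method's IL


def resolve(frame: ManagedFrame,
            symbols: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return a method name only if the caller already holds the mapping."""
    return symbols.get((frame.guid, frame.token))


frame = ManagedFrame(
    guid="0845998F-6B70-4AA8-9214-6731378926A0",
    token="0x6003817",
    il_offset="0x00071",
)

# Hypothetical mapping that only the owner of the assembly could build.
private_symbols = {(frame.guid, frame.token): "Example.Widget::Render"}

print(resolve(frame, {}))               # no mapping: nothing is recoverable
print(resolve(frame, private_symbols))  # with the assembly: the method name
```

Nothing in the record itself names a file, a path, or a method. The lookup only succeeds for someone who could already read the assembly.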
|
||
We needed to balance that desire against the desire for a crash from two different versions of Mono | |
What desire? |
||
to look very similar or identical to the backend (Part 3). Mono already has unique hashing functions | ||
for metadata objects; we use them inside of the AOT compiler and runtime. We can then generate a | |
||
hash that is identical for two identical stack traces, while uploading a version-dependent unique | ||
reference with the main stack trace. What this means is that if two crashes have the same hash but | ||
different information otherwise, they are the same crash. This preserves privacy while letting you | ||
count how often each bug is hit. | ||
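
The grouping idea can be sketched as follows. This is a sketch under stated assumptions, not Mono's implementation. Mono's own metadata hashing functions are not reproduced here, so SHA-256 over hypothetical method identity strings stands in for them. The `stack_hash` helper, the report layout, the method names, and the second GUID and token are all invented for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable


def stack_hash(method_identities: Iterable[str]) -> str:
    """Stand-in for a version-independent hash of a stack trace."""
    digest = hashlib.sha256()
    for identity in method_identities:
        digest.update(identity.encode("utf-8"))
        digest.update(b"\0")  # separator so frame boundaries stay unambiguous
    return digest.hexdigest()[:16]


# The same logical stack, as seen on the crashing machine (hypothetical names).
stack = ["Example.Widget::Render", "Example.App::Main"]

# Two reports from different builds: the version-dependent references differ,
# but the hash computed before upload is the same.
report_v1 = {
    "hash": stack_hash(stack),
    "frames": [{"guid": "0845998F-6B70-4AA8-9214-6731378926A0",
                "token": "0x6003817"}],
}
report_v2 = {
    "hash": stack_hash(stack),
    "frames": [{"guid": "11111111-2222-3333-4444-555555555555",  # made up
                "token": "0x6003901"}],                           # made up
}

# The backend (Part 3) only needs the hash to count occurrences.
counts = Counter(report["hash"] for report in (report_v1, report_v2))
print(counts)
```

Only the hash and the opaque references would leave the machine in this sketch. The identity strings used to compute the hash stay local.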
|
||
To make this concrete, this is an example of what we send for a managed frame: | ||
|
||
``` | ||
add |
||
{ | ||
    "is_managed" : "true", | |
    "guid" : "0845998F-6B70-4AA8-9214-6731378926A0", | |
    "token" : "0x6003817", | |
    "native_offset" : "0x1fd", | |
    "il_offset" : "0x00071" | |
indent these so they're not on the same column as the curly braces
would be nice to add some comments about what each field means, e.g.
|
||
} | ||
``` | ||
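
To show how a consumer might read such a frame, here is a small Python sketch that parses the example and decodes its hexadecimal fields. The comments describing each field are an interpretation of the field names, not documentation taken from the runtime. Splitting the token into a table byte and a row index follows the usual ECMA-335 metadata token layout, where table `0x06` is the method definition table.

```python
import json

raw_frame = """
{
    "is_managed" : "true",
    "guid" : "0845998F-6B70-4AA8-9214-6731378926A0",
    "token" : "0x6003817",
    "native_offset" : "0x1fd",
    "il_offset" : "0x00071"
}
"""

frame = json.loads(raw_frame)

# The example encodes the flag as a string, so compare against "true".
is_managed = frame["is_managed"] == "true"

# Metadata token: high byte selects the table, low three bytes the row.
token = int(frame["token"], 16)
table = token >> 24
row = token & 0x00FFFFFF

# Offsets into the compiled native code and into the method's IL.
native_offset = int(frame["native_offset"], 16)
il_offset = int(frame["il_offset"], 16)

print(is_managed, hex(table), row, native_offset, il_offset)
```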
|
||
### Bigger Impacts ### | ||
|
||
This crash dump creation also represented a change in our philosophy with respect to error reporting. | ||
Rather than expecting the bug report submission to be the beginning of an ongoing conversation, | ||
we are now expecting it to be an anonymous message. This motivates our embedding information into the | ||
dump that we would typically ask someone to collect for us by exporting an environment variable or | ||
attaching with a debugger. The dump needs to contain enough system clues without describing the | ||
embedding application too closely and being creepy. | ||
|
||
It is a much harder problem for a runtime than for a web server because we expose a lot of the | ||
details of the underlying platform to a developer who chooses to poke and prod. The CPU model isn’t | ||
likely to cause problems with a web server, but leads to | ||
[impossible bugs](https://www.mono-project.com/news/2016/09/12/arm64-icache/) with Mono. | |
change the link to |
||
We expose these abstract state machines in the API that only roughly correlate to the bits in the machine registers. | ||
Why do we expose these abstract state machines? It's not clear what's the connection with the previous paragraph. |
||
This train of thought with respect to state machines and logging led to the flight recorders, which will be mentioned in a subsequent blog post. | |
should we do the flight recorder post before this one?
It's hard to make the connection of state machines, logging and flight recorders.
|
||
|
||
The best part of all of this is that it is all open source. Because Mono is an embedded runtime, when you embed our telemetry engine you gain the ability to collect telemetry on your own code. Someone today can build Mono in a way that allows them to get a beautiful runtime state dump on each crash. If they don’t change too much, it’ll even be GDPR-compliant. It can be hard to get really excited about logging, but it’s easy to get excited about spending less time teaching customers to debug. | |
we should probably expand a bit on how this is explicitly opt-in for embedders and how we're only using it in VSMac today and not on any random user binaries that are executed with mono-project.com packages.
Yes, we must make it clear that this is completely opt-in and the place we enabled has user consent.
Not sure about the dismissive tone of |
||
|
||
|
||
|
Every one engineer who has to suffer from ...