-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update profiles.proto with suggestions from Elastic protocol #8
Conversation
|
@@ -182,6 +182,9 @@ message AttributeSet { | |||
// A stacktrace is a sequence of locations. Order of locations goes from callers to callees. Many stacktraces will point to the same locations. The link between stacktraces, attribute sets, links, values and timestamps is implicit and is based on the order of the elements in the corresponding tables in ProfileType message. | |||
message Stacktrace { | |||
repeated uint32 location_indices = 1; | |||
|
|||
// An 128bit id that uniquely identifies this stacktrace, globally. Index into string table. | |||
uint32 trace_id_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing a well-specified and globally unique Stacktrace
id may help interoperability as Stacktraces can be linked/shared across vendors through their globally unique ids. The Stacktrace
id can be calculated by hashing, for every Location
in the Stacktrace
:
- The
file_id
corresponding toLocation.Mapping
- The address of the
Location
translated into a stable executable offset:Location.Mapping.file_offset + (Location.address - Location.Mapping.memory_start)
An id for mixed Stacktraces
can also be generated by special-casing Locations
containing non-"native" types (see: type_index
below):
- Instead of the
file_id
ofLocation.Mapping
, a "synthetic"file_id
can be generated by using a runtime-specific scheme (e.g. hash of the filename and class/method/function) - Instead of
Location.address
, we can use line numbers.
// A string that uniquely identifies a particular binary with high probability. | ||
// It is meant to work around deficiencies in the GNU Build ID that may compromise | ||
// uniqueness / correlatability. Index into string table. | ||
uint32 file_id_index = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GNU Build ID may be missing (Alpine) or not be usable for uniquely identifying binaries. It may also be set to random bytes which makes correlation across builds impossible as its value is not derived from the file.
An alternative is to fully specify a hashing scheme for a new file_id
that doesn't suffer from these issues. Thomas Dullien has suggested hashing the ELF header and a few deterministically selected executable pages. We could then either replace build_id_index
with file_id_index
, or if we still want to keep build_id_index
around with its original semantics (e.g. containing GNU Build ID), introduce a separate field as shown here.
@@ -274,6 +281,9 @@ message Location { | |||
|
|||
// Reference to an attribute set from the Profile's attribute set table. | |||
repeated uint32 attribute_set_indices = 5; | |||
|
|||
// Type of frame (e.g. kernel, native, python, hotspot, php). Index into string table. | |||
uint32 type_index = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be useful when profiling mixed workloads (e.g. Python calling into C). Allows distinguishing between different runtimes. It is not an enumerated type to allow for easy extensibility.
An alternative is to use an enumerated type with two possible values (native, non-native) in addition to type_index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feedback from the committee meeting: A cheaper way of representing the type may be a special mapping type for each runtime?
e5c1324
to
b0ddf19
Compare
Summary
Discussion around this PR should focus mostly on semantics (e.g. does having these extra options as part of the base protocol promote interoperability?) rather than the specific fields added to the messages. I'll leave comments in a review.
Adds the following: