Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update profiles.proto with suggestions from Elastic protocol #8

Closed

Conversation

christos68k
Copy link
Member

@christos68k christos68k commented Oct 5, 2023

Summary

Discussion around this PR should focus mostly on semantics (e.g. does having these extra options as part of the base protocol promote interoperability?) rather than the specific fields added to the messages. I'll leave comments in a review.

Adds the following:

  1. A globally unique Stacktrace id
  2. A binary "build id" that doesn't suffer from GNU Build ID limitations
  3. A "frame type" field that can be used to distinguish between different runtimes in Stacktraces generated by mixed-workloads (e.g. Python calling into C).

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 5, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: christos68k / name: Christos Kalkanis (b0ddf19)

@@ -182,6 +182,9 @@ message AttributeSet {
// A stacktrace is a sequence of locations. Order of locations goes from callers to callees. Many stacktraces will point to the same locations. The link between stacktraces, attribute sets, links, values and timestamps is implicit and is based on the order of the elements in the corresponding tables in ProfileType message.
message Stacktrace {
repeated uint32 location_indices = 1;

// An 128bit id that uniquely identifies this stacktrace, globally. Index into string table.
uint32 trace_id_index;
Copy link
Member Author

@christos68k christos68k Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing a well-specified and globally unique Stacktrace id may help interoperability as Stacktraces can be linked/shared across vendors through their globally unique ids. The Stacktrace id can be calculated by hashing, for every Location in the Stacktrace:

  1. The file_id corresponding to Location.Mapping
  2. The address of the Location translated into a stable executable offset: Location.Mapping.file_offset + (Location.address - Location.Mapping.memory_start)

An id for mixed Stacktraces can also be generated by special-casing Locations containing non-"native" types (see: type_index below):

  1. Instead of the file_id of Location.Mapping, a "synthetic" file_id can be generated by using a runtime-specific scheme (e.g. hash of the filename and class/method/function)
  2. Instead of Location.address, we can use line numbers.

// A string that uniquely identifies a particular binary with high probability.
// It is meant to work around deficiencies in the GNU Build ID that may compromise
// uniqueness / correlatability. Index into string table.
uint32 file_id_index = 6;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GNU Build ID may be missing (Alpine) or not be usable for uniquely identifying binaries. It may also be set to random bytes which makes correlation across builds impossible as its value is not derived from the file.

An alternative is to fully specify a hashing scheme for a new file_id that doesn't suffer from these issues. Thomas Dullien has suggested hashing the ELF header and a few deterministically selected executable pages. We could then either replace build_id_index with file_id_index, or if we still want to keep build_id_index around with its original semantics (e.g. containing GNU Build ID), introduce a separate field as shown here.

@@ -274,6 +281,9 @@ message Location {

// Reference to an attribute set from the Profile's attribute set table.
repeated uint32 attribute_set_indices = 5;

// Type of frame (e.g. kernel, native, python, hotspot, php). Index into string table.
uint32 type_index = 5;
Copy link
Member Author

@christos68k christos68k Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be useful when profiling mixed workloads (e.g. Python calling into C). Allows distinguishing between different runtimes. It is not an enumerated type to allow for easy extensibility.

An alternative is to use an enumerated type with two possible values (native, non-native) in addition to type_index.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback from the committee meeting: A cheaper way of representing the type may be a special mapping type for each runtime?

petethepig added a commit to petethepig/opentelemetry-proto that referenced this pull request Oct 19, 2023
petethepig added a commit to petethepig/opentelemetry-proto that referenced this pull request Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants