MP70: Implement MicroProfile Telemetry 2.0 #27108

Open
34 of 56 tasks
Emily-Jiang opened this issue Dec 7, 2023 · 16 comments
Labels
  • Design Approved
  • Epic (Used to track Feature Epics that are following the UFO process)
  • focalApproved:demo (Approval that a Demo has been scheduled)
  • focalApproved:externals (Focal Approval granted for APIs/Externals for the feature)
  • focalApproved:id (Focal Approval granted for ID for the feature)
  • focalApproved:instantOn (Focal Approval granted for InstantOn for the feature)
  • focalApproved:performance (Focal Approval granted for Performance for the feature)
  • focalApproved:serviceability (Focal Approval granted for Serviceability for the feature)
  • focalApproved:ste (Focal Approval granted for STE for the feature)
  • focalApproved:svt (Focal Approval granted for SVT for the feature)
  • GA Approved Without All Focal Approvals (For use by Chief Architect or designated delegate)
  • ID Required
  • in:MicroProfile/Telemetry
  • In Progress (Items that are in active development.)
  • release:24009
  • target:beta (The Epic or Issue is targeted for the next beta)
  • target:24008-beta
  • team:MicroProfileUK

Comments

@Emily-Jiang
Member

Emily-Jiang commented Dec 7, 2023

Description

Adopt OpenTelemetry Metrics and/or Logging as well as Tracing.

Dependent epics:


Documents

When available, add links to required feature documents. Use "N/A" to mark particular documents which are not required by the feature.


Process Overview

General Instructions

The process steps occur roughly in the order as presented. Process steps occasionally overlap.

Each process step has a number of tasks which must be completed or must be marked as not applicable ("N/A").

Unless otherwise indicated, the tasks are the responsibility of the Feature Owner or a Delegate of the Feature Owner.

If you need assistance, reach out to the OpenLiberty/release-architect.

Important: Labels are used to trigger particular steps and must be added as indicated.


Prioritization (Complete Before Development Starts)

The (OpenLiberty/chief-architect) and area leads are responsible for prioritizing the features and determining which features are being actively worked on.

Prioritization

  • Feature added to the "New" column of the Open Liberty project board
    • Epics can be added to the board in one of two ways:
      • From this issue, use the "Projects" section to select the appropriate project board.
      • From the appropriate project board click "Add card" and select your Feature Epic issue
  • Priority assigned
    • Attend the Liberty Backlog Prioritization meeting

Design (Complete Before Development Starts)

Design preliminaries determine whether a formal design, which will be provided by an Upcoming Feature Overview (UFO) document, must be created and reviewed. A formal design is required if the feature requires any of the following: UI, Serviceability, SVT, Performance testing, or non-trivial documentation/ID. Furthermore, each identified item places a blocking requirement on another team, so it must be identified early in the process. The feature owner may check off the item if they know it doesn't apply; otherwise, they should work with the focal point to determine what work, if any, will be necessary and make them aware of it.

Design Preliminaries

  • UI requirements identified, or N/A. (Feature owner and UI focal point)
  • Accessibility requirements identified, or N/A. (Feature owner and Accessibility focal point)
  • ID requirements identified, or N/A. (Feature owner and ID focal point)
    • Refer to Documenting Open Liberty.
    • Feature Owner adds label ID Required, if non-trivial documentation needs to be created by the ID team.
    • ID adds label ID Required - Trivial, if no design will be performed and only trivial ID updates are needed.
  • Serviceability requirements identified, or N/A. (Feature owner and Serviceability focal point)
  • SVT requirements identified, or N/A. (Feature owner and SVT focal point)
  • Performance testing requirements identified, or N/A. (Feature owner and Performance focal point)

Design

  • POC Design / UFO review requested.
    • Feature owner adds label Design Review Request
  • POC Design / UFO review scheduled.
    • Follow the instructions in POC-Forum repo
  • POC Design / UFO review completed.
  • POC / UFO Review follow-ons completed.
  • POC Design / UFO approval requested.
    • Feature owner adds label Design Approval Request
  • Design / UFO approved. (OpenLiberty/chief-architect) or N/A
    • (OpenLiberty/chief-architect) adds label Design Approved
    • Add the public link to the UFO in Box to the Documents section.
    • The UFO must always accurately reflect the final implementation of the feature. Any changes must be first approved. Afterwards, update the UFO by creating a copy of the original approved slide(s) at the end of the deck and prepend "OLD" to the title(s). A single updated copy of the slide(s) should take the original's place, and have its title(s) prepended with "UPDATED".

No Design

  • No Design requested.
    • Feature owner adds label No Design Approval Request
  • No Design / No UFO approved. (OpenLiberty/chief-architect) or N/A
    • Approver adds label No Design Approved
  • Feature / Capability stabilization or discontinuation or N/A
    • Feature owner adds label Product Management Approval Request and notifies OpenLiberty/product-management
    • Approver adds label Product Management Approved (OpenLiberty/product-management)
    • Note: For stabilized, superseded, and discontinued feature/capability, skip the Beta section of the template (you may delete it). Otherwise, proceed as normal.

FAT Documentation


Implementation

A feature must be prioritized before any implementation work may be delivered (inaccessible/no-ship). However, a design-focused approach should still be applied to features, and developers should think about the feature design before writing and delivering any code.
Besides being prioritized, a feature must also be socialized (or No Design Approved) before any beta code may be delivered. All new Liberty content must be inaccessible in our GA releases until it is Feature Complete by either marking it kind=noship or beta fencing it.
Code may not GA until this feature has obtained the Design Approved or No Design Approved label, along with all other tasks outlined in the GA section.

Feature Development Begins

  • Add the In Progress label

Legal and Translation

In order to avoid last minute blockers and significant disruptions to the feature, the legal items need to be done as early in the feature process as possible, either in design or as early into the development as possible. Similarly, translation is to be done concurrently with development. Both MUST be completed before Beta or GA is requested.

Legal (Complete before Feature Complete Date)

  • Changed or new open source libraries are cleared and approved, or N/A. (Legal Release Services/Cass Tucker/Release PM).

Innovation (Complete 1 week before Feature Complete Date)

  • Consider whether any aspects of the feature may be patentable. If any identified, disclosures have been submitted.

Translation (Complete by Feature Complete Date)

  • PII (Program Integrated Information) updates are merged (i.e. all English strings due for translation have been delivered), or N/A.

Beta

In order to facilitate early feedback from users, all new features and functionality should first be released as part of a beta release.

Beta Code

  • Beta fence the functionality
    • E.g. kind=beta, ibm:beta, ProductInfo.getBetaEdition()
  • Beta development complete and feature ready for inclusion in a beta release
    • Add label target:beta and the appropriate target:YY00X-beta (where YY00X is the targeted beta version).
  • Feature delivered into beta

Beta Blog (Complete by beta eGA)

  • Beta blog issue created and populated using the Open Liberty BETA blog post template.
    • Add a link to the beta blog issue in the Documents section.
    • Note: This is for inclusion into the overall beta release blog post. If, in addition, you'd also like to create a dedicated blog post about your feature, then follow the "Standalone Feature Blog Post" instructions under the Other Deliverables section.

GA

A feature is ready to GA after it is Feature Complete and has obtained all necessary Focal Point Approvals.

Feature Complete

  • Feature implementation and tests completed.
    • All PRs are merged.
    • All related/child issues are closed.
    • All stop ship issues are completed.
  • Legal: all necessary approvals granted.
  • Translation: Feature may only proceed to GA if it has either Translation - Complete or Translation - Missing label
    • If all translation has been delivered to release branch, feature owner adds label Translation - Complete.
    • If missing translation does not cause a break in functionality, nor a security or production outage risk, feature owner adds label Translation - Missing.
      • Once all missing translations are delivered, the Translation - Missing label is replaced with Translation - Complete.
    • If missing translation could cause a break in functionality or a security or production outage risk, feature owner adds the Translation - Blocked label.
      • Features with Translation - Blocked may NOT proceed to GA until the label has been replaced with either Translation - Missing or Translation - Complete.
    • For further guidance, contact Globalization focal point or the Release Architect.
  • GA development complete and feature ready for inclusion in a GA release
    • Add label target:ga and the appropriate target:YY00X (where YY00X is the targeted GA version).
    • Inclusion in a release requires the completion of all Focal Point Approvals.

Focal Point Approvals (Complete by Feature Complete Date)

These occur only after GA of this feature is requested (by adding a target:ga label). GA of this feature may not occur until all approvals are obtained.

All Features

  • APIs/Externals - Externals have been reviewed or N/A. (OpenLiberty/externals-approvers)
    • Approver adds label focalApproved:externals
  • Demo - Demo is scheduled for an upcoming EOI or N/A. (OpenLiberty/demo-approvers)
    • Add comment @OpenLiberty/demo-approvers Demo scheduled for EOI [Iteration Number] to this issue.
    • Approver adds label focalApproved:demo.
  • FAT - All Tests complete and running successfully in SOE or N/A. (OpenLiberty/fat-approvers)
    • Approver adds label focalApproved:fat.

Design Approved Features

  • ID - Documentation is complete or N/A. (OpenLiberty/id-approvers)
    • Approver adds label focalApproved:id.
    • NOTE: If only trivial documentation changes are required, you may reach out to the ID Feature Focal to request an ID Required - Trivial label. Unlike features with a regular ID requirement, those with the ID Required - Trivial label do not have a hard requirement for a Design/UFO.

  • InstantOn - InstantOn capable or N/A. (OpenLiberty/instantOn-approvers)
    • Approver adds label focalApproved:instantOn.
  • Performance - Performance testing is complete or N/A. (OpenLiberty/performance-approvers)
    • Approver adds label focalApproved:performance.
  • Serviceability - Serviceability has been addressed or N/A. (OpenLiberty/serviceability-approvers)
    • Approver adds label focalApproved:serviceability.
  • STE - Skills Transfer Education chart deck is complete or N/A. (OpenLiberty/ste-approvers)
    • Approver adds label focalApproved:ste.
  • SVT - System Verification Test is complete or N/A. (OpenLiberty/svt-approvers)
    • Approver adds label focalApproved:svt.

Remove Beta Fencing (Complete by Feature Complete Date)

  • Beta guards are removed, or N/A
    • Only after all necessary Focal Point Approvals have been granted.

GA Blog (Complete by Friday after GM)

  • GA Blog issue created and populated using the Open Liberty GA release blog post template.
    • Add a link to the GA Blog issue in the Documents section.
    • Note: This is for inclusion into the overall release blog post. If, in addition, you'd also like to create a dedicated blog post about your feature, then follow the "Standalone Feature Blog Post" instructions under the Other Deliverables section.

Post GM (Complete before GA)

  • After confirming this feature has been included in the GM driver, feature owner closes this issue.

Post GA


Other Deliverables


@Emily-Jiang Emily-Jiang added the Epic Used to track Feature Epics that are following the UFO process label Dec 7, 2023
@Emily-Jiang Emily-Jiang changed the title from MicroProfile Telemetry 3.0 to Implement MicroProfile Telemetry 3.0 Dec 7, 2023
@Emily-Jiang Emily-Jiang changed the title from Implement MicroProfile Telemetry 3.0 to Implement MicroProfile Telemetry 2.0 Dec 7, 2023
@tevans78 tevans78 changed the title from Implement MicroProfile Telemetry 2.0 to MP70: Implement MicroProfile Telemetry 2.0 Jan 10, 2024
@yasmin-aumeeruddy yasmin-aumeeruddy added the In Progress Items that are in active development. label Jan 12, 2024
@tevans78 tevans78 added target:beta The Epic or Issue is targeted for the next beta target:24007-beta labels Jun 3, 2024
@donbourne
Member

We need to add resource attribute labels to properly identify where our telemetry is coming from. I've opened #28608 for that and added it to a checklist in the description of this epic.

@donbourne
Member

I added another requirement to this epic to ensure that we add required JVM metrics. See checklist in description of this epic.

@donbourne
Member

I added another requirement to this epic to be able to load customizers into the OTel SDK instance created by the runtime. See checklist in description of this epic.

@njr-11
Contributor

njr-11 commented Jun 21, 2024

Review comments from Upcoming Feature Overview presentation:

  • slides 5 and 9: update current version of OpenTelemetry to 1.39
  • slide 10: Clarify the collection of Liberty logs from before MPTelemetry enables.
  • slide 10: Clarify distinction between what is covered in Epic 27711 vs this Epic.
  • slide 10: Terminology around mpMetrics-5.1 caused confusion around which feature is recording information
  • slide 12: Highlight which portion of slide is new.
  • slide 14: MicroProfile Metrics guide should be omitted. Might need further discussion how to cover in a guide.
  • slide 15: Check if the internal package can be omitted from API or hidden
  • slide 20: Is configuration in server.xml also supported?
  • slide 25: InstantOn concerns with system property level configuration, with the possibility it isn’t readable at checkpoint time and different at checkpoint restore.
  • slide 25: Clarify what happens to recorded logs when checkpoint is involved. Consensus seemed to be to not include logs in the checkpoint.
  • slides 27, 28: include note on what highlighting means
  • slide 28: include Open Source License type and IBM contributors
  • slide 28: Need to include io.opentelemetry.instrumentation:opentelemetry-runtime-telemetry-java8:2.4.0-alpha
  • API/SPI slide: do all APIs need to remain third-party, or can some be promoted to stable API?
  • slide 30: Include InstantOn testing
  • slide 31: SVT brought up that it might not be possible to test with Zipkin.
  • slide 32: Performance - possible overhead from enabling monitor-1.0
  • slide 33: explain reasoning behind version combinations
  • slide 33: include supported Java level minimums
  • slide 34: Security slide: switch N/A to state it is unchanged and brief summary.
  • slide 35: Serviceability: switch N/A to mention having a warning message for conflict between application-level config and system-level config. Also a message to state whether it is enabled or disabled, and possibly what it is exporting.
  • slide 35: Log message if someone is configuring in application only.
  • Overall comment: include using BELLs (Basic Extensions using Liberty Libraries) to load customizations at server level

@NottyCode
Member

@yasmin-aumeeruddy I'm really struggling to understand slides 19-21 as shown in the UFO. I spoke to @donbourne, because it is late on a Friday for you as I'm reviewing this and he might be in a better position to explain my confusion. It isn't clear to me what the pictures are trying to show: the boxes are labeled host but look like servers. On slide 21 it isn't clear what the lines between the two hosts mean, and it isn't clear to me how you get a runtime-associated SDK vs an application-associated one.

@yasmin-aumeeruddy
Member

yasmin-aumeeruddy commented Jul 30, 2024

Hi @NottyCode

I have changed the reference from "Host" to "Server".
A runtime-associated OpenTelemetry SDK instance is used if otel.sdk.disabled=false is set with an environment variable or Java system property. All applications in that server share this instance.

If otel.sdk.disabled=false is set with any other configuration, an application associated OpenTelemetry SDK instance is used.
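The selection rule above can be modelled in a few lines. This is a minimal Java sketch under stated assumptions, not the Open Liberty implementation: the class, enum, and method names are hypothetical, `OTEL_SDK_DISABLED` is the usual environment-variable spelling of `otel.sdk.disabled`, and the check of the system property before the environment variable is an assumption that mirrors the default MicroProfile Config ordinals.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical model of the SDK-instance selection described above.
// Not the Open Liberty implementation; all names here are illustrative.
public class SdkModeSketch {

    public enum SdkMode { RUNTIME, APPLICATION, DISABLED }

    static final String PROPERTY = "otel.sdk.disabled";
    // Environment-variable spelling of the same property (upper case, '.' -> '_').
    static final String ENV_VAR = "OTEL_SDK_DISABLED";

    // Runtime-level value of otel.sdk.disabled, or null if unset.
    // Assumption: a system property takes precedence over the environment variable,
    // mirroring the default MicroProfile Config ordinals (400 vs 300).
    public static String runtimeValue(Properties sysProps, Map<String, String> env) {
        String value = sysProps.getProperty(PROPERTY);
        return value != null ? value : env.get(ENV_VAR);
    }

    // appValue: otel.sdk.disabled from any other config source (e.g. the application), or null.
    public static SdkMode select(Properties sysProps, Map<String, String> env, String appValue) {
        if ("false".equalsIgnoreCase(runtimeValue(sysProps, env))) {
            return SdkMode.RUNTIME;      // one SDK instance shared by every application in the server
        }
        if ("false".equalsIgnoreCase(appValue)) {
            return SdkMode.APPLICATION;  // an SDK instance owned by this application
        }
        return SdkMode.DISABLED;         // MicroProfile Telemetry leaves the SDK disabled by default
    }

    public static void main(String[] args) {
        Properties noSysProps = new Properties();
        System.out.println(select(noSysProps, Map.of(ENV_VAR, "false"), null)); // RUNTIME
        System.out.println(select(noSysProps, Map.of(), "false"));              // APPLICATION
        System.out.println(select(noSysProps, Map.of(), null));                 // DISABLED
    }
}
```

Under this model, an application that sets otel.sdk.disabled=false in its own configuration only gets its own SDK instance when the runtime level has not already enabled a shared one.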

I have updated the diagrams in the UFO with the help of @donbourne

@Azquelt
Member

Azquelt commented Aug 12, 2024

On slide 36:

A warning message will be shown if only the application is configured:
• MicroProfile Telemetry is configured by the application instead of the server. Consequently, runtime metrics and logs are disabled.

As I understand it, it's valid and may be desirable to configure MP Telemetry per application rather than per server?

If so, I think this should be an info message, rather than a warning. Warnings are for things that are wrong that the user ought to correct. I think we've had users complain before about warnings that they can't correct.

@abutch3r abutch3r added the focalApproved:demo Approval that a Demo has been scheduled label Aug 14, 2024
@yasmin-aumeeruddy
Member

@OpenLiberty/demo-approvers Demo scheduled for EOI [24.17]

@donbourne
Member

OL:

Serviceability Approval Comment - Please answer the following questions for serviceability approval:

  1. UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?

  2. Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team).
    a) What problem paths were tested and demonstrated?
    b) Who did you demo to?
    c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

  3. SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered.
    a) Who conducted SVT tests for this feature?
    b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

  4. Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.

  5. Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

@cbridgha cbridgha added the focalApproved:externals Focal Approval granted for APIs/Externals for the feature label Aug 19, 2024
@chirp1
Copy link
Contributor

chirp1 commented Aug 19, 2024

Approving. David Mueller indicated that he has/will have the info that he needs to make the doc updates.

@chirp1 chirp1 added the focalApproved:id Focal Approval granted for ID for the feature label Aug 19, 2024
@donbourne
Member

@clarkek123 will be handling the serviceability approval for this epic.

@jdmcclur jdmcclur added the focalApproved:performance Focal Approval granted for Performance for the feature label Aug 20, 2024
@yasmin-aumeeruddy
Member

yasmin-aumeeruddy commented Aug 21, 2024

@clarkek123

UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?

Yes - Slide 36 shows the new warning messages that are emitted when enabling the feature if the user has any misconfiguration. The breaking changes to the configuration of the feature are explained in slides 19-21.

Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. IBM Support, test team, or another development team).
a) What problem paths were tested and demonstrated?

Behaviour: Enabling the OpenTelemetry SDK by setting otel.sdk.disabled=false with environment variables/system properties, then disabling it by setting otel.sdk.disabled=true with other configuration.

Outcome: A warning message is shown:
CWMOT5006W: Conflicting configuration for the otel.sdk.disabled configuration property detected for the _application name_ application. The final value is otel.sdk.disabled=false. Telemetry cannot be disabled for an application when it is enabled for the runtime.

Behaviour: Disabling the OpenTelemetry SDK by setting otel.sdk.disabled=true with environment variables/system properties, then enabling it by setting otel.sdk.disabled=false with other configuration.

Outcome: A warning message is shown:
CWMOT5007W: Conflicting configuration for the otel.sdk.disabled configuration property detected for the _application name_ application. The final value is otel.sdk.disabled=false because the property enabling telemetry for the application overrides the property disabling telemetry for the runtime.
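Both problem paths end with the same effective value: whenever one level enables the SDK and the other disables it, telemetry stays enabled and a warning names the conflict. The following is a minimal Java sketch of that rule, not the Open Liberty implementation. The `resolve` helper and `Resolution` record are hypothetical, each input is `null` when that level leaves the property unset, and the `CWMOT5007W` ID assumes the same `CW` prefix as `CWMOT5006W`.

```java
import java.util.Optional;

// Hypothetical sketch of the otel.sdk.disabled conflict handling described above.
// Not the Open Liberty implementation; names are illustrative.
public class SdkConflictSketch {

    // Effective otel.sdk.disabled value plus the warning message ID to log, if any.
    public record Resolution(boolean sdkDisabled, Optional<String> warningId) {}

    // runtimeDisabled: value set with environment variables/system properties, or null.
    // appDisabled: value set with any other configuration source, or null.
    public static Resolution resolve(Boolean runtimeDisabled, Boolean appDisabled) {
        if (Boolean.FALSE.equals(runtimeDisabled) && Boolean.TRUE.equals(appDisabled)) {
            // Enabled for the runtime, disabled for the application: the runtime value wins.
            return new Resolution(false, Optional.of("CWMOT5006W"));
        }
        if (Boolean.TRUE.equals(runtimeDisabled) && Boolean.FALSE.equals(appDisabled)) {
            // Disabled for the runtime, enabled for the application: the application value wins.
            return new Resolution(false, Optional.of("CWMOT5007W"));
        }
        // No conflict: prefer the runtime value, then the application value, else the default (true).
        Boolean effective = runtimeDisabled != null ? runtimeDisabled : appDisabled;
        return new Resolution(effective == null || effective, Optional.empty());
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, true).warningId().orElse("none")); // CWMOT5006W
        System.out.println(resolve(true, false).warningId().orElse("none")); // CWMOT5007W
        System.out.println(resolve(null, null).sdkDisabled());               // true
    }
}
```

The design choice the sketch captures is that an explicit "enable" at either level is never silently overridden; the warning tells the user which of their settings was ignored.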

b) Who did you demo to?

The UK MicroProfile Development team - Including team members who were not involved in the development process: @abutch3r @jakub-pomykala @tevans78 @ryan-storey. @benjamin-confino was involved in the development process but reviewed as a level 3 support engineer.

c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

Yes

SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered.
a) Who conducted SVT tests for this feature?
b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that IBM Support should be able to quickly address those problems without need to engage SMEs?

According to @hanczaryk: SVT agrees that the serviceability of any problem encountered was sufficient to avoid PMRs or L2 should be able to quickly address those problems without engaging L3.

Which IBM Support / SME queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective IBM Support/SME teams know they are supporting it. Ask Don Bourne if you need links or more info.

Yes, see was-l3-cdi

Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

Yes. The link to the documentation is here:
OpenLiberty/docs#7470

@clarkek123 clarkek123 added the focalApproved:serviceability Focal Approval granted for Serviceability for the feature label Aug 21, 2024
@clarkek123
Member

I have added Serviceability approval based on the information provided above: common error paths were tested and demonstrated, with approval from local team members with an L3 support focus and from others not involved in development, in addition to the SVT sign-off on the paths included for serviceability.

@yasmin-aumeeruddy
Member

@OpenLiberty/ste-approvers I have added the slide deck to the box folder

@tjwatson tjwatson added the focalApproved:instantOn Focal Approval granted for InstantOn for the feature label Aug 21, 2024
@tngiang73

@yasmin-aumeeruddy : WASWIN is good with the STE slides. STE approved.

@tngiang73 tngiang73 added the focalApproved:ste Focal Approval granted for STE for the feature label Aug 21, 2024
@hanczaryk hanczaryk added the focalApproved:svt Focal Approval granted for SVT for the feature label Aug 21, 2024
@NottyCode NottyCode added the GA Approved Without All Focal Approvals For use by Chief Architect or designated delegate label Aug 26, 2024
@NottyCode
Member

Approved for GA without the FAT approval. In discussions with the test team it seems that the tests are timing out which means that sometimes it doesn't run on some platforms. When it runs it seems to pass. We obviously need the tests to be passing reliably, but the assessment is that the risk the timeout is hiding quality issues is low. Addressing the test reliability will need to be done post GA.

@LifeIsGood524 LifeIsGood524 added release:24009 and removed target:ga The Epic is ready for focal approvals, after which it can GA. target:24009 labels Sep 13, 2024
Projects
Status: Epics in progress
Development

No branches or pull requests