Fix type validation in configstore DC client value updating #5110

charlese-instaclustr
Contributor

Fixed a misleading check that was allowing config_store_client#UpdateValue to return no error, despite actually failing the operation and corrupting the dynamic config storage in the process.

Added a more verbose log message when the type check fails, to aid users in understanding the issue.

Why?

This is a temporary change that mitigates a more widespread issue described in #5109. In summary, the implementations of the functions outlined in the Dynamicconfig.Client interface make inconsistent assumptions about input data types, so in places the success or failure of a call to a dynamic config client depends on which implementation is in use.

By making this change, the config_store_client will no longer attempt to update its values when its UpdateValue implementation is passed data in the format currently expected by the file_based_config client. This is an improvement because it stops users from inadvertently corrupting their dynamic config, and the updated log message informs them that they can update their config via the CLI instead.

However, long-term, a priority should be addressing issue #5109 and eliminating the inconsistency in this DC client interface.
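
To illustrate the kind of check described above, here is a minimal sketch in Go. It is not the code from this PR: the DynamicConfigValue type, the function names, and the error text are hypothetical stand-ins, and the "misleading" variant is only an assumed example of a check that can never fail. The fixed variant also lets a nil update through, matching the nil edge case mentioned later in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// DynamicConfigValue is a hypothetical stand-in for the entry type the
// config store client expects; the real type lives in the Cadence codebase.
type DynamicConfigValue struct {
	Value   string
	Filters []string
}

// misleadingCheck shows a check that can never fail: when the type assertion
// fails, vals is always nil, so "!ok && vals != nil" is always false.
func misleadingCheck(value interface{}) ([]*DynamicConfigValue, error) {
	vals, ok := value.([]*DynamicConfigValue)
	if !ok && vals != nil {
		return nil, errors.New("invalid value")
	}
	return vals, nil
}

// fixedCheck inspects the original input instead, so a wrongly typed value is
// rejected while a plain nil update is still allowed through.
func fixedCheck(value interface{}) ([]*DynamicConfigValue, error) {
	vals, ok := value.([]*DynamicConfigValue)
	if !ok && value != nil {
		return nil, fmt.Errorf("unsupported value type %T: expected []*DynamicConfigValue", value)
	}
	return vals, nil
}

func main() {
	// Data shaped the way a file-based client might accept it (assumption).
	fileBasedStyle := map[string]interface{}{"testAttribute2": 2}

	_, errOld := misleadingCheck(fileBasedStyle)
	_, errNew := fixedCheck(fileBasedStyle)
	_, errNil := fixedCheck(nil)

	fmt.Println("misleading check rejects bad input:", errOld != nil)
	fmt.Println("fixed check rejects bad input:", errNew != nil)
	fmt.Println("fixed check accepts nil:", errNil == nil)
}
```

The design point is that the validation has to look at the caller's input rather than at the result of the failed assertion, which is always the zero value.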

How did you test it?

Tested locally.

  • Ran Advanced Visibility Cadence server with the config store dynamic config client enabled (config available in development.yaml)
  • Added an initial value to the dynamic config for frontend.validSearchAttributes in the CLI: cadence admin config updc --dynamic_config_name frontend.validSearchAttributes --dynamic_config_value '{"Value": '{"testAttribute": 2}',"Filters":[]}'
  • Verified that the config value has been added: cadence admin config listdc
  • Triggered the AddSearchAttribute handler: cadence admin cluster asa --search_attr_key testAttribute2 --search_attr_type 2. Note that this handler calls the UpdateValue function with the problematic input data format.
  • Reran: cadence admin config listdc. The previously set value for frontend.validSearchAttributes remains the same; by contrast, in the current master implementation, the previously set value would be gone.
  • Checked the Cadence logs, and confirmed that the updated log message shows.
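
The manual steps above can also be approximated in a small, self-contained Go simulation. This is only a sketch under stated assumptions: memoryStore, configValue, and the strict flag are hypothetical, and the rule that a nil update clears the stored entry is assumed here purely to model the "prior value would be gone" behaviour described for master.

```go
package main

import "fmt"

// configValue is a hypothetical stand-in for a stored dynamic config entry.
type configValue struct{ Value string }

// memoryStore is a hypothetical in-memory substitute for the config store.
type memoryStore struct {
	entries map[string][]*configValue
}

func newStore() *memoryStore {
	return &memoryStore{entries: map[string][]*configValue{}}
}

// update models the assumption that writing nil values clears the entry.
func (s *memoryStore) update(name string, vals []*configValue) {
	if vals == nil {
		delete(s.entries, name)
		return
	}
	s.entries[name] = vals
}

// updateValue validates the input type before touching the store when strict
// is true; with strict false it behaves like a check that never fails.
func (s *memoryStore) updateValue(name string, value interface{}, strict bool) error {
	vals, ok := value.([]*configValue)
	if strict && !ok && value != nil {
		return fmt.Errorf("unsupported value type %T for %s", value, name)
	}
	s.update(name, vals)
	return nil
}

func main() {
	const key = "frontend.validSearchAttributes"
	for _, strict := range []bool{false, true} {
		s := newStore()
		// Seed an initial value, as the CLI step does.
		s.update(key, []*configValue{{Value: `{"testAttribute": 2}`}})
		// Simulate a handler passing data in the other client's format.
		err := s.updateValue(key, map[string]interface{}{"testAttribute2": 2}, strict)
		_, kept := s.entries[key]
		fmt.Printf("strict=%v errorReturned=%v valueKept=%v\n", strict, err != nil, kept)
	}
}
```

With strict validation the call returns an error and the seeded value survives; without it the call reports success and the seeded value is lost.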

Potential risks

No notable risks from this change; we are simply fixing a type validation so that it serves its intended purpose.

Release notes

Not notable.

Documentation Changes

N/A

@charlese-instaclustr changed the title from "Remove misleading type check, Add more detailed log message" to "Fix type validation in configstore DC client value updating" on Feb 21, 2023
@allenchen2244
Contributor

@charlese-instaclustr maybe test is failing somehow?

…:charlese-instaclustr/cadence into remove-misleading-updatevalue-type-check

Merging in upstream sync with master
@charlese-instaclustr
Contributor Author

@allenchen2244 good pickup, apologies, the tests picked up an edge case where the update value is nil. Added handling for the edge case to my change if you could rereview please. Thanks

@charlese-instaclustr
Contributor Author

Hi @allenchen2244 , would you be able to give me a hand by rerunning the build and tests please? Thanks

@Shaddoll Shaddoll enabled auto-merge (squash) March 17, 2023 21:49
@coveralls

Pull Request Test Coverage Report for Build 0186f18b-64da-446c-b19d-eadac9691f7c

  • 2 of 2 (100.0%) changed or added relevant lines in 2 files are covered.
  • 9 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.03%) to 57.109%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| common/task/fifoTaskScheduler.go | 2 | 84.54% |
| common/task/weightedRoundRobinTaskScheduler.go | 2 | 88.6% |
| common/util.go | 2 | 52.31% |
| service/history/execution/mutable_state_task_refresher.go | 3 | 56.65% |

| Totals | |
|---|---|
| Change from base Build 0186f0f3-f0db-4e0e-ab38-a4ae3852f931 | 0.03% |
| Covered Lines | 85273 |
| Relevant Lines | 149317 |

💛 - Coveralls

@Shaddoll Shaddoll merged commit 1519ace into uber:master Mar 17, 2023
davidporter-id-au added a commit that referenced this pull request Mar 18, 2023
commit f1e2476
Author: sonpham96 <sonpham1996@gmail.com>
Date:   Sat Mar 18 05:32:01 2023 +0700

    Upgrade Golang base image to 1.18 to remediate CVEs (#5035)

    Co-authored-by: David Porter <david.porter@uber.com>

commit 1519ace
Author: charlese-instaclustr <76502507+charlese-instaclustr@users.noreply.github.com>
Date:   Fri Mar 17 22:11:27 2023 +0000

    Fix type validation in configstore DC client value updating (#5110)

    * Remove misleading type check, Add more detailed log message

    * removing debugging logging

    * Handle nil update edge case

    ---------

    Co-authored-by: allenchen2244 <102192478+allenchen2244@users.noreply.github.com>
    Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>

commit a3e2774
Author: charlese-instaclustr <76502507+charlese-instaclustr@users.noreply.github.com>
Date:   Fri Mar 17 19:02:40 2023 +0000

    Add Canary TLS support (#5086)

    * add support for TLS connections by Canary, add development config for Canary with TLS

    * update README to include new config option

    * remove testing config

    ---------

    Co-authored-by: David Porter <david.porter@uber.com>
    Co-authored-by: Shijie Sheng <shengs@uber.com>
    Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>

commit ff4eab2
Author: Shijie Sheng <shengs@uber.com>
Date:   Thu Mar 16 20:10:54 2023 -0700

    [history] more cautious in deciding domain state to make decisions on dropping queued tasks (#5164)

    What changed?

    When domain cache returned entity not found error, don't drop queued tasks to be more conservative.

    Why?

    In cases when the cache is dubious, we shouldn't drop the queued tasks.

commit 55a8d93
Author: neil-xie <104041627+neil-xie@users.noreply.github.com>
Date:   Thu Mar 16 14:18:35 2023 -0700

    Add Pinot docker files, table config and schema (#5163)

    * Initial checkin for pinot config files

commit 1304570
Author: Mantas Šidlauskas <mantass@netapp.com>
Date:   Thu Mar 16 15:20:29 2023 +0200

    Set poll interval for filebased dynamic config if not set (#5160)

    * Set poll interval for filebased dynamic config if not set

    * update unit test

commit 42a14b1
Author: Mantas Šidlauskas <mantass@netapp.com>
Date:   Thu Mar 16 10:49:21 2023 +0200

    Elasticsearch: reduce code duplication (#5137)

    * Elasticsearch: reduce code duplication

    * address comments

    ---------

    Co-authored-by: Zijian <Shaddoll@users.noreply.github.com>

commit cbf0d14
Author: bowen xiao <xbowen@uber.com>
Date:   Wed Mar 15 10:19:34 2023 -0700

    fix samples documentation (#5088)

commit ba19a29
Author: Mantas Šidlauskas <mantass@netapp.com>
Date:   Wed Mar 15 12:52:29 2023 +0200

    Add ShardID to valid attributes (#5161)

commit a25cba8
Author: Mantas Šidlauskas <mantass@netapp.com>
Date:   Wed Mar 15 10:56:50 2023 +0200

    ES: single interface for different ES/OpenSearch versions (#5158)

    * ES: single interface for different ES/OpenSearch versions

    * make fmt

commit e3ac246
Author: Ketsia <115650494+ketsiambaku@users.noreply.github.com>
Date:   Tue Mar 14 12:47:40 2023 -0700

    added logging with workflow/domain tags (#5159)

commit 9581488
Author: Ketsia <115650494+ketsiambaku@users.noreply.github.com>
Date:   Mon Mar 13 16:56:45 2023 -0700

    Consistent query pershard metric (#5143)

    * added and update consistent query per shard metric

    * testing pershard metric

    * move sample logger into persistence metric client for cleaness

    * fix test

    * fix lint

    * fix test again

    * fix lint

    * sample logging with workflowid tag

    * added domain tag to logger

    * metric completed

    * addressing comments

    * fix lint

    * Revert "fix lint"

    This reverts commit 1e96944.

    * fix lint second attempt

    ---------

    Co-authored-by: Allen Chen <allenchen2244@uber.com>
davidporter-id-au added a commit that referenced this pull request Mar 30, 2023
commit 9d01035
Author: allenchen2244 <102192478+allenchen2244@users.noreply.github.com>
Date:   Wed Mar 29 20:50:38 2023 -0700

    large workflow hot shard detection (#5166)

    Metrics for large workflows

commit dd51c53
Author: David Porter <david.porter@uber.com>
Date:   Wed Mar 29 18:30:06 2023 -0700

    fix build (#5180)

commit 7b281c2
Author: David Porter <david.porter@uber.com>
Date:   Mon Mar 27 10:38:37 2023 -0700

    Adds a small test to catch issues with deadlocks (#5171)

    * Adds a small test to catch issues with deadlocks
