Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate if ES Storage is passed a proper isArchive flag #6065

Open
yurishkuro opened this issue Oct 6, 2024 · 12 comments
Open

Investigate if ES Storage is passed a proper isArchive flag #6065

yurishkuro opened this issue Oct 6, 2024 · 12 comments
Labels
area/storage help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/elasticsearch

Comments

@yurishkuro
Copy link
Member

This thread #6060 (comment)

@yurishkuro yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label Oct 6, 2024
@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 17, 2024

@yurishkuro dug into this a little bit while redoing the configurations and here's what I found

So I believe that this is working as expected. Let me know if there's anything I'm missing here.

One other thing to note is that in v1, the additional namespaces need to be explictly enabled (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/options.go#L110) but the archive namespace is enabled by default in v2 (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/options.go#L437).

@yurishkuro
Copy link
Member Author

@mahadzaryab1 In v1 the workflow is:

  1. the binary creates a Factory
  2. asks factory to initialize itself from CLI flags
  3. creates main storage implementations via CreateSpanWriter
  4. optionally calls CreateArchiveSpanWriter

The key observation here is that the distinction between primary and archive is handled inside the Factory because the caller clearly indicates the intent by calling InitArchiveStorage and CreateArchiveSpanWriter. in other words, the caller delegates the specific initialization required for archive storage to the Factory - a single Factory that manages both primary and archive.

In contrast, in v2 the workflow is supposed to be different

  • the user designates a configuration as primary or archive, but the configuration otherwise is the same and each configuration is represented by a unique Factory object
  • when initializing query service the v2 extension is supposed to call CreateSpanWriter on the primary factory, and also call CreateSpanWriter on the archive factory (calling the same API)
  • instead, v2 extension still falls back onto calling two different APIs due to invoking InitArchiveStorage

So v2 is serendipitously working right now, but not as expected. We can see it in how Memory storage is handled - when I configured all-in-one with primary and archive storage, originally it did not work for me because Memory storage factory never implemented the Archive sub-API, so I had to add it (and the side effect of it is that archive for memory cannot be turned off now). But that was exactly the opposite of what I wanted to happen - I wanted the config to be able to configure two storages and designate each as primary / archive, so that the v2 extension would only interact with the factories via CreateSpanWriter.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro Thanks so much for the helpful context. Do you have any thoughts on how we should proceed here? Do we want to make changes to the query extension to not use InitArchiveStorage?

@yurishkuro
Copy link
Member Author

Yes that would be ideal. It may need refactoring of query service, eg perhaps it should be taking a separate archive factory instead of using a different interface on the main factory.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro just looking for a bit of clarification

so today, we're doing the following:

	f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)
	if err != nil {
		return fmt.Errorf("cannot find archive storage factory: %w", err)
	}

	if !opts.InitArchiveStorage(f, s.telset.Logger) {
		s.telset.Logger.Info("Archive storage not initialized")
	}

Is the goal to be able to just do the following? Why do we need to introduce a new archive factory?

	f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)
	if err != nil {
		return fmt.Errorf("cannot find archive storage factory: %w", err)
	}

	spanReader, err := f.CreateSpanReader()
	if err != nil {
		return fmt.Errorf("cannot create span reader: %w", err)
	}
        opts.ArchiveSpanReader = f.spanReader

@yurishkuro
Copy link
Member Author

yes

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 19, 2024

@yurishkuro do we still need a separate interface to do the above? One other question I had was, will simply calling CreateSpanReader behave the same way as CreateArchiveSpanReader does? How would there be a differentiation between normal storage logic and archive storage logic?

@yurishkuro
Copy link
Member Author

That is a good question, but if you look at all implementations of GetArchiveSomething they are no different from regular method. I think the only difference is in ES implementation which needs the isArchive flag. We can expose that flag as part of the ES config - not ideal because the user has to remember to set it, otherwise they might get limited look back. But I think in ES the archive storage requires separate settings anyway, eg you may not want to rollover index every day.

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 20, 2024

@yurishkuro How would it work if we add the is_archive flag to the ES config? We're currently storing separate configs in the ES factory as follows so we have no way of knowing if CreateSpanSomething is being called for the regular config or the archive config.

	primaryConfig *config.Configuration
	archiveConfig *config.Configuration

@mahadzaryab1
Copy link
Collaborator

Do we want to refactor the ES factory to only only hold one config/client and then propagate the isArchive flag to it?

@yurishkuro
Copy link
Member Author

In v2 we have two independent factories. I'd say yes, we want to refactor the factories to represent just one kind of storage. It will require changes to query service. I think only Cassandra and ES factories are storing two different namespaces.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro Got it! I'll get started on that. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/elasticsearch
Projects
Status: GA blocking
Development

No branches or pull requests

2 participants