
feat: aggregate metrics #41

Merged 1 commit into dev from petko/aggregate-metrics on Mar 29, 2024
Conversation

Contributor

@pete-eiger pete-eiger commented Mar 27, 2024

  • add new table to db (aggregates)
  • add DB resolvers for adding and fetching aggregates
  • add tick in tokio::select to insert aggregate data once a day
  • add GraphQL resolver to allow for querying aggregates up to a given number of days
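The once-a-day insert described above boils down to an elapsed-time check. A minimal standard-library sketch (the actual code drives this from a tick arm inside `tokio::select!`; `DAY` here is the assumed tick period, and `day_elapsed` is a hypothetical helper, not code from this PR):

```rust
use std::time::{Duration, Instant};

const DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// True once a full day has passed since the last aggregate insert.
fn day_elapsed(last_insert: Instant, now: Instant) -> bool {
    now.duration_since(last_insert) >= DAY
}

fn main() {
    let last_insert = Instant::now();
    // Right after an insert, the next one is not yet due.
    assert!(!day_elapsed(last_insert, last_insert));
    // A full day later, it is.
    assert!(day_elapsed(last_insert, last_insert + DAY));
    println!("daily tick check ok");
}
```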

Open Questions
Should we delete the aggregates after some retention period, say 30 days or 1 year? Open to suggestions ¯\_(ツ)_/¯

@pete-eiger pete-eiger requested a review from hopeyen March 27, 2024 14:42
Contributor

@hopeyen hopeyen left a comment


I think aggregates should be cleaned up after a set time period. Since the struct seems small, it might be okay to keep something like 90 or 180 days; maybe refer to Graphseer to see the timeframe they usually use.
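The 90-day retention suggested here could run in the same daily tick by computing a cutoff timestamp. A sketch, assuming the `timestamp` column stores Unix seconds; the `DELETE` statement in the comment is a hypothetical shape, not code from this PR:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Suggested retention window from the review: 90 days, in seconds.
const RETENTION_SECS: i64 = 90 * 24 * 60 * 60;

/// Rows with timestamps strictly below this cutoff would be deleted,
/// e.g. with `DELETE FROM aggregates WHERE timestamp < $1`.
fn retention_cutoff(now_secs: i64) -> i64 {
    now_secs - RETENTION_SECS
}

fn main() {
    let now_secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_secs() as i64;
    println!("delete aggregates older than {}", retention_cutoff(now_secs));
}
```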

CREATE TABLE IF NOT EXISTS aggregates
(
    timestamp BIGINT PRIMARY KEY,
    data JSONB NOT NULL
);
Contributor


Hmm, the reason the message data gets stored in JSONB is that we don't necessarily know the message fields in advance, so a great deal of flexibility is required.

For aggregates, we are clear that the type is something we define within Listener Radio and is not dynamic. In general, separate columns are faster to query and search, and easier to handle as structured data.

Would you explain your reasoning behind using JSONB instead of something more specific?
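The trade-off being discussed, sketched as plain structs. The typed field names match the columns the final query in this PR selects; the JSONB variant mirrors the shape used for raw messages:

```rust
/// JSONB-style row: one opaque blob, flexible but hard to query per field.
#[allow(dead_code)]
struct JsonbAggregate {
    timestamp: i64,
    data: String, // serialized JSON with fields unknown to the schema
}

/// Typed row: each field is its own column, directly queryable and indexable.
#[allow(dead_code)]
struct TypedAggregate {
    timestamp: i64,
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

fn main() {
    let row = TypedAggregate {
        timestamp: 1_711_584_000,
        graph_account: "0xabc".to_string(),
        message_count: 42,
        subgraphs_count: 3,
    };
    // With typed columns the field is available without parsing JSON.
    assert_eq!(row.message_count, 42);
    println!("typed aggregate ok");
}
```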

Contributor Author


It was faster in terms of development time, but I'll switch to specific types.

@@ -35,6 +40,20 @@ impl RadioContext {
}
}

#[derive(Serialize, SimpleObject)]
Contributor


I'm also interested in additional summaries like total topics covered and average message count (or some of the other things mentioned in the message propagation issues), but yeah, some are not so easy to get from the existing code.

It looks like the two HashMaps are indexed by indexer address, so having a Vec for active_indexers doesn't seem so useful when callers can grab the keys from one of the maps.

I also think it might be easier to simply return the Vec of IndexerStats (same thing as AggregatedIndexerStats?) so that users can do manipulation independently, with greater flexibility.
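The point about `active_indexers`, illustrated with a hypothetical map keyed by indexer address (the value type here is just a placeholder for the real stats struct):

```rust
use std::collections::HashMap;

/// The active-indexer list falls out of the map's keys, so storing a
/// separate Vec alongside the map would be redundant.
fn active_indexers(per_indexer_stats: &HashMap<String, u64>) -> Vec<String> {
    per_indexer_stats.keys().cloned().collect()
}

fn main() {
    let mut stats = HashMap::new();
    stats.insert("0xindexer1".to_string(), 10_u64);
    stats.insert("0xindexer2".to_string(), 25_u64);

    let indexers = active_indexers(&stats);
    assert_eq!(indexers.len(), 2);
    assert!(indexers.contains(&"0xindexer1".to_string()));
    println!("{} active indexers", indexers.len());
}
```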

Contributor Author

@pete-eiger pete-eiger Mar 27, 2024


> total topics covered

You mean all distinct IPFS hashes that Listener Radio has received messages for?

> average message count

I don't quite get this.

Contributor


> total topics covered
> You mean all distinct IPFS hashes that Listener Radio has received messages for?

Yep.

> average message count
> I don't quite get this.

You have the total_message_count for each indexer already; I'm interested in the average of that across all indexers. But, as I alluded to in the previous comment, it is something the client can do once they receive the Vec of IndexerStats.
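The client-side average discussed here, as a sketch; the `IndexerStats` stand-in carries only the one field needed for the example:

```rust
/// Minimal stand-in for the per-indexer stats returned by the API.
struct IndexerStats {
    message_count: u64,
}

/// Client-side average of total_message_count across all indexers.
fn average_message_count(stats: &[IndexerStats]) -> f64 {
    if stats.is_empty() {
        return 0.0;
    }
    let total: u64 = stats.iter().map(|s| s.message_count).sum();
    total as f64 / stats.len() as f64
}

fn main() {
    let stats = vec![
        IndexerStats { message_count: 10 },
        IndexerStats { message_count: 30 },
    ];
    assert_eq!(average_message_count(&stats), 20.0);
    println!("average = {}", average_message_count(&stats));
}
```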

Contributor Author


I agree, let's leave it to the client.

Contributor Author

@pete-eiger pete-eiger Mar 28, 2024


I will add one for distinct IPFS hashes though.

Contributor


Cool, thanks.
What do you think about returning Vec<IndexerStats> directly as part of Summary instead of having to calculate total_message_count and average_subgraph_account? I think that allows more flexibility on the client side, like calculating averages; but I can see an argument for providing that functionality to the client out of the box... a bit of a tricky balance.

@pete-eiger pete-eiger marked this pull request as draft March 28, 2024 13:10
@pete-eiger pete-eiger marked this pull request as ready for review March 28, 2024 14:55
@pete-eiger pete-eiger requested a review from hopeyen March 28, 2024 14:55
Contributor

@hopeyen hopeyen left a comment


Before closing this PR, would you add an issue for periodically cleaning up aggregates, perhaps in the same daily tick? Also, I think it would be cool to have a metric/field storing the network latency (message received time minus message timestamp); perhaps that's a future feature.
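The suggested latency metric is just the difference of two timestamps. A sketch, assuming both are Unix seconds; note the value can be negative under clock skew between sender and receiver:

```rust
/// Network latency: message received time minus the timestamp the
/// message itself carries. Can be negative under clock skew.
fn network_latency_secs(received_at: i64, message_timestamp: i64) -> i64 {
    received_at - message_timestamp
}

fn main() {
    let latency = network_latency_secs(1_711_584_010, 1_711_584_000);
    assert_eq!(latency, 10);
    println!("latency = {}s", latency);
}
```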

) -> Result<Vec<IndexerStats>, anyhow::Error> {
    let aggregates = sqlx::query_as!(
        Aggregate,
        "SELECT timestamp, graph_account, message_count, subgraphs_count FROM indexer_aggregates WHERE timestamp > $1",
Contributor


Is it possible to select only graph_account, message_count, and subgraphs_count and query_as IndexerStats, while still using the timestamp for filtering? Asking to see if this would let you avoid the conversion from Aggregate to IndexerStats later.
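If selecting straight into `IndexerStats` doesn't work out with `query_as!`, the conversion can at least be made explicit and cheap with a `From` impl. A sketch, assuming the struct fields mirror the queried columns:

```rust
/// Full row, matching the columns selected in the query above.
#[allow(dead_code)]
struct Aggregate {
    timestamp: i64,
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

/// Client-facing subset without the timestamp used only for filtering.
struct IndexerStats {
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

impl From<Aggregate> for IndexerStats {
    fn from(a: Aggregate) -> Self {
        IndexerStats {
            graph_account: a.graph_account,
            message_count: a.message_count,
            subgraphs_count: a.subgraphs_count,
        }
    }
}

fn main() {
    let row = Aggregate {
        timestamp: 1_711_584_000,
        graph_account: "0xabc".to_string(),
        message_count: 5,
        subgraphs_count: 2,
    };
    let stats: IndexerStats = row.into();
    assert_eq!(stats.message_count, 5);
    println!("converted {}", stats.graph_account);
}
```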


@pete-eiger
Contributor Author

@hopeyen

> What do you think about returning Vec<IndexerStats> directly as part of Summary instead of having to calculate total_message_count and average_subgraph_account? I think that allows more flexibility on the client side like calculating averages; but I can see an argument of providing that functionality to the client out-of-the-box... a bit of a tricky balance

I think it's better to leave it like this for now, so that it's already usable straight from the GraphQL endpoint playground, for instance for @PilarRod to get the aggregates when needed.

@pete-eiger pete-eiger requested a review from hopeyen March 29, 2024 14:32
Contributor

@hopeyen hopeyen left a comment


:shipit:

@pete-eiger pete-eiger merged commit 20fdd82 into dev Mar 29, 2024
4 checks passed
@pete-eiger pete-eiger deleted the petko/aggregate-metrics branch March 29, 2024 15:54