Sampling chapter #102
@AlexandreBlake - sorry for the delay, under the exam pile atm... thanks so much, this looks like a great start! In general, I think I would structure the chapter with a general intro to the topic, as you have, and then for each of the methods show first a sample size calculation example and then a sampling example. Happy to jump on a call and discuss if needed! Thanks again, this is very awesome to see!
I think it is fine for this section as only small datasets really - @nsbatra thoughts? (neale can you also review chapter to make sure you think it fits to rest of handbook style please? Also in terms of length and amount of theory included? Record time spent on r4epis admin_mgt on clockify). My initial feeling is that there is a lot of theory explained, and probably long term we should shift this to the methods manual - but for now leave here as is necessary to understand the page.
Yeap, I think surveys are the more complex so let's start with that - and then we can add the basic analytical studies after. Priority should be on sample size calculation, and then link to sampling methods. @pbkeating - can you confirm and review please?
|
@aspina7 No worries...pile stacking up high here too...
I could probably do without it. But it makes a bit of interpolation to keep the figure neat and minimize the number of values I need to calculate to have something that smooth for the heatmap and the contour lines. It also probably keeps the chunk simpler than what would be needed with 100% ggplot2 I think. I would assume I would need to play with geom_tile and contour_lines after bumping the resolution if I went full ggplot2. Also regarding the GIS sampling part. Should it go here or in the GIS part? Because I assume that this chapter will be before the GIS one ? |
Okie dokie no worries stick with {metR} then! |
Hi Alexandre,
Sorry for the delay - I just looked through the chapter - great start! This
will be super useful for the readers.
Just a few stylistic points. I can swoop in at the end and make finishing
touches.
- I would kindly request that we transition it to the style of the other
chapters where the data is an external file that is imported and cleaned at
the top of the chapter.
- Standard headings are: Preparation (with sub-headings Load packages
and Import data). You can see on most any other chapter how importing is
done with rio and here packages. We read in packages with pacman::p_load()
(there is a standard paragraph at the top of most chapters that describes
this, that you can copy).
- After those two headings it is up to you, with "Resources" at the very
end.
- Please remember to put package names in bold, functions referenced in
the text are written with parentheses() at the end, and any in-text R code
in code-text.
- You can use DT to display the data in an interactive table
(scrollable, etc.). See example code for this throughout the Handbook.
Happy to clarify anything. Again, thank you very much for writing this!
Neale
On Mon, Feb 28, 2022 at 1:40 AM Alex Spina wrote:
Okie dokie no worries stick with {metR} then!
I think let's leave GIS sampling here for now and if it makes more sense
we can migrate later! Thx!
|
Adding some paragraphs
Reshuffling the order of some parts
Adding a tree diagram
Reformatting a bit to stick to the format of the other chapters
|
ok perfect - poke me if/when you need me to take another look, thx! |
@AlexandreBlake thanks for your flexibility adapting the data format. For the specific chunk you mentioned, with the intention of the user re-generating the data, that sounds fine (and cool!). I think references in the resources section are fine; the format of the text under the title is flexible. |
Adding the chunk on GIS sampling.
@aspina7 Sorry for the silence, a PhD milestone thing cannibalized my brain and my time. This is behind me now, so I can push hard on the chapter.
|
@AlexandreBlake - no worries at all, just crawling out from under a rock myself... poke me when you want me to do a full review pls. In the meantime a few brief thoughts - would be good to hear from @nsbatra and @pbkeating too:
|
Hi Alex,
I think it's probably useful for users to be able to type out the calculations for sample sizes. Do you think lower level users would just use ENA or openepi and so wouldn't need to use R, is that the idea?
I think pulling data from openstreetmap is the way we should go.
P
From: Alex Spina
Sent: Friday, April 8, 2022 12:11 PM
Subject: Re: [appliedepi/epiRhandbook_eng] Sampling chapter (PR #102)
@AlexandreBlake - no worries at all, just crawling out from under a rock myself...
I think adding an export to gpx would be great (doesnt need to get too technical)
This way ppl can import to osmand (and we will have examples of that in the theory manual, not here) - @pbkeating put example code in his post (#83 (comment)).
poke me when you want me to do a full review pls.
In the meantime a few brief thoughts - would be good to hear from @nsbatra and @pbkeating too:
* do we need to also demonstrate sample size calculations with the packages mentioned, or others?
* are typing calculations out too scary for low level users? And will those users be doing sample size calculations anyway?
* for spatial sampling should we stick to pulling buildings from openstreetmaps, with osmdata?
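As a rough illustration of the "export to gpx" idea mentioned above, here is a minimal base R sketch that writes sampled points as GPX 1.1 waypoints. The function name `write_gpx()`, the coordinates, and the point names are all hypothetical; this is not the code from the linked post, and in practice a GIS package would likely handle the export. It assumes the names contain no characters that need XML escaping.

```r
# Minimal GPX 1.1 waypoint writer in base R (illustrative sketch only).
# write_gpx() is a hypothetical helper, not a function from any package.
write_gpx <- function(lat, lon, name, path) {
  stopifnot(length(lat) == length(lon), length(lat) == length(name))
  # One <wpt> element per sampled point; names are assumed to be XML-safe
  wpt <- sprintf('  <wpt lat="%.6f" lon="%.6f"><name>%s</name></wpt>',
                 lat, lon, name)
  gpx <- c(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gpx version="1.1" creator="sampling-sketch" xmlns="http://www.topografix.com/GPX/1/1">',
    wpt,
    '</gpx>'
  )
  writeLines(gpx, path)
  invisible(path)
}

# Made-up coordinates standing in for sampled household locations
sampled <- data.frame(
  id  = c("HH001", "HH002", "HH003"),
  lat = c(9.0820, 9.0835, 9.0811),
  lon = c(8.6753, 8.6760, 8.6741)
)

gpx_file <- tempfile(fileext = ".gpx")
write_gpx(sampled$lat, sampled$lon, sampled$id, gpx_file)
```

The resulting file holds one waypoint per sampled point, which is the kind of file a navigation app can import.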
|
@pbkeating Hi Pat, The formulae are there for SRS using categorical/continuous variable for the primary variable of interest, and the design effects/multistage flavors as well. Which one do you have in mind specifically? |
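For readers following the thread, a typed-out version of the calculation being discussed might look like the sketch below: sample size for a proportion under simple random sampling, with an optional finite population correction and design effect. This uses the standard textbook formula rather than the chapter's actual chunk; the function name `sample_size_prop()`, its argument names, and the example values are assumptions made for illustration.

```r
# Sample size to estimate a proportion (illustrative sketch, base R only).
# p    : expected proportion
# d    : desired absolute precision (half-width of the confidence interval)
# deff : design effect (1 = simple random sampling)
# N    : population size (Inf = no finite population correction)
sample_size_prop <- function(p, d, conf_level = 0.95, deff = 1, N = Inf) {
  z  <- qnorm(1 - (1 - conf_level) / 2)   # two-sided normal quantile
  n0 <- z^2 * p * (1 - p) / d^2           # SRS, infinite population
  if (is.finite(N)) {
    n0 <- n0 / (1 + (n0 - 1) / N)         # finite population correction
  }
  ceiling(n0 * deff)                      # inflate by the design effect, round up
}

n_srs     <- sample_size_prop(p = 0.5, d = 0.05)             # simple random sample
n_cluster <- sample_size_prop(p = 0.5, d = 0.05, deff = 2)   # e.g. a cluster design
n_finite  <- sample_size_prop(p = 0.5, d = 0.05, N = 10000)  # small population
```

Typing the formula out this way keeps each step visible (quantile, base sample size, correction, design effect), which is the "no black box" argument made in this discussion.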
I would assume that creating a figure for the 1st graph illustrating the principle of sampling would be preferable to showing a pre-existing one (with the proper reference)? |
Switching the spatial sampling to the case where we pull building gps positions from OSM
Hi Alex,
Apologies, I hadn’t checked the code out and thought the question was whether or not to include formulae.
I checked the code out now and I think it looks good. I think we will have a diverse audience for this page and many will want to fully understand what is going on in the background, so the formulae are helpful!
I think we should keep them!
P
|
@pbkeating yeap - agree keep formulas. but should we also show example of how to get to the same numbers using package functions (if available)? This would be the same as in other pages where we have demonstrated how to do the same thing in base and also {dplyr} for example.
@AlexandreBlake when you poke me to review, can you point me to where the difference is? Usually worth showing the painful method too... but I can just add it back in when reviewing if you link me to it.
If easy to make a figure go for it - if not, we just need to make sure the authors of the figure are okay with us reproducing it here and then reference it accordingly. Both fine... e.g. here I took an example from a textbook. |
Yes, if there are packages that can do the job, then good to have alternatives. Showing the formula allows people to get a fuller understanding of the process and takes away the black box feeling of using, for example, the weight function in the sitrep templates
From: Alex Spina
Sent: Monday, April 18, 2022 5:02 PM
Subject: Re: [appliedepi/epiRhandbook_eng] Sampling chapter (PR #102)
I think we should keep them!
@pbkeating yeap - agree keep formulas. but should we also show example of how to get to the same numbers using package functions (if available)? This would be the same as in other pages where we have demonstrated how to do the same thing in base and also {dplyr} for example.
We do not show the strategy that requires the painful cleaning phase. I describe it and the associated limitations but that's it. I can still add that chunk back from the previous commit if needed though.
@AlexandreBlake when you poke me to review, can you point me to where the difference is? Ususally worth showing the painful method too .... but I can just add it back in when reviewing if you link me to it.
I would assume that creating a figure for the 1st graph illustrating the principle of sampling would be preferable to showing a pre-existing one (with the proper reference)?
If easy to make a figure go for it - if not we have just made sure the authors of the figure are okay with us reproducing it here and then referencing accordingly. Both fine... e.g. here (https://epirhandbook.com/en/time-series-and-outbreak-detection.html#prediction-validation) I took an example from a textbook.
|
I would agree, but I doubt we can show the cleaning procedure in R if we want to keep it simple. That would imply only showing the strategy to draw points from a polygon, and keeping the description of the pain associated with the cleaning but without actually doing the cleaning. Would that still be fine?
@pbkeating I will have a quick look at those packages and add something if their use is not the "headache realm". |
ah gotcha sorry - read through your explanation now. You are right, without having polygons for buildings, cleaning would be excruciating (otherwise using intersection and buffer would only be a few lines of code). |
@aspina7 Will have a stab at it then. Will keep you posted. |
Addition of chunks to draw and clean points in GIS sampling using mapedit
Cleaning bits of the text/code
|
sorry @AlexandreBlake - super delayed in getting back to you! have been under the books again, chapter looks really good and im using bits of it to update an EPIET case study for next week. Will hopefully be able to do a full review soon... sorry again! and thanks so much for all the work! |
@aspina7 No worries! Busy time here too. I will be hard to reach next week, but we can chat/exchange about it whenever after. |
@aspina7 No pressure, but anything you think needs more polish? |
Sorry @AlexandreBlake - offline in Sicily at the moment. But will do a review and push directly after summer. |
@AlexandreBlake am so so sorry! Life took a sharp left-turn and been under a deep rock. Reviewing this now and will make edits directly and hopefully get the book to knit, will push directly to this branch once finish reviewing. @nsbatra can we get alex an invoice for this please and @AlexandreBlake can you send us your bank details so we can get you paid asap. really sorry again! note to myself: i am at performing srs |
@aspina7 No worries, life is stochastic on my side as well, and it looks like you have been busy. I can help with some tweaks on the chapter if needed. But I am on the finishing line to defend (a long line of a couple of months), so I cannot promise super regular help. |
is all good - should be able to pull together - good luck with the final phd push! note to self: mention {osmextract} |
balls - maptools is being retired, will need to shift to mapedit or terra. |
Hi @AlexandreBlake and all, Not sure if the above discussion regarding whether to include examples of cleaning the data was resolved or not... but in case useful just thought to link to a recent discussion initiated by @pbkeating to add pulling OSM building data to the EPIET case study on spatial sampling. I proposed some code there which very simply uses For removing points that are in locations incorrectly identified as a building, or which turn out not to be a building for the function of interest (i.e. residential) in real life, I think this would require a more interactive approach - for instance doing a validation survey with survey software that collects GPS coordinates, then just leaves the more tricky question of how to convert the 'not a building' coordinates to a polygon that could be used for excluding them with |
Also one thing I just noticed @AlexandreBlake :
For the edit to the EPIET case study that I mentioned above, I extracted |
Hi @AmyMikhail What we found at the time (definitely not the only option though) was that it was pretty fast to just "clean as you draw points": you visualize each point, with a given buffer, on recent imagery as you draw it. The cleaning is then a single decision per point: keep, and the point is saved and you keep drawing until you reach your desired sample size; or drop, and you redraw the point. By just deciding with y/n or Enter/Space on the keyboard, the cleaning/drawing ends up being pretty fast even for sample sizes of several hundred points. It was still a bit tedious, but it was way faster than the iterative process of draw points/check/clean and repeat (assuming that you would also draw some reserve points as a buffer for the unavoidable points falling on buildings that do not qualify for your survey). The current version used in Epicentre might rely on a slightly different set of packages than the chunk I shared with Patrick, but last time I chatted with Serge it boiled down to the same functionalities overall in the Geosampler (the name of their shiny app).
There used to be two main "schools of thought" in Epicentre a couple of years ago: using a very large polygon to draw points and then cleaning the points extensively, versus spending a lot of time excluding as much "empty space" as possible from the polygon (assuming you have reasonably recent imagery to do it, and still keeping some margin) and saving a lot of time on cleaning points. Then a third option showed up when there was no time constraint: paying people to put a point on every damn roof and using that to draw points directly. I am not sure of the current state of affairs.
The validation surveys you mention have been used in large surveys in Nigeria, with field logistics so wild that checking a couple of days before data collection in a given area was possible. But it meant redrawing a lot of points frequently and it was tedious.
The "interactive cleaning" as you draw points used to be the "best" option we used (again it might be different now). It was partially because we could not confidently rely on the building tag in OSM data in most of the settings we did surveys in (too remote or with large changes in a short period of time not captured in OSM). |
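The "clean as you draw" loop described above can be sketched in base R as follows. This is only an illustration of the logic (draw a point in the polygon, decide keep/drop, repeat until the sample size is reached), not the Geosampler implementation or the chunk shared with Patrick. The function names, the planar toy polygon, and the automatic `keep_point()` rule standing in for the human keep/drop decision are all assumptions.

```r
# Ray-casting test: is the point (x, y) inside the polygon with vertices (px, py)?
point_in_polygon <- function(x, y, px, py) {
  inside <- FALSE
  j <- length(px)
  for (i in seq_along(px)) {
    if ((py[i] > y) != (py[j] > y)) {
      # x coordinate where the edge (i, j) crosses the horizontal line through y
      x_cross <- px[i] + (y - py[i]) * (px[j] - px[i]) / (py[j] - py[i])
      if (x < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Draw points uniformly in the polygon, keeping only those accepted by keep_point(),
# until n_target points are kept (or max_draws attempts have been made).
draw_and_clean <- function(px, py, n_target, keep_point, max_draws = 1e5) {
  kept <- matrix(NA_real_, nrow = n_target, ncol = 2,
                 dimnames = list(NULL, c("x", "y")))
  n_kept <- 0
  draws  <- 0
  while (n_kept < n_target && draws < max_draws) {
    draws <- draws + 1
    x <- runif(1, min(px), max(px))   # candidate point in the bounding box
    y <- runif(1, min(py), max(py))
    if (!point_in_polygon(x, y, px, py)) next   # outside the survey area: redraw
    if (keep_point(x, y)) {                     # "keep": save the point
      n_kept <- n_kept + 1
      kept[n_kept, ] <- c(x, y)
    }                                           # "drop": simply draw again
  }
  kept[seq_len(n_kept), , drop = FALSE]
}

set.seed(1)
px <- c(0, 4, 4, 2, 0)   # toy survey area in planar coordinates
py <- c(0, 0, 3, 5, 3)
# Stand-in for the reviewer's decision: drop points in a strip with no housing
keep_point <- function(x, y) !(x > 1 && x < 2)
pts <- draw_and_clean(px, py, n_target = 50, keep_point = keep_point)
```

In an interactive session, `keep_point()` could instead show the point on imagery and ask for a y/n answer, for example with `readline()`, which is closer to the keyboard-driven workflow described above.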
ach.... lost all my changes because did not push. Will restart... sorry! |
@aspina7 I messed up the creation of the branch a bit. My bad, I do not use git collaboratively that much.
I will commit and push only to it starting now.
There are a few things I might need your feedback on:
I still have a lot to tweak here and there, and chunks to modify/add. But it should be enough to start getting feedback, so help yourself.
For now I generate data in my chunks to illustrate my points rather than load a pre-existing data set. I find it more convenient but it adds code that might not be the main interest of this chapter. I also noticed that in other chapters loading data seems to be the rule. No big deal?
I assumed that we were focusing on surveys and put the sample size calculation for analytical studies on the side. Did I assume right?