
Sampling chapter #102

Open · wants to merge 7 commits into master

Conversation

AlexandreBlake
Copy link
Contributor

@aspina7 I messed up the branch creation a bit. My bad, I don't use git collaboratively very often.
I will commit and push only to this branch from now on.

There are a few things I might need your feedback on:

  • I still have a lot to tweak here and there, and chunks to modify/add, but it should be enough to start getting feedback, so help yourself.

  • For now I generate data in my chunks to illustrate my points rather than load a pre-existing dataset. I find it more convenient, but it adds code that might not be the main interest of this chapter. I also noticed that in other chapters loading data seems to be the rule. Is that a big deal?

  • I assumed that we were focusing on surveys and put the sample size calculation for analytical studies to the side. Did I assume correctly?

@netlify
Copy link

netlify bot commented Feb 16, 2022

Deploy Preview for epirhandbook ready!

🔨 Latest commit: 2b00449
🔍 Latest deploy log: https://app.netlify.com/sites/epirhandbook/deploys/627c7fb4b967680008971e82
😎 Deploy Preview: https://deploy-preview-102--epirhandbook.netlify.app

@aspina7
Copy link
Contributor

aspina7 commented Feb 26, 2022

@AlexandreBlake - sorry for the delay, under the exam pile at the moment... thanks so much, this looks like a great start!
I have added some comments on the code below (linked to line numbers, as I can't seem to comment directly) - and replies below too.

In general, I think I would structure the chapter by having a general intro to the topic as you have, then for each of the methods show first a sample size calculation example and then a sampling example. Happy to jump on a call and discuss if needed! Thanks again, this is very awesome to see!

  • For now I generate data in my chunks to illustrate my points rather than load a pre-existing dataset. I find it more convenient, but it adds code that might not be the main interest of this chapter. I also noticed that in other chapters loading data seems to be the rule. Is that a big deal?

I think it is fine for this section, as they are only small datasets really - @nsbatra thoughts? (Neale, can you also review the chapter to make sure you think it fits the rest of the handbook style, please? Also in terms of length and amount of theory included? Record time spent under r4epis admin_mgt on Clockify.) My initial feeling is that there is a lot of theory explained, and probably long term we should shift this to the methods manual - but for now leave it here, as it is necessary to understand the page.

  • I assumed that we were focusing on surveys and put the sample size calculation for analytical studies to the side. Did I assume correctly?

Yep, I think surveys are the more complex case, so let's start with that - and then we can add the basic analytical studies after. Priority should be on sample size calculation and then link to sampling methods. @pbkeating - can you confirm and review please?

  • line 28: {metR} looks like an interesting package, but is it necessary for the page? Can't that be done with ggplot otherwise (what function are you using)? As an aside, can this package be used to do a density map with cases/km2 for the GIS page?
  • line 36: maybe a decision-tree-type graphic would be useful to help people decide what section to go to and what to consider?
  • line 50: missing the "m" in sampling
  • line 54: @nsbatra to confirm the appropriate way to link to the survey analysis page
  • line 55: I would simplify the wording to read: "involves randomly selecting a sub-population (number of individuals or sampling units) out of a total population (population of an area or a finite number of sampling units) with a probability $p=\frac{n}{N}$." (NB: don't need to have "it" in front of those)
  • line 56: I would simplify the wording to read: "involves selecting a sub-population (number of individuals or sampling units) using a sampling frame. Unlike with SRS, a constant sampling interval $k=\frac{N}{n}$ is used, with the $1^{st}$ sampling unit randomly chosen between 1 and $k$ and every following $k^{th}$ sampling unit selected (i.e. we select every $k^{th}$ individual). It is a reasonable approximation of SRS."
  • line 58: I would change the example to be e.g. vaccination coverage by district or by camp block or something
  • line 71: do we need example code here already? Maybe we should talk about the sample size calculations first and then go into sampling appropriate numbers from there.
  • line 99: I think keep the first SRS example very basic and don't even mention clustering or stratification - and then in later sections build in the complexity. E.g. just say that we select from a list (sampling frame) of all students attending a school.
  • line 101: I think we need to make it clear by adding a sentence here that, normally, sampling frames come from an external source - e.g. an Excel sheet with the name, gender and status of all the children in the school. But we will create a fake dataset to demonstrate below.
  • line 115: I would break this into two steps: first sample to get the numbers (saving as an object), then subset rows with that object.
  • line 138: before showing how variations can impact sample size, I think we need just a very basic example of how to calculate the number of children needed given an alpha of X and a precision of Y... Also the sample size calculation should be moved up to come before the actual sampling (otherwise they don't know how many they need to sample!) - the same applies for all sections: calculation first, then sample. (A rough sketch of these points is included right after this list.)
  • line 294: here too, break up the code into smaller bits per line, rather than having lots going on between parentheses - it makes it less scary for beginners.
  • line 314: consider taking some of the wording which Annick wrote for the r4epis website on sampling; it's less technical, but I think it addresses a lot of what basic users need. https://r4epis.netlify.app/surveys/
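
For the line 115 and line 138 points, a minimal base R sketch of what that could look like (hypothetical names: `school_list` stands in for the external sampling frame mentioned at line 101, and the alpha/precision values are placeholders, not the chapter's actual code):

```r
# Basic sample size for a proportion, given alpha and absolute precision d
p_expected <- 0.5                      # expected proportion (most conservative choice)
alpha      <- 0.05
d          <- 0.05                     # absolute precision
z          <- qnorm(1 - alpha / 2)
n_needed   <- ceiling(z^2 * p_expected * (1 - p_expected) / d^2)

# Simple random sampling in two steps: draw row numbers, then subset with them
set.seed(1)
selected_rows <- sample(nrow(school_list), size = n_needed)
srs_sample    <- school_list[selected_rows, ]

# Systematic sampling: interval k = N/n, random start between 1 and k
k          <- floor(nrow(school_list) / n_needed)
start      <- sample(k, size = 1)
sys_rows   <- seq(from = start, by = k, length.out = n_needed)
sys_sample <- school_list[sys_rows, ]
```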

@AlexandreBlake
Copy link
Contributor Author

AlexandreBlake commented Feb 27, 2022

@aspina7 No worries... the pile is stacking up high here too...
I will go over it again point by point later this coming week. Two things though:

line 28: {metR} looks like an interesting package, but is it necessary for the page? Can't that be done with ggplot otherwise (what function are you using)? As an aside, can this package be used to do a density map with cases/km2 for the GIS page?

I could probably do without it. But it does a bit of interpolation to keep the figure neat, and it minimizes the number of values I need to calculate to get something that smooth for the heatmap and the contour lines. It also probably keeps the chunk simpler than what would be needed with 100% ggplot2, I think. I assume I would need to play with geom_tile and geom_contour after bumping the resolution if I went full ggplot2.
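
For reference, a rough sketch of what that full-ggplot2 route might look like (assuming a hypothetical data frame `grid_df` with columns x, y and value computed on a fine grid):

```r
library(ggplot2)

# heatmap tiles plus contour lines, no interpolation: the grid has to be dense enough
ggplot(grid_df, aes(x = x, y = y)) +
  geom_tile(aes(fill = value)) +
  geom_contour(aes(z = value), colour = "white") +
  theme_minimal()
```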

Also, regarding the GIS sampling part: should it go here or in the GIS chapter? Because I assume that this chapter will come before the GIS one?

@aspina7
Copy link
Contributor

aspina7 commented Feb 28, 2022

Okie dokie, no worries - stick with {metR} then!
I think let's leave GIS sampling here for now, and if it makes more sense we can migrate it later! Thx!

@nsbatra
Copy link
Contributor

nsbatra commented Mar 2, 2022 via email

Adding some paragraphs
Reshuffling the order of some parts
Adding a tree diagram
Reformatting a bit to stick to the format of the other chapters
@AlexandreBlake
Copy link
Contributor Author

@aspina7


  • Done - I simplified the code a bit, adding steps here and there, and slightly reshuffled the order following your line-by-line suggestions above.
  • I still have to go through the GIS sampling part. It is on my to-do list for the next session.

@nsbatra

  • I modified it to stick to the current format. I will just save the generated data in a separate file to load; the same goes for the decision tree figure. I'm keeping that step for the end though: I do not use RStudio... I am not a fan (I know, I am weird), so until we reach the final step it is not super convenient to do here. But is it problematic to keep the little chunk creating data for villages to select clusters by PPS as is? The point of this chunk is to let readers generate the data several times with different values but always end up with an equal selection probability. (A rough sketch of that PPS step is below.)
  • I keep my references at the end for now. They do not quite fit with the Resources section I have seen in other chapters. Should I just keep them but reformat them differently?
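
For context, a rough sketch of what that systematic PPS selection could boil down to (`villages` is a hypothetical data frame with columns village and pop; the number of clusters is a placeholder):

```r
n_clusters <- 30

# cumulative population defines each village's share of the selection range
villages$cum_pop <- cumsum(villages$pop)

k     <- sum(villages$pop) / n_clusters          # sampling interval
start <- runif(1, min = 0, max = k)              # random start within the first interval
hits  <- start + (seq_len(n_clusters) - 1) * k   # systematic selection points

# each selection point falls in exactly one village's cumulative range;
# large villages can be hit more than once (standard PPS behaviour)
selected <- findInterval(hits, c(0, villages$cum_pop))
villages$village[selected]
```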

@aspina7
Copy link
Contributor

aspina7 commented Mar 7, 2022

ok perfect - poke me if/when you need me to take another look, thx!

@nsbatra
Copy link
Contributor

nsbatra commented Mar 9, 2022

@AlexandreBlake thanks for your flexibility in adapting the data format. For the specific chunk you mentioned, where the intention is for the user to re-generate the data, that sounds fine (and cool!).

I think putting references in the Resources section is fine; the format of the text under the title is flexible.

@AlexandreBlake
Copy link
Contributor Author

@aspina7 Sorry for the silence, a PhD milestone thing cannibalized my brain and my time. That is behind me now, so I can push hard on the chapter.

  • I added a chunk on GIS sampling and turned the mock data into a data.frame that we load. There is a fine line on the GIS stuff where I guess I should not get into details but should still make it functional. I refer the reader to the GIS chapter, but I know that some specific things, such as juggling between formats to load your sampled points onto GPS devices for example, are not covered there. Should I expand on that topic (although it feels a bit like a tangent), or are there going to be some additions to the GIS chapter?
  • I will start looking for/making a figure for the overview of the chapter and throw it in there soon.

@aspina7
Copy link
Contributor

aspina7 commented Apr 8, 2022

@AlexandreBlake - no worries at all, just crawling out from under a rock myself...
I think adding an export to GPX would be great (it doesn't need to get too technical).
This way people can import to OsmAnd (and we will have examples of that in the theory manual, not here) - @pbkeating put example code in his post.
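
As a sketch, the GPX export with {sf} could be as simple as this (`sample_points` is a hypothetical sf object of points in WGS 84; column names may need trimming to fit the GPX schema, hence the extensions option):

```r
library(sf)

st_write(
  sample_points,
  "sample_points.gpx",
  layer           = "waypoints",                  # GPX layer for point features
  driver          = "GPX",
  dataset_options = "GPX_USE_EXTENSIONS=YES"      # keep non-standard columns as extensions
)
```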

poke me when you want me to do a full review pls.

In the meantime, a few brief thoughts - it would be good to hear from @nsbatra and @pbkeating too:

  • do we need to also demonstrate sample size calculations with the packages mentioned, or others?
    • is typing the calculations out too scary for low-level users? And will those users be doing sample size calculations anyway?
  • for spatial sampling, should we stick to pulling buildings from OpenStreetMap, with {osmdata}?

@pbkeating
Copy link

pbkeating commented Apr 18, 2022 via email

@AlexandreBlake
Copy link
Contributor Author

@pbkeating Hi Pat,

The formulae are there for SRS using a categorical or continuous primary variable of interest, and for the design effect/multistage flavours as well. Which one do you have in mind specifically?
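
For reference, a quick sketch of those formulas with placeholder numbers (not the chapter's actual code):

```r
z <- qnorm(1 - 0.05 / 2)      # alpha = 0.05

# categorical primary variable: estimate a proportion p with absolute precision d
p <- 0.30
d <- 0.05
n_prop <- ceiling(z^2 * p * (1 - p) / d^2)

# continuous primary variable: estimate a mean, given standard deviation sigma and precision d
sigma  <- 12
d_mean <- 2
n_mean <- ceiling(z^2 * sigma^2 / d_mean^2)

# multistage / cluster flavour: inflate by a design effect (e.g. DEFF = 2)
deff      <- 2
n_cluster <- ceiling(n_prop * deff)
```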

@AlexandreBlake
Copy link
Contributor Author

AlexandreBlake commented Apr 18, 2022

@aspina7 @pbkeating

  • I dropped the part that sampled points using the boundaries of the urban clusters and instead turned it into sampling points from the building features of the OSM data for the area. One thing though: it means that we only show how to sample from the pre-existing points we pull from OSM. We do not show the strategy that requires the painful cleaning phase. I describe it and the associated limitations, but that's it. I can still add that chunk back from the previous commit if needed, though.
  • The export into KML and GPX has been added.
  • I will make a last round with fresh eyes in the coming days and ping you, @aspina7, when I am done so that you can go through it.

I would assume that creating a figure for the 1st graph illustrating the principle of sampling would be preferable to showing a pre-existing one (with the proper reference)?

Switching the spatial sampling to the case where we pull building gps positions from OSM
@pbkeating
Copy link

pbkeating commented Apr 18, 2022 via email

@aspina7
Copy link
Contributor

aspina7 commented Apr 18, 2022

I think we should keep them!

@pbkeating yep - agree, keep the formulas. But should we also show an example of how to get to the same numbers using package functions (if available)? This would be the same as in other pages where we have demonstrated how to do the same thing in base R and also in {dplyr}, for example.

We do not show the strategy that requires the painful cleaning phase. I describe it and the associated limitations, but that's it. I can still add that chunk back from the previous commit if needed, though.

@AlexandreBlake when you poke me to review, can you point me to where the difference is? Usually it's worth showing the painful method too... but I can just add it back in when reviewing if you link me to it.

I would assume that creating a figure for the 1st graph illustrating the principle of sampling would be preferable to showing a pre-existing one (with the proper reference)?

If it's easy to make a figure, go for it - if not, we have previously just made sure the authors of the figure are okay with us reproducing it here and then referenced accordingly. Both are fine... e.g. here I took an example from a textbook.

@pbkeating
Copy link

pbkeating commented Apr 18, 2022 via email

@AlexandreBlake
Copy link
Contributor Author

AlexandreBlake commented Apr 18, 2022

@aspina7

@AlexandreBlake when you poke me to review, can you point me to where the difference is? Usually it's worth showing the painful method too... but I can just add it back in when reviewing if you link me to it.

I would agree, but I doubt we can show the cleaning procedure in R if we want to keep it simple. That would imply only showing the strategy to draw points from a polygon and keeping the description of the pain associated with the cleaning, but without actually doing the cleaning. Would that still be fine?

Yes, if there are packages that can do the job, then it is good to have alternatives. Showing the formula allows people to get a fuller understanding of the process and takes away the black-box feeling of using, for example, the weight function in the sitrep templates.

@pbkeating I will have a quick look at those packages and add something if their use is not in the "headache realm".

@aspina7
Copy link
Contributor

aspina7 commented Apr 18, 2022

Would that still be fine?

Ah, gotcha, sorry - I have read through your explanation now. You are right, without having polygons for buildings, cleaning would be excruciating (otherwise using intersection and buffer would only be a few lines of code).
But we could demonstrate a very basic example with interactive packages?
E.g. draw a polygon, sample points from there, remove the points that are not over a house (according to the basemap tiles), and then sample more points (the n removed)? Using, for example, {mapedit} based on {leaflet} (a rough sketch follows below).
cc @AmyMikhail who will be using the code from this page to update EPIET case studies
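
A rough sketch of that workflow (not the chapter's final code; object names and the sample size are placeholders):

```r
library(sf)
library(mapedit)
library(mapview)

# 1. draw the sampling area interactively, then sample points inside it
area <- drawFeatures()
pts  <- st_sf(geometry = st_sample(area, size = 30))

# 2. draw polygons around the points that are not over a house, using the
#    sampled points as the background layer, then drop those points
bad  <- drawFeatures(mapview(pts)@map)
keep <- pts[lengths(st_intersects(pts, bad)) == 0, ]

# 3. top the sample back up with replacements for the discarded points
#    (repeat steps 2-3 until enough points fall on houses)
extra     <- st_sf(geometry = st_sample(area, size = nrow(pts) - nrow(keep)))
pts_final <- rbind(keep, extra)
```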

@AlexandreBlake
Copy link
Contributor Author

@aspina7 Will have a stab at it then. Will keep you posted.

Addition of chunks to draw and clean points in GIS sampling using mapedit
Cleaning bits of the text/code
@AlexandreBlake
Copy link
Contributor Author

@aspina7

  • Done with the interactive cleaning part and the GPX/KML export.
  • {mapedit} is pretty good at interacting with polygons or lines, but not with points (unless I missed it when I played with it and went through their pages). So the cleaning is a bit cumbersome, with people drawing polygons around the points to remove, but it illustrates the "cleaning" procedure.
  • I added sample size calculation using a package for the simple cases (SRS with a continuous or categorical primary variable of interest).
  • I added a simple home-made schematic at the beginning of the page.
  • I would need you to have an overall look; I struggle to spot where I left chunks that became a bit redundant now.

@aspina7
Copy link
Contributor

aspina7 commented Jun 3, 2022

Sorry @AlexandreBlake - super delayed in getting back to you! Have been under the books again. The chapter looks really good and I'm using bits of it to update an EPIET case study for next week. Will hopefully be able to do a full review soon... sorry again! And thanks so much for all the work!

@AlexandreBlake
Copy link
Contributor Author

@aspina7 No worries! Busy time here too. I will be hard to reach next week, but we can chat/exchange about it whenever after that.

@AlexandreBlake
Copy link
Contributor Author

@aspina7 No pressure, but is there anything you think needs more polish?

@aspina7
Copy link
Contributor

aspina7 commented Jul 1, 2022

Sorry @AlexandreBlake - offline in Sicily at the moment. But I will do a review and push directly after the summer.

@aspina7
Copy link
Contributor

aspina7 commented Feb 12, 2023

@AlexandreBlake I am so, so sorry! Life took a sharp left turn and I have been under a deep rock. Reviewing this now and will make edits directly and hopefully get the book to knit; I will push directly to this branch once I finish reviewing.

@nsbatra can we get Alex an invoice for this please, and @AlexandreBlake can you send us your bank details so we can get you paid ASAP.

really sorry again!

Note to myself: I am at "performing SRS".

@AlexandreBlake
Copy link
Contributor Author

@aspina7 No worries, life is stochastic on my side as well, and it looks like you have been busy. I can help with some tweaks on the chapter if needed, but I am on the final stretch to defend (a long stretch of a couple of months), so I cannot promise super regular help.

@aspina7
Copy link
Contributor

aspina7 commented Feb 14, 2023

It's all good - I should be able to pull it together - good luck with the final PhD push!

Note to self: mention {osmextract}.

@aspina7
Copy link
Contributor

aspina7 commented Apr 11, 2023

Balls - {maptools} is being retired, will need to shift to {mapedit} or {terra}.
I am about three quarters of the way through reviewing... hopefully done before summer!

@AmyMikhail
Copy link

Hi @AlexandreBlake and all,

Not sure if the above discussion regarding whether to include examples of cleaning the data was resolved or not... but in case it's useful, I just thought to link to a recent discussion initiated by @pbkeating about adding pulling OSM building data to the EPIET case study on spatial sampling.

I proposed some code there which very simply uses st_intersection() to remove points that fall outside a polygon, and I was wondering if the same principle could not be used for the building outlines that you can see on OpenStreetMap, since the data you get from that includes the building polygons?

For removing points that are in locations incorrectly identified as a building, or which turn out not to be a building with the function of interest (i.e. residential) in real life, I think this would require a more interactive approach - for instance doing a validation survey with survey software that collects GPS coordinates. That then just leaves the trickier question of how to convert the 'not a building' coordinates into a polygon that could be used for excluding them with st_intersection() again. Epicentre used an algorithm to create polygons from coordinates based on their proximity to each other - but I don't know if this was using existing packages/functions or something they developed from scratch. I'm curious to know if you already had a way to do that, @AlexandreBlake?
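
For illustration, that first pattern in miniature (`sampled_pts` and `area_poly` are placeholder sf objects):

```r
library(sf)

# keep only the sampled points that fall inside the polygon
pts_inside <- st_intersection(sampled_pts, area_poly)

# same idea, but without adding the polygon's attribute columns to the points
pts_inside2 <- st_filter(sampled_pts, area_poly)
```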

@AmyMikhail
Copy link

Also one thing I just noticed, @AlexandreBlake:

The osm_points for buildings are actually the points that make up the polygon representing the shape of the building on OpenStreetMap - I didn't check, but I assume it is one set of coordinates for each corner.

For the edit to the EPIET case study that I mentioned above, I extracted osm_polygons instead and used st_centroid() to get a single pair of coordinates for each building. In the Kario camp example that is used in the case study, this seems to match up to the roofs well (although, needless to say, sampling from those won't make any adjustments for roof size).
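
A rough sketch of that approach, assuming {osmdata} and a placeholder bounding box (not the case study's actual code):

```r
library(osmdata)
library(sf)
library(dplyr)

# pull building footprints for the area of interest
buildings <- opq(bbox = c(30.00, -3.50, 30.10, -3.40)) %>%   # xmin, ymin, xmax, ymax (placeholder)
  add_osm_feature(key = "building") %>%
  osmdata_sf()

# one point per building: centroid of each footprint polygon
building_pts <- st_centroid(buildings$osm_polygons)

# simple random sample of buildings
sampled_buildings <- slice_sample(building_pts, n = 30)
```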

@AlexandreBlake
Copy link
Contributor Author

Hi @AmyMikhail,
Thanks for sharing this. The chapter includes a basic interactive example to give a feel for how cumbersome it is. The Epicentre algorithm is a complex variation (now included in a fancy Shiny app that pulls plenty of available data/tiles) on the chunk of code shared here.

What we found at the time (definitely not the only option though) was that it was pretty fast to just "clean as you draw points", by visualizing the points with a given buffer on recent imagery as you draw. The cleaning was then a simple click: either keep, so the point is saved and you keep drawing until you reach your desired sample size, or drop, and you redraw the point. By just deciding with y/n or Enter/Space on the keyboard, the cleaning/drawing ends up being pretty fast even for sample sizes of several hundred points. It was still a bit tedious, but way faster than the iterative process of draw points/check/clean and repeat (assuming that you would also draw some reserve points as a buffer for the unavoidable points falling on buildings that do not qualify for your survey). The current version used in Epicentre might rely on a slightly different set of packages than the chunk I shared with Patrick, but last time I chatted with Serge it boiled down to the same functionalities overall in the Geosampler (the name of their Shiny app).

There used to be two main "schools of thought" in Epicentre a couple of years ago: use a very large polygon to draw points but do extensive cleaning of the points, vs spend a lot of time excluding as much "empty space" as possible from the polygon (assuming you have reasonably recent imagery to do it, and still keeping some margin) but save a lot of time cleaning points. Then a third option showed up when there was no time constraint: paying people to put a point on every damn roof and using that to draw points directly. I am not sure of the current state of affairs.

The validation surveys you mention have been used in large surveys in Nigeria, with field logistics so extensive that checking a given area a couple of days before data collection was possible. But it meant redrawing a lot of points frequently, and it was tedious.

The "interactive cleaning" as you draw points used to be the "best" option we used (again, it might be different now). That was partly because we could not confidently rely on the building tag in OSM data in most of the settings we did surveys in (too remote, or with large changes over a short period that were not captured in OSM).

@aspina7
Copy link
Contributor

aspina7 commented Nov 14, 2023

Ach... lost all my changes because I did not push. Will restart... sorry!
