Scraping Examples #19

Open
jonthegeek opened this issue Aug 12, 2023 · 1 comment
Comments

@jonthegeek
Owner

Add an appendix (or maybe multiple appendices) with examples for the scraping chapter. That way you control the content and format, mostly at least. Alternatively, consider putting the examples outside the book as raw HTML. In this repo or elsewhere? Deployed on GitHub Pages? Can the Quarto site host static, unrelated HTML?

@jonthegeek jonthegeek added this to the How can I scrape web pages? {rvest} milestone Aug 12, 2023
@jonthegeek
Owner Author

  1. A static table that's the only thing on the page. Content = ? Starting to work in some API terminology would be sneaky. Maybe the main sections of an OpenAPI 3.1 doc?
  2. Multiple tables and/or weird formatting that makes html_table() a little painful. SelectorGadget needs to be easier than html_table() here, or at least clearer. A page with HTTP request methods in one table and HTTP status codes in a second might suffice. Probably need to format something weirdly to trip up html_table().
  3. Structured content that isn't in a table, like the {rvest} Star Wars data. A collection of XPath rules would be fantastic, if I can put something together (or reshare something with an appropriate license).
  4. Structured content in a particular cell of one of two tables on a page, to show piping / using XPath directly. Include images so that one of the things we grab can be the src attributes of those images. Images are a great example for this, actually.
  5. Can I (fairly) easily deploy something that requires a session? If so, do that. It might require something like Netlify, but I also MIGHT be able to do simple (non-secure) session stuff purely via HTML/GitHub Pages.
  6. Probably a page with HTTP request methods and status codes, to slyly
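
A rough sketch of what the page in item 4 could look like, and of the XPath that would pull image sources out of one cell of the second table. This is not {rvest}: it uses Python's standard library only to illustrate the idea. The markup, the `id` values, the image paths, and the `image_sources()` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical example page: two tables, with images inside one cell of the
# second table. Kept well-formed so the standard-library XML parser accepts it.
PAGE = """
<html>
  <body>
    <table id="methods">
      <tr><th>Method</th><th>Purpose</th></tr>
      <tr><td>GET</td><td>Retrieve a resource</td></tr>
    </table>
    <table id="status">
      <tr><th>Code</th><th>Icons</th></tr>
      <tr>
        <td>200</td>
        <td><img src="img/ok.png" alt="OK" /><img src="img/cached.png" alt="Cached" /></td>
      </tr>
    </table>
  </body>
</html>
"""


def image_sources(page: str) -> list[str]:
    """Return the src of every image in the second column of the second table."""
    root = ET.fromstring(page)
    # table[2] = second table under its parent; td[2] = second cell in each row.
    images = root.findall(".//table[2]/tr/td[2]/img")
    return [img.get("src") for img in images]


print(image_sources(PAGE))  # ['img/ok.png', 'img/cached.png']
```

In {rvest} the same selection would presumably be `html_elements(xpath = ...)` piped into `html_attr("src")`. A browser-parsed page may also insert `tbody` elements, so the real XPath might need `//` between the table and its rows.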

@jonthegeek jonthegeek removed this from the How can I scrape web pages? {rvest} milestone Feb 6, 2024