Exercise 9 doesn't work with suggested date of creation column #9

Open
brendam opened this issue Aug 16, 2016 · 3 comments

Comments

@brendam
Contributor

brendam commented Aug 16, 2016

Using the suggested common transform to_date on the date of creation column doesn't work. Using value.toDate("yyyyMMdd") works for 677 of the dates in the column, but a timeline facet then shows 3,584 non-date rows. Many of these are "Not Available", but some look like they should be valid dates and I'm not sure why they haven't converted.

Should some of the Java simple date format definitions be included in the notes, or perhaps a link to them for more detail?

@weaverbel

I agree with @brendam that this didn't work as expected, though my outcome was different. After converting the column to date format using the common To date: transform, my dates in the next GREL transform turned into values like "01 January 17830500", and the second transform left them in the same state. @pitviper6 @ostephens - any ideas?

@ostephens
Contributor

ostephens commented Aug 16, 2016

So I think @brendam is correct to use

value.toDate("yyyyMMdd")

I can't see it working otherwise.
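For anyone who wants to sanity-check the pattern outside OpenRefine, here is a rough Python analogue of value.toDate("yyyyMMdd"). It is only a sketch: "%Y%m%d" is Python's counterpart to the Java-style yyyyMMdd pattern, the helper name parse_yyyymmdd and the example values are made up for illustration, and this check is deliberately strict, so it may reject values that OpenRefine handles differently.

```python
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Return a date for an 8-digit yyyyMMdd string, or None if it does not parse.

    Hypothetical helper for illustration; not part of OpenRefine or the lesson.
    """
    # Only strings are considered; numeric or empty cells are reported as None.
    if not isinstance(value, str):
        return None
    text = value.strip()
    # Require exactly eight digits before handing the text to strptime.
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Eight digits that are not a real calendar date, e.g. day 00.
        return None
```

With this, parse_yyyymmdd("17830512") gives a date, while "Not Available", a numeric cell, or eight digits with an impossible day all come back as None and can be reviewed separately.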

However, as @brendam also notes, this will only convert some of the lines. To get the other lines into a standardised date format would take more work.

You can find the lines that will work by doing a custom text facet using
value.match(/(\d{8})/).length()==1

(or similar) - which finds the lines where the value is a string of 8 digits. This is 676 lines from what I can see (possibly the single blank line is the extra line that @brendam reports as converting to a date OK).
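The same test can be sketched in Python for checking an exported copy of the column. This assumes, as far as I understand GREL, that match() tests the whole cell value against the pattern, so re.fullmatch is used here as the closest equivalent. The function name and the sample values are invented for illustration and are not taken from the exercise data.

```python
import re

# Eight digits and nothing else in the cell.
EIGHT_DIGITS = re.compile(r"\d{8}")


def is_eight_digit_string(value):
    """True only for string cells made up of exactly eight digits."""
    return isinstance(value, str) and EIGHT_DIGITS.fullmatch(value) is not None


# Invented sample cells: one clean date string and several problem cases.
sample = ["17830512", "[17830512]", "Not Available", 17830512, "178?0512", ""]
matching = [v for v in sample if is_eight_digit_string(v)]
```

Here only the first sample value passes. The bracketed value, the numeric cell and the value with a question mark are all left out, which mirrors the kinds of line the facet excludes.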

The remaining lines are a mix. Some don't contain a date, some contain multiple dates or additional punctuation (e.g. a date surrounded by square brackets, or with a question mark replacing a digit), and some (and I suspect these are the ones puzzling @brendam) have come into OpenRefine (OR) as numbers instead of strings.
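To get a feel for how the column splits across these groups, something like the following Python sketch could be run over an exported copy. The bucket names, the classify function and the sample values are all my own invention for illustration, and the rules are a rough guess at the categories described above rather than a tested cleanup recipe.

```python
import re
from collections import Counter


def classify(value):
    """Put one cell into a rough bucket (hypothetical categories)."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return "blank"
    # Cells that arrived as numbers rather than strings.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    text = str(value).strip()
    if re.fullmatch(r"\d{8}", text):
        return "clean 8-digit string"
    runs = re.findall(r"\d{8}", text)
    if len(runs) > 1:
        return "multiple dates"
    if len(runs) == 1:
        return "date with extra punctuation or text"
    if re.search(r"\d", text):
        return "partial or uncertain date"
    return "no date"


# Invented sample cells, one per situation mentioned in the comment.
sample = ["17830512", "[17830512]", "17830512; 17840101",
          "178?0512", "Not Available", 17830512, ""]
counts = Counter(classify(v) for v in sample)
```

Counting the buckets this way would show how many lines need manual attention and how many could be fixed with a simple transform.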

This last issue is probably caused by the cell formatting in Excel, which is then being reflected in OR. You could fix it in OR I think, but to be honest it might be easier/better to fix it in the Excel file that is used for the exercise.
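If the numeric cells were repaired in code rather than in Excel, the idea would be roughly to turn whole numbers back into 8-character text so the date pattern can then be applied to them. A minimal Python sketch of that step follows, assuming the numbers are plain integers (or floats with no fractional part); number_to_date_string is a hypothetical helper, not an OpenRefine function.

```python
def number_to_date_string(value):
    """Return an 8-digit string for a whole-number cell, else None.

    Hypothetical helper: strings and other types are left for separate handling.
    """
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        candidate = value
    elif isinstance(value, float) and value.is_integer():
        # Spreadsheet imports often deliver whole numbers as floats.
        candidate = int(value)
    else:
        return None
    text = str(candidate)
    # Only values that come out as exactly eight digits look like yyyyMMdd.
    return text if len(text) == 8 and text.isdigit() else None
```

The result is still just text, so it would need the same yyyyMMdd parsing and validity check as the cells that were strings all along.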

So I think the lesson should be updated to use value.toDate("yyyyMMdd") and then a decision made about whether the non-string lines should be corrected in the Excel file used in this exercise.

Of course, all the things that don't convert to dates correctly can be used as teaching points if that is preferable to correcting the issues with the underlying data.

@pitviper6
Contributor

Weird! I can take a look at it and fix the data. I'm finally in Chicago and the movers came yesterday, so my life is still chaotic but at least I have more than one outfit to wear!

I'll play with it - maybe that's a good column for a facet/cleanup/transformation exercise. Dates can be tricky!
