Data format #4
Comments
Hi Fangwu, 100-200MB is too big for GitHub. Also, for the data access summary posted, only four of the samples listed have the full complement of histone methylation data. Do we think this will be sufficient? Annie |
We'd have to split it into separate files and upload them separately though, which is possible |
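If we do go the splitting route, a minimal Python sketch could look like the following. The function names `split_file` and `join_files` and the `.partNNN` naming are made up for illustration, and the chunk size is whatever limit we decide to stay under; it is only one way to do it, not a recommendation over simply linking to the source.

```python
from pathlib import Path


def split_file(path, chunk_bytes, out_dir):
    """Split `path` into numbered parts of at most `chunk_bytes` bytes each."""
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as fh:
        index = 0
        while True:
            block = fh.read(chunk_bytes)
            if not block:  # end of file reached
                break
            part = out / f"{src.name}.part{index:03d}"
            part.write_bytes(block)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, dest):
    """Concatenate the parts, in the given order, back into one file."""
    with Path(dest).open("wb") as fh:
        for part in parts:
            fh.write(Path(part).read_bytes())
    return Path(dest)
```

Everyone who downloads the parts would need to run `join_files` on them in sorted order to get the original file back.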
Some of the files are over 100 MB each. I am thinking of posting the links instead, so you can easily find the files and download them to your own PC. |
Good comments. I think we could analyze the DNA and histone data separately, with each analysis generating a list of TFs that we can then combine. The ChIP data are only available for the two mature lineages; it might be interesting to infer the lineage-specific gene expression from histone status. |
@fangwuwang What about RNA_D1_HSCbm_100 and RNA_D1_HSC_100? And "RNA_D1_MLP0_100", "RNA_D1_MLP1_100", "RNA_D1_MLP2_100", "RNA_D1_MLP3_100"? Are they replicates or different cell types? Can I treat them as replicates? -Rawnak |
@rawnakhoque Sorry I did not notice that the MEP population was missing. Let's remove that population for now. |
@fangwuwang No problem. Thanks for your explanation. The data format is clearer to me now. :) |
@fangwuwang In the project proposal, point 3 of the aim and method section, you mentioned "convert/assign the transcript level to the gene level; ". Do you have any idea how to do that? |
@rawnakhoque I tried searching and found this very useful page that lists the available tools and packages by function. Check the quantitation category; I am sure there will be good packages for this task. |
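For a sense of what "transcript level to gene level" can mean in practice, here is a minimal Python sketch that sums transcript values per gene. It assumes the values are in units that add across isoforms (TPM, for example) and that the table has `gene_id` and `tpm` columns. Both column names and the example rows are hypothetical, not taken from the actual files, and a dedicated quantitation package may well do something more refined.

```python
import csv
import io
from collections import defaultdict


def gene_level(rows, gene_key="gene_id", value_key="tpm"):
    """Sum transcript-level values for each gene (assumes additive units)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[gene_key]] += float(row[value_key])
    return dict(totals)


# Hypothetical tab-separated transcript table, used only for illustration.
example = io.StringIO(
    "transcript_id\tgene_id\ttpm\n"
    "tx1\tgeneA\t2.5\n"
    "tx2\tgeneA\t1.5\n"
    "tx3\tgeneB\t4.0\n"
)
gene_totals = gene_level(csv.DictReader(example, delimiter="\t"))
```

For a real file, the same function should work on `csv.DictReader(open(path), delimiter="\t")` once the column names are adjusted to match the header.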
Hey guys, since you're using a publicly available dataset, the one thing you should put on GitHub is the download script for the data, or at least a description of where and how you got the data. Ideally, members of your group or other people looking at your repo can just follow the instructions and get a copy themselves. So something like this:
By running this script in R, each person can get a copy of the data. @fangwuwang you can then write another data processing script. Ideally, anyone running this script will get exactly the same data as you, so you don't need to check back and forth with others. |
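The snippet itself did not survive in this thread, so as a rough illustration of the idea, here is a minimal sketch in Python rather than R. The manifest format (URL, file name, optional SHA-256) and the function names are assumptions for the example, not the script that was originally posted; the checksum step is one way to make sure everyone ends up with exactly the same files.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def fetch_all(manifest, dest_dir, fetch=urllib.request.urlretrieve):
    """Download every (url, filename, expected_sha256) entry into dest_dir.

    Files already present are not downloaded again. Pass None as the
    checksum to skip verification for an entry.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url, filename, expected in manifest:
        target = dest / filename
        if not target.exists():
            fetch(url, str(target))
        if expected is not None and sha256_of(target) != expected:
            raise ValueError(f"checksum mismatch for {filename}")
        paths.append(target)
    return paths
```

The manifest would live in the repo as a short list of the real download links, so the large files themselves never need to be pushed.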
Sorry I could not come to the seminar today. I have downloaded the data but could not push it to the repo because of the large file sizes. Is there any way to upload files of 100-200 MB to GitHub?
For your information, the data formats are as follows:
bigWig for Bisulfite-seq (defined hyper-/hypomethylated regions are also available in bigBed format);
Processed RNA-seq with quantitation at the gene/transcript level in txt format (contig data in bigWig are also available if we don't trust their processing);
Processed ChIP-seq in bigBed format (bigWig data are also available but much bigger).
The available data for each sample are summarized in the Excel doc uploaded earlier. I already have all the data on my PC and will find a way to pass it to everyone later.