Find a way to have cohesive /data usage? #14
Indeed, it would be great to use the FHS standard instead of … A compromise could be to install and configure Hadoop following the FHS and provide some kind of symlink usable with Docker, so that we could optionally mount …
FYI, FHS 3.0 was released in June 2015. I just read it and think we could use … Reference: http://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s17.html
Sounds good, and it's not too far-fetched either. By the way, I'm thinking about the legibility of /usr/local/hadoop. That said, we could consider /srv again. There is a small issue with this path: as far as I know, only openSUSE follows it, and no applications use /srv. So the issue we face falls under what I call the familiarity aspect. My personal recommendation, for a generic and community-proof standard:
OK, it's settled then for …
Agreed that we could also re-evaluate where we put the Hadoop binaries, config, etc. But why do you say that … FHS 4.1 Purpose of /usr:
FHS 4.9 Purpose of /usr/local:
It seems to me that using …
@davidonlaptop Having a subfolder per package, like /usr/local/package_name, is not FHS compliant.
Do you have a reference to support this claim?
Yes, there is nothing about it in the FHS 👍
Wrong. /opt/package_name lets us put configs under /etc. Nested subdirectories are already the case for /usr/local/hadoop. That said, the first goal of the FHS is to give an understandable way to predict the right path. Again, if the community chooses another convention, like /data/whatever and /usr/local/<package_name>, that should be fine as long as it makes paths easier to predict than the FHS way. For nuance, I gave my own point of view in this answer: #14 (comment). Then, if you really want /etc/hadoop, just symlink /usr/local/hadoop/etc/hadoop to /etc/hadoop and, for logs, /usr/local/hadoop/logs (this path could be much better) to /var/log/hadoop.
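The symlink workaround above could look like this in Python. This is a minimal sketch, assuming Hadoop is installed under /usr/local/hadoop, with its config in etc/hadoop and its logs in logs. The link_fhs_paths function and its root parameter are hypothetical. The root parameter exists so the layout can be tried in a scratch directory before touching /.

```python
import os
import tempfile
from pathlib import Path

# FHS-style link -> existing directory inside the Hadoop install, both relative
# to the root prefix. The pairs mirror the proposal above (assumed layout).
LINKS = {
    "etc/hadoop": "usr/local/hadoop/etc/hadoop",
    "var/log/hadoop": "usr/local/hadoop/logs",
}


def link_fhs_paths(root: str = "/") -> list[Path]:
    """Create FHS-style symlinks pointing into /usr/local/hadoop.

    `root` is a hypothetical prefix (default "/") so the layout can be tested
    in a temporary directory. Existing files or links are left untouched.
    Returns the links that were actually created.
    """
    created: list[Path] = []
    base = Path(root)
    for link_rel, target_rel in LINKS.items():
        link = base / link_rel
        target = base / target_rel
        if link.is_symlink() or link.exists():
            continue  # never clobber an existing file or link
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(target, target_is_directory=True)
        created.append(link)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate the install tree, then add the FHS links on top of it.
        for rel in LINKS.values():
            (Path(tmp) / rel).mkdir(parents=True)
        for link in link_fhs_paths(tmp):
            print(link, "->", os.readlink(link))
```

Making the function skip existing paths keeps it safe to run at every container start. In that case only the first run creates links.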
That's right. So then we have 2 choices.
Agreed. Then let's compare with the industry (CDH, HDP, MapR and SequenceIQ) and see what they are doing.
For my part, I already checked the Dockerfiles from CDH, MapR and SequenceIQ.
MapR uses /mapr (I really don't like it).
CDH is FHS compliant: /var/lib/hadoop-hdfs/cache/${user.name}/dfs/data
Links to the SequenceIQ Dockerfiles (for future reference):
SequenceIQ seems to use the default path, /tmp/hadoop-${user.name}/dfs/data.
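For reference, that default path follows from two stock Hadoop settings. In core-default.xml, hadoop.tmp.dir is /tmp/hadoop-${user.name}. In hdfs-default.xml, dfs.datanode.data.dir is file://${hadoop.tmp.dir}/dfs/data. The Python sketch below expands those variables to show the resulting path. It is a simplification: it assumes plain ${name} substitution from a single dictionary. Hadoop's own Configuration class also resolves Java system properties such as user.name.

```python
import re

# Stock defaults from Hadoop's core-default.xml and hdfs-default.xml.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data",
}

# Matches ${some.property} references inside a value.
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(key: str, props: dict[str, str], max_passes: int = 20) -> str:
    """Resolve ${...} references in props[key], leaving unknown names as-is.

    Simplified sketch: every referenced property must be present in `props`;
    Hadoop itself also consults Java system properties.
    """
    value = props[key]
    for _ in range(max_passes):  # bound the passes so cycles cannot loop forever
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


if __name__ == "__main__":
    props = {**DEFAULTS, "user.name": "hdfs"}  # "hdfs" is just an example user
    print(expand("dfs.datanode.data.dir", props))  # prints file:///tmp/hadoop-hdfs/dfs/data
```

Because the data directory lives under /tmp by default, images that keep this default store HDFS blocks in a temporary location. This is one argument for choosing an explicit data path.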
I think /data could be used for many use cases. Currently, the root of this folder contains configs, and subfolders can contain anything else.
The problem? /data is the volume, and we can have subvolumes. I'm not sure why the config is on the volume now. Maybe I'm missing something?
I think the best way to use folders for mounting data is to keep the main folder (/data) empty and use subfolders for configs (/data/etc), the HDFS mount (/data/hdfs), or anything else.
PS: Maybe we could use the Filesystem Hierarchy Standard, http://www.pathname.com/fhs/pub/fhs-2.3.pdf (that's why I really love the Unix-style file organization), but I'm personally OK with using /data if it's documented clearly in README.md. I will think about it.
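The proposed /data layout keeps the volume root empty and uses one subfolder per purpose. It can be sketched in Python as follows. The sketch assumes the subfolder names etc and hdfs from the proposal. The prepare_data_volume helper is hypothetical: it creates the subfolders and rejects a volume root that holds loose files.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the proposal: configs and the HDFS mount.
SUBDIRS = ("etc", "hdfs")


def prepare_data_volume(data_root: str = "/data") -> list[Path]:
    """Create the per-purpose subfolders and check the root has no loose files.

    Hypothetical helper: raises ValueError if anything other than a directory
    sits directly under the volume root. Returns the sorted root entries.
    """
    root = Path(data_root)
    root.mkdir(parents=True, exist_ok=True)
    for name in SUBDIRS:
        (root / name).mkdir(exist_ok=True)
    loose_files = [p for p in root.iterdir() if not p.is_dir()]
    if loose_files:
        raise ValueError(f"{root} should only contain subfolders, found: {loose_files}")
    return sorted(root.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print([p.name for p in prepare_data_volume(tmp)])
```

With this layout, each subfolder could be bind-mounted on its own through Docker's -v host_path:container_path option. HDFS data and configs then stay independent of each other.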