
Find a way to have /data cohesive usage ? #14

Open
mikefaille opened this issue Sep 15, 2015 · 11 comments
@mikefaille
Member

I think /data could be used for many use cases. Currently, the root of this folder contains configs, and subfolders can contain anything else.

The problem? /data is the volume, and we can have subvolumes. I'm not sure why the config is on the volume right now. Maybe I'm missing something?

I think the best way to use folders for mounting data is to keep the main folder (/data) empty and use subfolders for the configs (/data/etc), the HDFS mount (/data/hdfs), or anything else.
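For illustration, a minimal sketch of that mounting pattern (the image name and host paths here are placeholders, not the project's actual ones):

```sh
# Keep /data itself empty; mount only the subfolders we care about.
docker run -d \
  -v /srv/hadoop-etc:/data/etc \
  -v /srv/hadoop-hdfs:/data/hdfs \
  example/hadoop
```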

PS: Maybe we could follow the Filesystem Hierarchy Standard, http://www.pathname.com/fhs/pub/fhs-2.3.pdf (it's why I really love the Unix style of file organisation), but I'm personally OK with using /data if it's clearly documented in README.md. I will meditate on this.

@davidonlaptop
Member

Indeed, it would be great to use the FHS standard instead of /data. The main reason for choosing /data was that it makes docker commands shorter to type, which also improves the image's usability. It's true, though, that something like /hdfs would be more appropriate.

A compromise could be to install and configure Hadoop per FHS and have some kind of symlink usable with Docker, so we could optionally mount -v /HOST_VOLUME:/hdfs. Which FHS folder would you recommend for HDFS data?
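A rough sketch of that compromise (all paths and names here are hypothetical examples):

```sh
# Inside the image: keep HDFS data under an FHS path, expose a short alias.
ln -s /var/lib/hdfs /hdfs

# On the host: the docker command stays short.
docker run -v /HOST_VOLUME:/hdfs example/hadoop
```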

@mikefaille mikefaille changed the title Find a way to have /data unique use ? Find a way to have /data cohesive usage ? Sep 16, 2015
@davidonlaptop
Member

FYI. FHS 3.0 was released in June 2015. I just read it, and think we could use /srv/hdfs for HDFS data. Do you concur?

Reference: http://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s17.html

@mikefaille
Member Author

It sounds good. And it's not too far-fetched either.

Btw, I'm just thinking about the legitimacy of /usr/local/hadoop.
Yes, it's very popular in the community. But it's not FHS compliant. I think FHS exists so that data can be found without app-specific knowledge. So, since the Hadoop community prefers this path, and some apps like Golang prefer it too, it should be OK. But the best place for a package's tertiary hierarchy would be /opt/package_name. Then we can have: /opt/package_name/etc, /opt/package_name/bin, /opt/package_name/lib, etc.

Although, we can think about /srv again. There is a small issue with this path: as far as I know, only openSUSE is compliant with it, and no apps actually use /srv. So the issue we run into here is what I call the familiarity aspect.

So, my personal recommendation for a generic, community-proof standard:

  • Use /data/ the way /var/lib or /srv is used. Example: /data/dfs or /data/hdfs and /data/hbase, but avoid /data directly where possible.
    Explanation: current Hadoop distros use this path. CDH uses /data directly for Hadoop, but that's bad because we lose the ability to store data from several apps. We already need a few distinct paths, e.g. the hbase.rootdir property needs a different path than dfs.*. And /data is also used by the Docker community, which is great if we want to reinforce the familiarity aspect.
  • Use the VOLUME instruction more granularly: not for /data, but for subfolders where needed, like /data/hdfs (see the sketch after this list).
  • Continue to use /usr/local/hadoop, since /usr/local/ is recognized by the community. Personally, I prefer /opt/hadoop because the path is shorter and it strictly respects FHS.
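A minimal Dockerfile sketch of the granular VOLUME idea (base image and directories are illustrative only):

```dockerfile
FROM ubuntu:14.04
# Create the per-application data directories; /data itself stays unmanaged.
RUN mkdir -p /data/hdfs /data/hbase
# Declare volumes per subfolder, never for /data as a whole.
VOLUME ["/data/hdfs", "/data/hbase"]
```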

@davidonlaptop
Member

OK, it's settled then: /srv/hdfs for Hadoop HDFS data.

Agreed that we could also re-evaluate where we put the Hadoop binaries, config, etc. But why do you say that /usr/local/hadoop is not FHS compliant?

FHS 4.1 Purpose of /usr:

> /usr is the second major section of the filesystem. /usr is shareable, read-only data. That means that /usr should be shareable between various FHS-compliant hosts and must not be written to. Any information that is host-specific or varies with time is stored elsewhere.

FHS 4.9 Purpose of /usr/local:

> The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable amongst a group of hosts, but not found in /usr.
>
> Locally installed software must be placed within /usr/local rather than /usr unless it is being installed to replace or upgrade software in /usr.

It seems to me that using /usr/local/hadoop is more FHS compliant than using /opt/hadoop, because the latter would force us to nest all the subdirectories (etc, bin, and so on) inside the /opt/hadoop/ dir, while the former allows us to put the config in /usr/local/etc/hadoop or /etc/local/hadoop. But to be honest, /etc/hadoop would be the most user-friendly!

@mikefaille
Member Author

@davidonlaptop Having a subfolder representing a package, like /usr/local/package_name, is not FHS compliant.

@davidonlaptop
Member

> Having a subfolder representing a package, like /usr/local/package_name, is not FHS compliant.

Do you have a reference to support this claim?

@mikefaille
Member Author

> Having a subfolder representing a package, like /usr/local/package_name, is not FHS compliant.
> Do you have a reference to support this claim?

Yes, there is nothing about it in the FHS 👍

> It seems to me that using /usr/local/hadoop is more FHS compliant than using /opt/hadoop, because the latter would force us to nest all the subdirectories (etc, bin, and so on) inside the /opt/hadoop/ dir.

Wrong. /opt/package_name permits us to put configs under /etc:
http://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s13.html
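For example, the split FHS describes would look roughly like this (a sketch of the layout, not taken from any of our images):

```sh
# FHS-style /opt layout (see the /opt, /etc/opt, and /var/opt sections of FHS 3.0):
#   /opt/hadoop/bin       # static binaries
#   /opt/hadoop/lib       # static libraries
#   /etc/opt/hadoop       # host-specific config files
#   /var/opt/hadoop       # variable package data
```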

Nesting subdirectories is already the case with /usr/local/hadoop.

Although, the first goal of FHS is to provide an understandable way to predict the right path. Again, if the community chooses another way, like /data/whatever and /usr/local/<package_name>, that can make paths even more predictable than the FHS way. For nuance, I gave my own point of view in this answer: #14 (comment)

Then, if you really want to have /etc/hadoop, just symlink /usr/local/hadoop/etc/hadoop to /etc/hadoop and, for the logs, /usr/local/hadoop/logs (this path could be way better) to /var/log/hadoop.
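In Dockerfile terms, something like this (a sketch; it assumes the stock Hadoop 2.x tarball layout under /usr/local/hadoop):

```sh
# Expose FHS-friendly aliases for the in-tree Hadoop directories.
ln -s /usr/local/hadoop/etc/hadoop /etc/hadoop   # configs
ln -s /usr/local/hadoop/logs /var/log/hadoop     # logs
```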

@davidonlaptop
Member

> Wrong. /opt/package_name permits us to put configs under /etc:
> http://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s13.html

That's right. So then we have two choices.

> Although, the first goal of FHS is to provide an understandable way to predict the right path.

Agreed. Then let's compare with the industry: CDH, HDP, MapR, and SequenceIQ. Let's see what they are doing.

@mikefaille
Member Author

To give my comments, I already checked the Dockerfiles from CDH, MapR, and SequenceIQ.

MapR uses /mapr (I really don't like it).

CDH is FHS compliant: /var/lib/hadoop-hdfs/cache/${user.name}/dfs/data

SequenceIQ uses the Docker community way (a little awful, but it works for me only if we use /data/package_name as a subfolder): /data/package_name

@davidonlaptop
Member

Links to SequenceIQ dockerfiles (for future reference):

@mikefaille
Member Author

SequenceIQ seems to use the default path under /tmp/hadoop-${user.name}/dfs/data.
But in Docker, /tmp is a tmpfs mount, so the container could lose its data when we shut it down.
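For reference, a sketch of overriding that default so the blocks land on a persistent, mountable path (dfs.datanode.data.dir is the standard Hadoop 2.x property; in practice you would merge this into the existing hdfs-site.xml rather than overwrite it):

```sh
# Point the DataNode away from the hadoop.tmp.dir default
# (/tmp/hadoop-${user.name}/dfs/data) to a persistent directory.
cat > /usr/local/hadoop/etc/hadoop/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hdfs</value>
  </property>
</configuration>
EOF
```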
