
BDSGOLD-301. Enable test_pyspark_shell test to pass on a fresh cluster with spark-2.3.2 as default. #14

Open: wants to merge 1 commit into base branch-2.3.2-alti

Conversation

@anil-altiscale commented May 5, 2019

test_pyspark_shell.sh places the README.md from the /opt/spark-x.x.x directory into an HDFS directory, which pyspark_shell_examples.py then uses to create TF vectors. This change bundles a README.md inside the test_data directory to avoid a conflict with the README.md in the test_spark directory.

@alee-altiscale (Contributor) left a comment:

I have some questions on this investigation :)

hdfs dfs -put $spark_home/README.md spark/test/

# Including spark README.md in test_data to differentiate from sparkexample README.md
hdfs dfs -put "$spark_test_dir/test_data/README.md" spark/test/
@alee-altiscale (Contributor) commented:

I notice that a README.md does exist in both https://github.com/Altiscale/sparkexample/blob/branch-2.3.2-alti/README.md and https://github.com/Altiscale/spark/blob/branch-2.3.2-alti/README.md, although the one in sparkexample is pretty empty, with just one line of content.

The subject is misleading as well: is it really "enabling"? I thought this test case always runs; it is enabled all the time (https://github.com/Altiscale/sparkexample/blob/branch-2.3.2-alti/run_all_test.kerberos.sh#L11).

@anil-altiscale (Author) commented:

Yep, the README.md file is present in both repositories, but this has more to do with the Spark RPMs. The test picks up README.md from the $spark_home/ directory, so ideally the file should be present inside the /opt/alti-spark-2.3.2/ directory. I am guessing that while setting up the environment, the install script does not copy the README file (https://github.com/Altiscale/sparkbuild/blob/branch-2.3.2-alti/scripts/justinstall.sh) the way it does for 1.6.1 (https://github.com/Altiscale/sparkbuild/blob/branch-1.6-alti/scripts/spark.spec#L365) and 2.1.1 (https://github.com/Altiscale/sparkbuild/blob/branch-2.1.1-alti/scripts/spark.spec#L314).
I thought we did not want to mess with the spark RPM, and we discussed that we could instead ship the file as part of the sparkexample RPM. Also, I wanted the README.md to have some significant content and to avoid a conflict with the already existing sparkexample README, hence I included it in the test_data directory.
Alternatively, we could change the sparkbuild repo to copy over the README.md as well, but that means changes to the spark RPM.
As for the subject, yep, I think I need some other wording.

@@ -43,7 +43,9 @@ fi
pushd `pwd`
cd $spark_home
hdfs dfs -mkdir -p spark/test/
hdfs dfs -put $spark_home/README.md spark/test/


Will removing this break the test suite on existing clusters?
As far as I know, test suites are often run after a maintenance to verify the status of a cluster.
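One way to address this concern (a minimal sketch only, not part of this PR) would be to prefer the README.md bundled with the test data and fall back to the copy under $spark_home on clusters whose spark RPM still installs one. The helper name pick_readme below is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): choose which README.md to stage
# into HDFS. Prefer the copy bundled with the test data; fall back to
# the one shipped under the Spark install directory if the bundled
# copy is missing.
pick_readme() {
  test_data_dir="$1"   # e.g. "$spark_test_dir/test_data"
  spark_home_dir="$2"  # e.g. "$spark_home"
  if [ -f "$test_data_dir/README.md" ]; then
    echo "$test_data_dir/README.md"
  else
    echo "$spark_home_dir/README.md"
  fi
}

# Possible usage in the test script (assumes $spark_test_dir and
# $spark_home are set, and an HDFS client is on the PATH):
# hdfs dfs -put "$(pick_readme "$spark_test_dir/test_data" "$spark_home")" spark/test/
```

This keeps the new test_data copy authoritative while staying backward compatible with clusters that only have the RPM-installed README.md.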

3 participants