[GH-5670] Add Spark 3.5 support #5664

Merged: 6 commits, merged on Oct 25, 2023 (diff shows changes from 2 commits).
26 changes: 14 additions & 12 deletions README.rst
@@ -18,8 +18,9 @@ Getting Started
User Documentation
~~~~~~~~~~~~~~~~~~

`Read the documentation for Spark 3.4 <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/index.html>`__ (or
`3.3 <http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/index.html>`__ ,
`Read the documentation for Spark 3.5 <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/index.html>`__ (or
`3.4 <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/index.html>`__ ,
`3.3 <http://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html>`__ ,
`3.2 <http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/index.html>`__ ,
`3.1 <http://docs.h2o.ai/sparkling-water/3.1/latest-stable/doc/index.html>`__,
`3.0 <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/index.html>`__,
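This hunk also fixes a mislabeled link: in the old text the `3.3` entry pointed at the `3.2` documentation. A check along the following lines could catch that kind of copy-paste slip. It is a hypothetical sketch, not part of the Sparkling Water build; it assumes every versioned docs link has the form `` `LABEL <http://docs.h2o.ai/sparkling-water/VERSION/...>`__ `` and that a label which names a version (`3.3`, or `... Spark 3.5`) must match the URL's VERSION segment. The object name `DocLinkCheck` is illustrative.

```scala
import scala.util.matching.Regex

object DocLinkCheck {
  // Matches rst links such as `3.3 <http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/index.html>`__
  // group 1 = link label, group 2 = version segment of the URL
  private val Link: Regex =
    """`([^`<]+?)\s*<https?://docs\.h2o\.ai/sparkling-water/([0-9.]+)/[^>]*>`__""".r

  // A label "names a version" if it is a bare major.minor or ends with "Spark <major.minor>".
  private val LabelVersion: Regex = """(?:.*Spark\s+)?([0-9]+\.[0-9]+)""".r

  /** Returns (label, urlVersion) pairs whose label names a different version than the URL. */
  def mismatches(rst: String): List[(String, String)] =
    Link.findAllMatchIn(rst).toList.flatMap { m =>
      val label = m.group(1).trim
      val urlVersion = m.group(2)
      label match {
        case LabelVersion(v) if v != urlVersion => List(label -> urlVersion)
        case _                                  => Nil // consistent, or label carries no version
      }
    }
}
```

Run against the old lines of this hunk, it would report `("3.3", "3.2")`; against the new lines it reports nothing.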
@@ -29,7 +30,8 @@ User Documentation
Download Binaries
~~~~~~~~~~~~~~~~~

`Download the latest version for Spark 3.4 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.4/latest.html>`__ (or
`Download the latest version for Spark 3.5 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.5/latest.html>`__ (or
`3.4 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.4/latest.html>`__,
`3.3 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.3/latest.html>`__,
`3.2 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.2/latest.html>`__,
`3.1 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.1/latest.html>`__,
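Each release that adds a Spark line moves the headline link and pushes the previous version into the "(or ...)" list, as this hunk does for 3.5. A hypothetical helper like the one below could render that line from a list of Spark major.minor versions, newest first, using the S3 URL pattern visible in this README. The names `DownloadLinks` and `render` are illustrative, and the closing `)` is an assumption because the end of the list is cut off in this diff.

```scala
object DownloadLinks {
  // URL pattern taken from the README: one latest.html page per Spark major.minor line.
  def url(sparkVersion: String): String =
    s"http://h2o-release.s3.amazonaws.com/sparkling-water/spark-$sparkVersion/latest.html"

  // Numeric ordering on major.minor, so a hypothetical "3.10" would sort after "3.5".
  private def key(v: String): (Int, Int) = v.split('.') match {
    case Array(major, minor) => (major.toInt, minor.toInt)
    case _                   => throw new IllegalArgumentException(s"expected major.minor, got '$v'")
  }

  /** Newest version becomes the headline link; the rest form the "(or ...)" list. */
  def render(versions: Seq[String]): String = {
    require(versions.nonEmpty, "need at least one Spark version")
    val sorted = versions.sortBy(v => key(v)).reverse
    val head = s"`Download the latest version for Spark ${sorted.head} <${url(sorted.head)}>`__"
    val rest = sorted.tail.map(v => s"`$v <${url(v)}>`__")
    if (rest.isEmpty) head else rest.mkString(s"$head (or\n", ",\n", ")")
  }
}
```

Deriving the list this way keeps the label and the URL of each entry from drifting apart.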
@@ -95,20 +97,20 @@ Use Sparkling Water with PySpark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sparkling Water can also be used directly from PySpark; this integration is called PySparkling.

See `PySparkling README <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/pysparkling.html>`__ to learn about PySparkling.
See `PySparkling README <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/pysparkling.html>`__ to learn about PySparkling.

Use Sparkling Water via Spark Packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To see how Sparkling Water can be used as a Spark package, please see `Use as Spark Package <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/tutorials/use_as_spark_package.html>`__.
To see how Sparkling Water can be used as a Spark package, please see `Use as Spark Package <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/tutorials/use_as_spark_package.html>`__.

Use Sparkling Water in Windows environments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See `Windows Tutorial <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/tutorials/run_on_windows.html>`__ to learn how to use Sparkling Water in Windows environments.
See `Windows Tutorial <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/tutorials/run_on_windows.html>`__ to learn how to use Sparkling Water in Windows environments.

Sparkling Water examples
~~~~~~~~~~~~~~~~~~~~~~~~
To see how to run examples for Sparkling Water, please see `Running Examples <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/devel/running_examples.html>`__.
To see how to run examples for Sparkling Water, please see `Running Examples <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/devel/running_examples.html>`__.

Maven packages
~~~~~~~~~~~~~~
@@ -140,26 +142,26 @@ backend. The backend can be specified before creation of the
``H2OContext``.

For more details regarding the internal or external backend, please see
`Backends <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/deployment/backends.html>`__.
`Backends <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/deployment/backends.html>`__.

--------------

FAQ
---

A list of all Frequently Asked Questions is available at `FAQ <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/FAQ.html>`__.
A list of all Frequently Asked Questions is available at `FAQ <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/FAQ.html>`__.

--------------

Development
-----------

Complete development documentation is available at `Development Documentation <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/devel/devel.html>`__.
Complete development documentation is available at `Development Documentation <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/devel/devel.html>`__.

Build Sparkling Water
~~~~~~~~~~~~~~~~~~~~~

To see how to build Sparkling Water, please see `Build Sparkling Water <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/devel/build.html>`__.
To see how to build Sparkling Water, please see `Build Sparkling Water <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/devel/build.html>`__.

Develop applications with Sparkling Water
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -226,7 +228,7 @@ We also respond to questions tagged with sparkling-water and h2o tags on the `St
Change Logs
~~~~~~~~~~~

Change logs are available at `Change Logs <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/CHANGELOG.html>`__.
Change logs are available at `Change Logs <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/CHANGELOG.html>`__.

---------------

6 changes: 2 additions & 4 deletions core/src/test/scala/ai/h2o/sparkling/TestUtils.scala
@@ -16,13 +16,11 @@
*/
package ai.h2o.sparkling

import ai.h2o.sparkling.sql.catalyst.encoders.RowEncoder
import java.io.File
import java.nio.file.Files
import java.sql.Timestamp

import org.apache.spark.mllib
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.functions.{lit, rand}
import org.apache.spark.sql.types._
@@ -130,7 +128,7 @@ object TestUtils extends Matchers {
spark: SparkSession,
schemaHolder: SchemaHolder,
settings: GenerateDataFrameSettings): DataFrame = {
implicit val encoder: ExpressionEncoder[Row] = RowEncoder(schemaHolder.schema)
implicit val encoder = RowEncoder(schemaHolder.schema)
val numberOfPartitions = Math.max(1, settings.numberOfRows / settings.rowsPerPartition)
spark
.range(settings.numberOfRows)
2 changes: 1 addition & 1 deletion gradle-spark3.3.properties
@@ -1,6 +1,6 @@
sparkVersion=3.3.2
minSupportedJavaVersion=1.8
supportedEmrVersion=emr-6.10.0
supportedEmrVersion=emr-6.11.1
unsupportedMinorSparkVersions=
scalaVersion=2.12.15
databricksVersion=11.0.x-cpu-ml-scala2.12
2 changes: 1 addition & 1 deletion gradle-spark3.4.properties
@@ -1,6 +1,6 @@
sparkVersion=3.4.1
minSupportedJavaVersion=1.8
supportedEmrVersion=emr-6.10.0
supportedEmrVersion=emr-6.13.0
unsupportedMinorSparkVersions=
scalaVersion=2.12.17
databricksVersion=13.0.x-cpu-ml-scala2.12
10 changes: 10 additions & 0 deletions gradle-spark3.5.properties
@@ -0,0 +1,10 @@
sparkVersion=3.5.0
minSupportedJavaVersion=1.8
supportedEmrVersion=emr-6.10.0
unsupportedMinorSparkVersions=
scalaVersion=2.12.18
databricksVersion=14.0.x-cpu-ml-scala2.12
fabricK8sClientVersion=6.4.1
executorOverheadMemoryOption=spark.executor.memoryOverhead
driverOverheadMemoryOption=spark.driver.memoryOverhead
supportedPythonVersions=3.8 3.9

Review comment (Collaborator):

I think you will need to add Python 3.10 and 3.11
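The reviewer's point reads naturally if `supportedPythonVersions` acts as an allow-list keyed on Python major.minor: with `3.8 3.9`, interpreters such as 3.10 or 3.11 would be rejected until they are listed. The sketch below shows that interpretation only; how the build actually consumes the property is not part of this diff, and the object name `PythonSupport` is hypothetical.

```scala
import java.io.StringReader
import java.util.Properties

object PythonSupport {
  // Parses a per-Spark-version file such as gradle-spark3.5.properties.
  def load(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  // Space-separated major.minor list, e.g. "3.8 3.9".
  def supported(props: Properties): Set[String] =
    props.getProperty("supportedPythonVersions", "").split("\\s+").filter(_.nonEmpty).toSet

  // Compares only major.minor, so "3.9.18" is covered by "3.9" but "3.11.4" is not.
  def isSupported(props: Properties, pythonVersion: String): Boolean =
    supported(props).contains(pythonVersion.split('.').take(2).mkString("."))
}
```

Under this reading, following the review means extending the value to `3.8 3.9 3.10 3.11`.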

4 changes: 2 additions & 2 deletions gradle.properties
@@ -23,11 +23,11 @@ dockerImageVersion=79
# Is this build nightly build
isNightlyBuild=false
# Supported Major Spark Versions
supportedSparkVersions=2.3 2.4 3.0 3.1 3.2 3.3 3.4
supportedSparkVersions=2.3 2.4 3.0 3.1 3.2 3.3 3.4 3.5
# The list of python environments used in automated tests
pythonEnvironments=3.6 3.7 3.8 3.9
# Select for which Spark version is Sparkling Water built by default
spark=3.4
spark=3.5
# Sparkling Water Version
version=3.42.0.1-1-SNAPSHOT
# Spark version from which is Kubernetes Supported
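In `gradle.properties`, the `spark` property selects the Spark line for a default build, and this PR updates it together with `supportedSparkVersions`. A consistency check along these lines catches a default that is missing from the supported list and derives the `gradle-spark<version>.properties` file name used in this PR. It is a hypothetical sketch (`SparkDefaults` and its methods are illustrative); Sparkling Water's Gradle scripts may resolve this differently.

```scala
import java.io.StringReader
import java.util.Properties

object SparkDefaults {
  def load(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text)) // lines starting with '#' are treated as comments
    props
  }

  def supportedSparkVersions(props: Properties): Seq[String] =
    props.getProperty("supportedSparkVersions", "").split("\\s+").filter(_.nonEmpty).toSeq

  /** Fails fast if `spark` is unset or unsupported; otherwise names its per-version file. */
  def defaultProfileFile(props: Properties): String = {
    val spark = Option(props.getProperty("spark")).getOrElse(
      throw new IllegalArgumentException("missing 'spark' property"))
    require(supportedSparkVersions(props).contains(spark),
      s"spark=$spark is not listed in supportedSparkVersions")
    s"gradle-spark${spark}.properties"
  }
}
```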
1 change: 1 addition & 0 deletions py-scoring/README.rst
@@ -7,6 +7,7 @@ This package contains just functionality for scoring with Sparkling Water, H2O-3

Documentation describing scoring with H2O-3 MOJO models is located at:

- For Spark 3.5 - https://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/deployment/load_mojo.html
- For Spark 3.4 - https://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/deployment/load_mojo.html
- For Spark 3.3 - https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/deployment/load_mojo.html
- For Spark 3.2 - https://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/deployment/load_mojo.html
1 change: 1 addition & 0 deletions py/README.rst
@@ -8,6 +8,7 @@ to use this package for scoring with Driverless AI MOJO models.

PySparkling Documentation is hosted at our documentation page:

- For Spark 3.5 - http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/pysparkling.html
- For Spark 3.4 - http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/pysparkling.html
- For Spark 3.3 - http://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/pysparkling.html
- For Spark 3.2 - http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/pysparkling.html
@@ -17,21 +17,21 @@

package ai.h2o.sparkling.ml.models

import java.io._
import ai.h2o.mojos.runtime.MojoPipeline
import ai.h2o.mojos.runtime.api.{MojoPipelineService, PipelineConfig}
import ai.h2o.mojos.runtime.frame.MojoColumn.Type
import ai.h2o.mojos.runtime.frame.MojoFrame
import ai.h2o.sparkling.ml.params.{H2OAlgorithmMOJOParams, H2OBaseMOJOParams, HasFeatureTypesOnMOJO}
import org.apache.spark.ml.param._
import org.apache.spark.sql._
import ai.h2o.sparkling.sql.catalyst.encoders.RowEncoder
import com.google.common.collect.Iterators
import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.ml.Model
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.ml.param._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

import java.io._
import scala.collection.JavaConverters._

class H2OMOJOPipelineModel(override val uid: String)
@@ -17,9 +17,9 @@

package ai.h2o.sparkling.ml.utils

import ai.h2o.sparkling.sql.catalyst.encoders.RowEncoder
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package ai.h2o.sparkling.sql.catalyst.encoders

import org.apache.spark.sql.types.StructType

/**
* For an explanation, see utils/src/main/scala_spark_3.5/ai/h2o/sparkling/sql/catalyst/encoders/RowEncoder.scala
*/
object RowEncoder {
def apply(schema: StructType) = org.apache.spark.sql.catalyst.encoders.RowEncoder(schema)
}
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package ai.h2o.sparkling.sql.catalyst.encoders

import org.apache.spark.sql.types.StructType

/**
* For an explanation, see utils/src/main/scala_spark_3.5/ai/h2o/sparkling/sql/catalyst/encoders/RowEncoder.scala
*/
object RowEncoder {
def apply(schema: StructType) = org.apache.spark.sql.catalyst.encoders.RowEncoder(schema)
}
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package ai.h2o.sparkling.sql.catalyst.encoders

import org.apache.spark.sql.types.StructType

/**
* For an explanation, see utils/src/main/scala_spark_3.5/ai/h2o/sparkling/sql/catalyst/encoders/RowEncoder.scala
*/
object RowEncoder {
def apply(schema: StructType) = org.apache.spark.sql.catalyst.encoders.RowEncoder(schema)
}
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package ai.h2o.sparkling.sql.catalyst.encoders

import org.apache.spark.sql.types.StructType

/**
* For an explanation, see utils/src/main/scala_spark_3.5/ai/h2o/sparkling/sql/catalyst/encoders/RowEncoder.scala
*/
object RowEncoder {
def apply(schema: StructType) = org.apache.spark.sql.catalyst.encoders.RowEncoder(schema)
}
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package ai.h2o.sparkling.sql.catalyst.encoders

import org.apache.spark.sql.types.StructType

/**
* For an explanation, see utils/src/main/scala_spark_3.5/ai/h2o/sparkling/sql/catalyst/encoders/RowEncoder.scala
*/
object RowEncoder {
def apply(schema: StructType) = org.apache.spark.sql.catalyst.encoders.RowEncoder(schema)
}
@@ -0,0 +1,30 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package ai.h2o.sparkling.sql.catalyst.encoders

import org.apache.spark.sql.catalyst.encoders.RowEncoder.encoderFor
import org.apache.spark.sql.types.StructType

/**
* Spark 3.5 removed the apply method of RowEncoder, so callers have to use the encoderFor method instead.
* Some older Spark versions do not provide encoderFor, hence different code is used for older and newer Spark versions.
*/
object RowEncoder {
def apply(schema: StructType) = encoderFor(schema)

}
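Per the comment in this file, the split between shims comes down to one cut-off: Spark 3.5 needs `encoderFor`, while some older versions only offer `RowEncoder.apply`. If a build script or test ever had to decide which variant applies for a given Spark version string, it should compare versions numerically, because as plain strings `"3.10"` sorts before `"3.5"`. A minimal sketch; `RowEncoderVariant` and `needsEncoderFor` are hypothetical names, the 3.5 threshold comes from the comment above, and treating later major versions like 3.5 is an assumption.

```scala
object RowEncoderVariant {
  // Parses the leading major.minor of a version such as "3.5.0" or "3.4".
  def majorMinor(sparkVersion: String): (Int, Int) =
    sparkVersion.split('.').take(2).map(_.toInt) match {
      case Array(major, minor) => (major, minor)
      case _ => throw new IllegalArgumentException(s"cannot parse Spark version '$sparkVersion'")
    }

  // True when the Spark 3.5+ shim (RowEncoder.encoderFor) applies.
  // Assumption: major versions above 3 keep the 3.5 API.
  def needsEncoderFor(sparkVersion: String): Boolean = {
    val (major, minor) = majorMinor(sparkVersion)
    major > 3 || (major == 3 && minor >= 5)
  }
}
```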