Apache Spark DataFrame Numeric Column (Again)

There is nothing like having a way to make it work to find more ways.

I found a .cast() method for the columns I want to use as numeric value and this avoids using a UDF to transform it.

I now prefer this way… until I find another, simpler…

package com.cinq.experience;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

import java.io.UnsupportedEncodingException;

public class Session {

	public static void main(String[] args) throws UnsupportedEncodingException {
		SparkConf conf = new SparkConf().setAppName("SparkExperience").setMaster("local");
		JavaSparkContext jsc = new JavaSparkContext(conf);
		SQLContext sqlContext = new SQLContext(jsc);

		DataFrame df = sqlContext.read()
				.option("header", "true")

		DataFrame crazy = df.select(df.col("x-custom-a"), df.col("x-custom-count").cast(DataTypes.LongType));
		crazy.groupBy(crazy.col("x-custom-a")).avg("CAST(x-custom-count, LongType)").show();

Published by


Java developper that loves photography and an excellent espresso

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s