Spark job not working

I have had a few adventures with Spark jobs not running on YARN because I forgot to make my classes implement Serializable.

This one is a new issue that I have not faced before. Judging by the logs, something is trying to load a class named int by reflection, and the class loader can’t find it: int is a primitive type, not a class that can be loaded by name.
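To make the symptom concrete, here is a minimal sketch in plain Java with no Spark involved. The class and method names (PrimitiveLookupDemo, loadableByName) are made up for illustration. It only shows that asking for "int" by name fails, while the int.class literal exists; it does not claim to be the code path the job took.

```java
// Hypothetical demo class; not part of the original job.
class PrimitiveLookupDemo {

    // Returns true if Class.forName can load a class with the given name.
    static boolean loadableByName(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The wrapper class is an ordinary class and can be loaded by name.
        System.out.println(loadableByName("java.lang.Integer"));
        // The primitive has no loadable class behind its name.
        System.out.println(loadableByName("int"));
        // The Class object for the primitive is still reachable via the literal.
        System.out.println(int.class.getName());
    }
}
```

So any code that turns the string "int" back into a Class has to special-case primitives instead of going through the class loader.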

Here are some logs:

14/12/17 16:26:15 INFO DAGScheduler: Submitting Stage 2 (FilteredRDD[2] at filter at WriteReport.java:109), which has no missing parents
14/12/17 16:26:15 INFO DAGScheduler: Submitting 6 missing tasks from Stage 2 (FilteredRDD[2] at filter at WriteReport.java:109)
14/12/17 16:26:15 INFO YarnClusterScheduler: Adding task set 2.0 with 6 tasks
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 12 on executor 1: hostname (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 1 ms
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 13 on executor 1: hostname (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 1 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 12 (task 2.0:0)
14/12/17 16:26:15 WARN TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: int
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

Then more tasks fail for the same reason until the job is aborted:

14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 14 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 13 (task 2.0:1)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 1]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 15 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 14 (task 2.0:0)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 2]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 16 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 15 (task 2.0:1)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 3]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 17 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 16 (task 2.0:0)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 4]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 18 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 1 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 17 (task 2.0:1)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 5]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 19 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 18 (task 2.0:0)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 6]
14/12/17 16:26:15 ERROR TaskSetManager: Task 2.0:0 failed 4 times; aborting job
14/12/17 16:26:15 INFO YarnClusterScheduler: Cancelling stage 2
14/12/17 16:26:15 INFO YarnClusterScheduler: Stage 2 was cancelled

After many hours of research I found the issue in the class that is responsible for generating the report. I am passing it an ArrayList with 2 objects, and that causes this error. If my ArrayList has only 1 element, everything “works”.
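I did not establish why the two-element list triggers the error. One mechanism that produces exactly this exception message, offered here as an assumption rather than a diagnosis, is Java deserialization of a Class object for a primitive (for example int.class held somewhere in the object graph) through a stream that resolves classes strictly by name. The sketch below reproduces that locally; PrimitiveClassRoundTrip and NameOnlyInputStream are hypothetical names, not Spark classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// Hypothetical demo; the names are illustrative and not taken from Spark.
class PrimitiveClassRoundTrip {

    // Resolves classes strictly by name, with no fallback for primitives.
    static class NameOnlyInputStream extends ObjectInputStream {
        NameOnlyInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            return Class.forName(desc.getName(), false, getClass().getClassLoader());
        }
    }

    // Java-serializes a value into a byte array.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads with the name-only stream; returns the ClassNotFoundException
    // message, or null if deserialization succeeded.
    static String nameOnlyFailure(byte[] data) throws IOException {
        try (ObjectInputStream in = new NameOnlyInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return null;
        } catch (ClassNotFoundException e) {
            return e.getMessage();
        }
    }

    // Reads with the stock ObjectInputStream, which falls back to a
    // built-in table of primitive types when the name lookup fails.
    static Object defaultRead(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(int.class);
        System.out.println("default stream:   " + defaultRead(data));
        System.out.println("name-only stream: " + nameOnlyFailure(data));
    }
}
```

If a round trip like this fails for the objects handed to the report class, a practical workaround is to keep Class references to primitives out of anything that gets shipped to executors, for example by storing the type name as a String and mapping it back explicitly.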


Published by

m5c

Java developer who loves photography and good coffee
