I have had a few adventures with Spark jobs failing on YARN because I forgot to make classes implement Serializable.
This one is a new issue that I have not faced before. It seems that reflection is failing because something tries to look up int as a class by name and cannot find it.
Here are some logs:
14/12/17 16:26:15 INFO DAGScheduler: Submitting Stage 2 (FilteredRDD[2] at filter at WriteReport.java:109), which has no missing parents
14/12/17 16:26:15 INFO DAGScheduler: Submitting 6 missing tasks from Stage 2 (FilteredRDD[2] at filter at WriteReport.java:109)
14/12/17 16:26:15 INFO YarnClusterScheduler: Adding task set 2.0 with 6 tasks
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 12 on executor 1: hostname (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 1 ms
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 13 on executor 1: hostname (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 1 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 12 (task 2.0:0)
14/12/17 16:26:15 WARN TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: int
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
Then more tasks fail for the same reason until the job is aborted:
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 14 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 13 (task 2.0:1)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 1]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 15 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 14 (task 2.0:0)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 2]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 16 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 15 (task 2.0:1)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 3]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 17 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 16 (task 2.0:0)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 4]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:0 as TID 18 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:0 as 7089 bytes in 1 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 17 (task 2.0:1)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 5]
14/12/17 16:26:15 INFO TaskSetManager: Starting task 2.0:1 as TID 19 on executor 1: ot1slhdp001v.mgmt.sl.hgn (NODE_LOCAL)
14/12/17 16:26:15 INFO TaskSetManager: Serialized task 2.0:1 as 7089 bytes in 0 ms
14/12/17 16:26:15 WARN TaskSetManager: Lost TID 18 (task 2.0:0)
14/12/17 16:26:15 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: int [duplicate 6]
14/12/17 16:26:15 ERROR TaskSetManager: Task 2.0:0 failed 4 times; aborting job
14/12/17 16:26:15 INFO YarnClusterScheduler: Cancelling stage 2
14/12/17 16:26:15 INFO YarnClusterScheduler: Stage 2 was cancelled
After many hours of research I traced the issue to the class that is responsible for generating the report. I am passing it an ArrayList with 2 objects, and that triggers this error. If the ArrayList has only 1 element, everything “works”.
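The exception message itself can be reproduced outside Spark, which may help narrow things down. The following is a minimal sketch under one assumption that the logs above do not confirm: that a serialized object graph contains a Class object for a primitive (for example int.class) and the deserializer resolves class names purely through Class.forName. The class PrimitiveClassLookup and the NameOnlyObjectInputStream helper are hypothetical names made up for this illustration; they are not Spark classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

public class PrimitiveClassLookup {

    /**
     * Hypothetical stream (not a Spark class) that resolves every class
     * descriptor by name only, without the primitive-type fallback that
     * the standard ObjectInputStream.resolveClass provides.
     */
    static class NameOnlyObjectInputStream extends ObjectInputStream {
        NameOnlyObjectInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // "int" is not a loadable class name, so this lookup fails for primitives.
            return Class.forName(desc.getName(), false, getClass().getClassLoader());
        }
    }

    /** Serializes a value with plain Java serialization. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    /** Reads the value back with the standard stream, which knows about primitives. */
    static Object readWithStandardStream(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Reads the value back with the name-only stream defined above. */
    static Object readWithNameOnlyStream(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new NameOnlyObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption for illustration: the object graph holds a primitive Class object.
        byte[] bytes = serialize(int.class);

        System.out.println("standard stream: " + readWithStandardStream(bytes));
        try {
            readWithNameOnlyStream(bytes);
            System.out.println("name-only stream: no exception");
        } catch (ClassNotFoundException e) {
            System.out.println("name-only stream: " + e);
        }
    }
}
```

If this is the mechanism at play, the thing to look for in the report class is anything that ends up holding a primitive Class object, such as reflection metadata, rather than the ArrayList itself. Why one element versus two would make a difference is not explained by this sketch.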