Using Spark to Create APIs in Scala
Kristopher Sandoval
August 27, 2015

In our previous piece, we discussed the strengths of the Java language within the Spark framework, highlighting the ways Java Spark increases simplicity, encourages good design, and allows for ease of development. In this piece we continue our coverage of Spark, a micro framework great for defining and dispatching routes to functions that handle requests made to your web API's endpoints.

We're going to examine the counterpoint to Java Spark: Scala Spark. We'll discuss the origin, methodologies, and applications of Scala, as well as some use cases where Scala Spark is highly effective.

Reintroducing Spark

In the first piece of this series, Using Spark to Create APIs in Java, we discussed Spark as a toolkit used primarily to define and dispatch routes to functions that handle requests made to the API endpoint. Spark was designed specifically to make these route definitions quick and easy, utilizing the lambdas built into Java 8. When we first introduced Spark, we used this typical Hello World example:

    import static spark.Spark.*;

    public class HelloWorld {
        public static void main(String[] args) {
            get("/hello", (request, response) -> "Hello World");
        }
    }

When this snippet is run, Spark spins up a web server that serves our API. The user can then navigate to http://localhost:4567/hello, which calls the lambda mapped with the spark.Spark.get method and returns Hello World.

Check out our Spark intro for more about Spark's history and routing capabilities, including wildcards in routes, request/response processing, and templatizing.

Scala: Its Origin and Purpose

Work on Scala began in 2001 at the École Polytechnique Fédérale de Lausanne (EPFL) research university, led by German computer scientist and professor of programming methods Martin Odersky. Designed primarily as a language that compiles to Java bytecode while avoiding the shortcomings of Java proper, Scala was named as a portmanteau of the words "scalable" and "language". The name highlights the goal of Scala: a language that is extensible, powerful, and designed to grow as the demands of its users and developers grow. Because the language compiles to Java bytecode and runs on the JVM, it interoperates with Java and is thoroughly object-oriented in nature.

Benefits of Scala

There are many benefits inherent in Scala that make it a wonderful choice for a wide range of applications.

Functional and Useful: We've previously discussed the difference between functionality and usability, and Scala meets both of these very different requirements with finesse. Scala is designed to support migration, simple syntax, immutability, and cross-support with many other languages and extensions.

Scalable by Design: Scala is short for "scalable", and it shows. Because Scala is, by design, concise in syntax and requires far less boilerplate than many comparable languages, it is adept at both small, portable applications and large, complex systems.

Object Oriented: Scala is entirely object-oriented by its very nature. Every value is an object, and every operation is a method call, all augmented by rich classes and traits that allow for advanced architectures and designs. In contrast to many other languages, generics in Scala are well supported, increasing usefulness and extensibility.

Precise: Scala is precise due to its strict, statically checked constraints. This means that, more often than not, issues will be caught during compilation rather than after full deployment.
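These benefits carry over directly to Spark development. Here is a minimal sketch of what the earlier Hello World route could look like when written in Scala against the same Spark API; the object name is illustrative, and the snippet assumes the spark-core dependency on the classpath and Scala 2.12 or later, so that the Scala lambda converts to Spark's Route interface:

    import spark.Spark.get
    import spark.{Request, Response}

    object HelloWorldScala {
      def main(args: Array[String]): Unit = {
        // Same route as the Java example: GET /hello returns a plain string.
        // With Scala 2.12+, the lambda is converted to Spark's Route functional interface.
        get("/hello", (request: Request, response: Response) => "Hello World")
      }
    }

Navigating to http://localhost:4567/hello behaves exactly as it does in the Java version; the difference lies only in how the route handler is expressed.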
Why Scala? Why Not Java?

For many developers, the question of whether to use Scala or Java is one of convenience. "I know Java, and Scala is so much like Java… so why should I switch?" It's a legitimate question, and one that has a simple answer: Scala does some things extremely well that Java does not, and in less space.

Scala is, to quote Brian Tarbox, a Systems Engineer at Motorola, "transformational rather than procedural." Whereas Java explains to the system how to do something, Scala offers simple steps to perform that same function without the verbosity. It's arguably cleaner, and thus simpler, while accomplishing the same thing.

Take this simple comparison. Let's create a list of strings, using standard, idiomatic code. (There are many ways to shorten Java code, but many of them are not widely supported or not in standard usage, and thus will not be discussed here.)

The Java version:

    List<String> list = new ArrayList<String>();
    list.add("1");
    list.add("2");
    list.add("3");
    list.add("4");
    list.add("5");
    list.add("6");

Compare this to the Scala version:

    val list = List("1", "2", "3", "4", "5", "6")

While some may not view this as a large enough reduction in space, keep in mind that as code expands to greater and greater lengths, the importance of compactness certainly adds up.

As another example, let's write a snippet that draws from a pre-defined User class and returns all the products that have been ordered by that user.

The Java version:

    public List<Product> getProducts() {
        List<Product> products = new ArrayList<Product>();
        for (Order order : orders) {
            products.addAll(order.getProducts());
        }
        return products;
    }

The Scala version:

    def products = orders.flatMap(o => o.products)

Now that's a huge difference: from seven lines to one.
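To see the one-liner in context, here is a small, self-contained sketch. The User, Order, and Product case classes and the sample data are hypothetical stand-ins for the pre-defined classes mentioned above; the point is only to show flatMap collapsing the nested loop into a single expression.

    // Hypothetical domain classes standing in for the pre-defined User model above.
    case class Product(name: String)
    case class Order(products: List[Product])

    case class User(orders: List[Order]) {
      // Equivalent to the seven-line Java getProducts() method above.
      def products: List[Product] = orders.flatMap(o => o.products)
    }

    object FlatMapExample extends App {
      val user = User(List(
        Order(List(Product("keyboard"), Product("mouse"))),
        Order(List(Product("monitor")))
      ))
      // Prints: List(Product(keyboard), Product(mouse), Product(monitor))
      println(user.products)
    }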
Also Check out: Building APIs on the JVM Using Kotlin and Spark

Different, More Efficient Methods

Reduction in complexity is certainly valuable, but there's something more fundamental going on between Java and Scala. As a further example of that reduction in complexity, the previously quoted Tarbox created the following log-processing method in both Java and Scala.

The Java version:

    import java.io.BufferedReader;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Scanner;
    import java.util.TreeMap;
    import java.util.Vector;

    void getTimeDiffGroupedByCat() throws IOException {
        FileInputStream fstream = new FileInputStream("textfile.txt");
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        Long thisTime;
        HashMap<String, Long> lastCatTime = new HashMap<String, Long>();
        TreeMap<String, Vector<Long>> catTimeDiffs = new TreeMap<String, Vector<Long>>();

        while ((strLine = br.readLine()) != null) {
            Scanner scanner = new Scanner(strLine);
            thisTime = scanner.nextLong();
            String category = scanner.next();
            Long oldCatTime = lastCatTime.put(category, thisTime);
            if (oldCatTime != null) {
                if (catTimeDiffs.get(category) == null) {
                    catTimeDiffs.put(category, new Vector<Long>());
                }
                catTimeDiffs.get(category).add(thisTime - oldCatTime);
            }
        }

        for (Map.Entry<String, Vector<Long>> thisEntry : catTimeDiffs.entrySet()) {
            System.out.println("Category:" + thisEntry.getKey());
            Iterator<Long> it = thisEntry.getValue().iterator();
            while (it.hasNext()) {
                System.out.println(it.next());
                if (it.hasNext()) System.out.print(", ");
            }
        }
    }

This code reads a log file line by line, keeping a HashMap of the most recent timestamp seen for each category. Each new timestamp is compared to the previous one from the HashMap, and the difference is appended to that category's vector of time differences in the generated TreeMap. The final loop prints these results as a series of comma-separated lists.

The Scala version:

    import scala.io.Source.fromFile

    def getTimeDiffGroupedByCat = {
      val lines = fromFile("file.txt").getLines
      val tuppleList = for (oneLine <- lines) yield {
        val z = oneLine.split(' ')
        (z(0).toInt, z(1))
      }
      for (groupedList <- tuppleList.toList.groupBy(oneTuple => oneTuple._2)) {
        val diffList = for (logPairs <- groupedList._2.sliding(2))
          yield logPairs(1)._1 - logPairs(0)._1
        println(groupedList._1 + ":" + diffList.mkString(","))
      }
    }

Setting aside the fact that the Scala version is far less verbose, which is reason enough to adopt Scala on its own, there is something more important going on here. Because Scala favors immutability, the processing is handled differently: the data is transformed through lists rather than through mutable Java HashMaps. The lines are read with the getLines function, and the for comprehension iterates over them, executing the code after the yield keyword to build a (timestamp, category) tuple from each line. The result is collected in the tuppleList variable, grouped by category with groupBy, and, after the further manipulations around oneTuple._2 and logPairs(0)._1, printed. Simply put, the Scala version handles the same function with less code, in a more succinct way, and with less complexity, while allowing for immutable manipulation and typing.
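To make the data flow concrete, here is a self-contained variant of the Scala version that processes a few in-memory log lines instead of a file; the object name and the sample data are invented purely for illustration.

    object LogDiffExample extends App {
      // Stand-in for fromFile("file.txt").getLines: an Iterator of "timestamp category" lines.
      val lines = Iterator("100 db", "105 web", "112 db", "120 web", "125 db")

      // Turn each line into a (timestamp, category) tuple.
      val tuppleList = for (oneLine <- lines) yield {
        val z = oneLine.split(' ')
        (z(0).toInt, z(1))
      }

      // Group by category, then take differences between consecutive timestamps.
      for (groupedList <- tuppleList.toList.groupBy(oneTuple => oneTuple._2)) {
        val diffList = for (logPairs <- groupedList._2.sliding(2))
          yield logPairs(1)._1 - logPairs(0)._1
        // Prints, in some order, "db:12,13" and "web:15".
        println(groupedList._1 + ":" + diffList.mkString(","))
      }
    }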
Scala Performance and Integration

Spark is brief: it was designed to be compact, and to deliver its functionality with relatively small codebases. Though Java does this to a point, Scala holds this spirit in its very essence. Scala is by all accounts a language that results in smaller codebases, more efficient processing, and easier troubleshooting and design.

For all its benefits, Scala can introduce more complexity than the intended functionality calls for, and for this reason some have shied away from it. Complexity arising from the implicit interactions within the code can often make implementing encryption, transformation, authentication, and the like more involved than it would be in Java. That being said, the end result of this complexity is a counter-intuitive simplicity in the actual codebase. Once the functionality is figured out and the function map conceived, this complexity is represented by relatively small code samples, and thus efficient processing.

Conclusion

Scala and Java are two wonderful languages. Like every other language, however, there are serious strengths and weaknesses to consider when utilizing them for development. While Java is the best choice for a developer who is firmly entrenched in the Java mindset or has heavy programming experience in the language, Scala is a wonderful alternative that certainly reduces complexity while allowing for more sophisticated manipulation.

This is, of course, only part of the equation: integration of security systems and protocols, methodologies of microservice development, and the fundamental API architecture are just as important as the language the API is developed in. With a full understanding of the language chosen, and a complete visualization of the requirements inherent in your API, successful development becomes an attainable goal.

Resources

The following resources can help novice and experienced programmers alike find out more about Scala and Spark.

Getting Started with Spark
Getting Started in Scala
Complete Scala Language Specification
Building APIs on the JVM Using Kotlin and Spark
Using Spark to Create APIs in Java