Exercise 1: Create a new list with all the strings from original list converted to lower case and print them out.
For this week homework we’ll start using the infamous Java 8 Streams, finally, being restricted as we were last week. The Collection interface had a new method added on the JDK 8, a default one, called stream(). It will, as its Javadoc states, “Returns a sequential Stream with this collection as its source”. The Stream interface is where the fun begins, since it allows us to invoke the high-order functions which lay at the core of all the buzz, and the fun, of functional programming. One of these functions, or methods, is map. We already talked about it a bit in our previous post, when reviewing the Scala solution for exercise 4 of the Week 1 homework. Map receives a function as a parameter, and according to its Javadocs entry: “Returns a stream consisting of the results of applying the given function to the elements of this stream”. Looking at the code: after we create the stream from our list, we map this stream with a function passed as a parameter which will ultimately make every element of the stream become lower case (since the function passed is the method reference Lambda String::toLowerCase). Mapping a stream, though, is an intermediate operation, so we still need to do a terminal operation on the stream to extract or do something with the stream values.
Our terminal operation here is collect, which “Performs a mutable reduction operation on the elements of this stream using a Collector”. In this case, the collector used (Collectors.toList()) will collect all the elements of the stream in a new List. Collectors is one of the most important type of terminal operations, and you should probably have a look at them, they are very useful.
We end up iterating through the elements of the new list, printing them out with through the method reference lambda System.out::println.
The Scala implementation of this exercise is pretty straight forward. As we said last week, Scala Collections implement directly the high-order functions, so you can map directly over a List object. We know also by now that we can pass a function directly as a parameter to the map method, and how the underscore placeholder shortens our syntax.
It’s important to state, though, that there is not such concept as terminal or intermediate operation in Scala. If you map on a List, you’ll get another List. It’ll depend on the function you’re passing to the map method if the elements contained within the List retain their type or become something else. In this case, the function we’re passing works directly on a String, transforming it into another String, but nothing really would prevent us from returning the length of every word of the List, for example, which would have given us a List[Int], rather than the initial List[String].
As I was saying map is not an intermediate operation in Scala, at least not necessarily. You can stop after the first map and capture its result in a new List. In Java 8, since the Collections interface don’t have a default implementation for these high-order functions, you need to promote them to Stream, apply the high-order functions and then downgrade or terminate the stream with any of the available terminal operations.
Exercise 2: Modify exercise 1 so that the new list only contains strings that have an odd length.
This exercise builds on top of the previous one, so we’ll only explain the addition of another intermediate operation, filter, which “Returns a stream consisting of the elements of this stream that match the given predicate”. The Predicate passed is obviously a lambda expression which will hold only for the Strings with an odd length, as required.
The Scala implementation is pretty straightforward, given that we also have the filter method in the Scala Lists. Again, this is not an intermediate operation, so in this case the order of execution does matter. After filtering the List, we will only map (hence being turned into lower case) over the elements with odd length, printing them out at the end of it.
Exercise 3: Join the second, third and forth strings of the list into a single string, where each word is separated by a hyphen (-). Print the resulting string.
Once we have our stream, it’s easy to select a particular slice of the cake, if you want, as long as the cake is big enough. The skip and limit methods will allow us to discard the first n elements of the stream and truncate its length respectively. These are both intermediate operations, at the end of which we will have, best case scenario, a stream with 3 elements on it. Before examining our terminal operation, let me say this:
- If you try to skip a number of elements bigger than the number of elements of the stream you’re operating on, you will get an empty stream.
- If you try to limit a number of elements bigger than the number of elements of the stream you’re operating on, you will get the same stream.
- If you try to skip or limit a negative number of elements, you will get an IllegalArgumentException.
After manipulating our stream according to our purposes, we’re using here the joining collector, which will append all the elements of the stream with the given delimiter. Obviously, they become a String after this.
In Scala, there is a method on the List class which is equivalent semantically to the action of skipping n1 elements of the list and limiting and skipping n2 elements after it; and this method is slice. It takes two integers as parameters, the first being the index of the first element (inclusive) that you want to have in your slice and the second being the index of the element (exclusive) next to the last that you want to have in your slice. After slicing, mkString will display all then elements of the sliced List in a string using the given separator string.
Exercise 4: Count the number of lines in the file using the BufferedReader provided.
On this exercise, there is not much to say about streams and lambdas, other than what’s going on without the print statement on line 4. The method lines() on the BufferedReader is another way of building a stream, in this case containing all the lines of the file from which the reader was opened. After obtaining this stream, all we do is the terminal operation count() on it, which will return the number of elements of the stream, which is precisely the number of lines of the file.
When it comes to the Scala version, we can use the Scala singleton object Source to read the content of a file into a BufferedSource object, which, as its Java BufferedReader counterpart, exposes a method to extract the lines of the file, being in this case called getLines(). The return of that type, though, is not a stream, since they don’t exist in Scala, but an Iterator, which holds another method to return its length, which, again, will coincide with the number of lines of the file.
Exercise 5: Using the BufferedReader to access the file, create a list of words with no duplicates contained in the file. Print the words. HINT: A regular expression, WORD_REGEXP, is already defined for your use.
If you know already the difference between map and flatMap, you can skip to the Scala solution.
We’ve seen before how useful map can be, but it may come short in a few situations. In this case, for example; and I’ll try to explain why before introducing a possible solution. In this exercise solution, we’re creating a stream in line 6, and this stream will contain all the lines in the file. We’re not asked to retrieve the list of lines in the file, but the words on it. So, somehow we need to transform the stream containing the lines of the file to a stream containing the words in the file (and avoiding repetition). We know, also, that there is a static method in the Stream interface to create a stream from a list. At least I knew, and if you didn’t, well, you know now. And we also know how to create a list of words from a line of the file (using the provided regular expression and the classic String#split() method). Well, if we map through every line in the stream, obtaining a list of words per line, and creating a new stream with it, we’ll end up with a stream of streams. I’m not sure about you, but I wouldn’t be too sure about how to collect all the words in the file after that. We need to, somehow, be able to allocate all the words in the file in a single stream, evolved from the primitive stream holding the lines of the file. This is when flatMap comes to the rescue. Its Javadoc is actually quite self-explanatory, if read slowly: “Returns a stream consisting of the results of replacing each element of this stream with the contents of a mapped stream produced by applying the provided mapping function to each element. Each mapped stream is closed after its contents have been placed into this stream”. In our example, this is what will happen:
- We create a stream from all the lines of the file.
- For every entry of the stream, or line of the file, we split it into a list of words.
- That list of words will be grouped in a Stream, or mapped stream, as referred to by the Javadoc.
- Once that mapped stream is created, its content is transferred or place to the original, primitive or parent stream.
- The mapped stream is closed.
- The resulting stream is an aggregation of all the words that were once held by the mapped streams.
The reasons why this operation is called flatMap and not mapFlat, I can’t comprehend, since all it’s happening is a classic map that is transforming something into an aggregation or stream of some things, followed by a flattening process in which the aggregation of the results of the map is lost, and its members brought to the original stream where the operation began. Trying to explain this with words is actually way more painful than understanding it. Once you do, you’ll wonder why it took you so long.
- Slides 36 and 40 of this presentation by @crichardson explains graphically the differences between map and flatMap.
- Slides 26 to 32 of this presentation on Apache Spark by @uweseiler, also.
- @martinfowler explains it here (nice, it looks like in Clojure flatMap is called mapCat, they got the order right!).
After flatMap successfully aggregates all the words in the file, we just need to call the intermediate operation distinct() to remove duplicates and collect the results in a list, as we already know.
The Scala version also uses the flatMap method, which as map is present on the List class natively (again, streams don’t exist in Scala). The only difference, though, is the fact that, as we said, the Scala Source#getLines() returns an Iterator, and obviously this one does not have any method to remove duplicates. Iterators are not intended to know about the content they refer, but only allowing its access programmatically and sequentially. We need to transform the Iterator into some other data structure which allows us to remove duplicates. Fortunately, we can invoke the Iterator#toSeq() method, which will give us a Seq (base interface “aka traits” for sequences in Scala) in which it’s possible to call Seq#distinct().
Exercise 6: Using the BufferedReader to access the file create a list of words from the file, converted to lower-case and with duplicates removed, which is sorted by natural order. Print the contents of the list.
This post is already quite long, so I won’t say much about this, other than we transform the words to lower case mapping on the result of the flatMap (which was already a stream containing the words of the file), and we sort by natural order calling the Stream#sorted() method (from its Javadoc: “Returns a stream consisting of the elements of this stream, sorted according to natural order”).
The Scala solution does exactly the same, although to sort we’re calling Seq#sorted.
Exercise 7: Modify exercise6 so that the words are sorted by length.
I won’t say much about this either, it’s pretty much like the previous case, with the difference of the Stream#sorted(Comparator) method invoked, in this case the parameterised version. Since Comparator is a functional interface, we just use a lambda expression to implement its only method, which will order the Strings in the Stream by their length. Nothing new apart from this here.
The Scala solution does exactly the same, although to sort specifying a custom ordering, we’re calling a different method this time: Seq#sortBy.