Rundown on Everything Java

From CSE231 Wiki
Revision as of 19:52, 26 June 2017 by Benjaminchoi (talk | contribs)
Jump to navigation Jump to search

A Note About Java

Java is an object-oriented programming language and the language we will use in this course. Most of you should be familiar with Java from CSE 131 or AP Computer Science. If you are not familiar with Java, please talk to Professor Cosgrove or a TA to explain your computer science background and we can discuss what you can do to catch up.

Classes, Interfaces, and Inheritance

Java is built on classes and interfaces. To explain the concepts of classes, interfaces, and inheritance, consider an everyday object like a bicycle.

We all know that a bicycle is an object with certain characteristics and that it has the ability to perform certain actions. For example, the bicycle might have characteristics like color, weight, the number of seats, and so on. All bicycles also have a common set of actions they can perform, such as changing gears, speeding up, or breaking. In object-oriented terms, we say your individual bicycle is an instance of the class of objects we call bicycles. Any action that your bicycle can perform is described as a method and they form the object’s interface to the outside world. The interface of the bicycle is composed of abstract methods of the actions described previously, meaning empty-body methods.

In simpler terms, continuing the bicycle example, anything that can be considered a bicycle can do anything described in the interface of bicycle, Panasonic bicycles are a class of bicycles, Panasonic Model 1’s inherit from the more general class of Panasonic bicycles, and the Panasonic Model 1 you own is an instance of Panasonic Model 1’s. All bicycles must be able to perform the actions (methods) listed in the interface for bicycles. Exactly how it executes those methods is defined concretely in the class. In coder talk, we might say a Panasonic Model 1 is-a Panasonic bicycle which is-a bicycle.

Refer to Oracle's official documentation on classes, interfaces, inheritance, or a more detailed article on the wiki about interfaces for more information.

Data Races

Although parallel programming has many possible benefits, the thing we want to avoid most is a data race. A data race occurs when the following requirements are met:

  1. Two or more threads in a single process access the same data concurrently
  2. At least one of the accesses is for writing (mutating)
  3. The threads do not use locks to control access to the data

In the context of this class and Habanero, think of a data race as something that occurs when two asynchronous operations involving the same mutable variable try to access the variable, while your program makes it possible to and does mutate the variable. We want to avoid data races as it avoids the guarantee that our program will produce the results we desire and expect. Take the pseudocode example below.

int sum = 0
async {
    sum += 1
}
async {
    sum += 1
}
print sum //output could be 0, 1, or 2

We would expect the output of the above code to be 2, but it can actually be 0, 1, or 2. The code above meets our previous requirement for a data race as sum is not guarded and there are two processes that attempt to read/write to sum. As a result, the print statement could execute before either of the operations complete, after only one of them completes, or after both complete. As the outcome changes on a case by case basis, this obviously does not produce the result we expect, illustrating the negative impact of a data race.


Refer to Oracle's official documentation on data races and the relevant Rice video for more information.

Encapsulation (public and private)

In parallel programming, shared mutable data is a big problem that will lead to data races (see the section on Data Races for more information). As the name suggests, shared mutable data is just data which is public (shared) and mutable (able to be changed).

When creating a method or a variable, you have the option of making it public or private (there are other access modifiers, but we will focus on these). If the data is declared public, the associated information is visible to all other classes. If the data is declared private, the information is only visible within the class it is located.

Thus, if our goal is to avoid shared mutable data because it will cause data races, our goal is to make as much of our data as possible immutable and private. As a result, when programming in parallel, we will sometimes use seemingly inefficient processes to read/write to a variable in order to avoid shared mutable data and the data races they might carry. While the methods in the class still require locks and attention to ensure thread-safety, making methods and variables private is a good first step. Encapsulation simply refers to the concept of making instance variables private and only accessible to other classes through a group of associated methods. This is also known as data hiding and is the same as the getters/setters covered in CSE 131.

Refer to this link for more information on the topic.

Immutability (final)

When creating a variable, you have the option of declaring it final, which means that once you instantiate it, the value of the variable cannot change (it is immutable). By default, Java will assume you want to make a variable mutable (able to be changed). In parallel programming, it is beneficial to declare a variable final whenever possible. To see why, refer to the Encapsulation section.

Lists, Sets, Dictionaries/Maps, Stacks, and Queues

Lists are an ordered collection of elements. Although there are different types of lists, most lists allow for duplicate entries and contain references connecting the nodes of the list. They differ from arrays in several key ways. While an array has a set number of possible entries, lists can keep adding nodes without being limited to a number set at instantiation. Additionally, unlike arrays, lists cannot store primitives in Java, just like any other Java Collection (sets included). Thus, an int would need to be stored as Integer in order to work with lists.

Sets are generally an unordered collection of unique elements. As a result, if you were to print the contents of a set, the print statement should differ from trial to trial and no duplicates should be found. However, there are some sets, like TreeSet, which are sorted. Every time an element is added to a set, Java must first check to make sure that doing so would not create any duplicates.

Stacks are a last-in-first-out (LIFO) stack of objects while queues are generally a first-in-first-out (FIFO) collection of objects. It might be helpful to think of a Java stack as a stack of papers you keep at your desk. When you push more papers onto the stack, you simply add more to the top of the pile. When you pop a piece of paper from the stack, you take a piece of paper from the top of the pile. In other words, the last paper to be added onto the stack is the first to exit it. Conversely, think of a queue as waiting in line at a restaurant. When you enqueue (add to the queue), you go to the back of the line. However, when you dequeue (remove from the queue), you take the order of the person at the front of the line. So, the first person to enter the queue is also the first to exit it.

Dictionaries and maps are very similar in the sense that they both map keys to values. However, dictionary is considered obsolete in Java and it is standard practice to use a class that extends map, instead. In a general programming sense (not just specific to Java), however, the two terms are used interchangeably and we will also use both terms in this class. Regardless, both are made up of a series of entries, with each entry containing a key object and a value object. A map cannot contain a duplicate key and each key can only map to at most one value (although this value can be a collection, like a list). Some implementations of map (like TreeMap) are ordered while others (HashMap) are not. We will be making extensive use of maps in this course, so please take the time to familiarize yourself with them.

Refer to CSE 131's coverage of lists or Oracle's official documentation on lists, sets, dictionaries, maps, stacks, and queues for more information.

Lambdas

In Java, an interface with a single method is referred to as a functional interface. An abstract class is a class that cannot be instantiated, but can be subclassed. An abstract method is an empty-body method, which can be included in abstract classes. To use any of these things in a different class, Java coders often use anonymous inner classes, which is a way of implementing an abstract class or functional interface within another class. Lambdas are Java’s convenient avenue for tidying up these anonymous inner classes.

Before lambdas, this process was often tedious and took up a large amount of space. This is how lambdas were born in JDK 8. In the context of Habanero, we will use lambdas to deal with methods which take in HjSuspendable (a functional interface with the function run()) as an input parameter. Consider the following two snippets of code:

int mid = array.length / 2;
int[] subSums = new int[2];

finish(new HjSuspendable() {

    public void run() {

        async(new HjSuspendable() {

            public void run() {
                for (int i = 0; i < mid; i++) {
                    subSums[0] += array[i];
                }
            }

        });

        for (int j = mid; j < array.length; j++) {
            subSums[1] += array[j];
        }
    }

});
System.out.println(subSums[0] + subSums[1]);
int mid = array.length / 2;
int[] subSums = new int[2];

finish(() -> {

    async(() -> {
        for (int i = 0; i < mid; i++) {
            subSums[0] += array[i];
        }

    });

    for (int i = mid; i < array.length; i++) {
        subSums[1] += array[i];
    }

});
System.out.println(subSums[0] + subSums[1]);

Both implementations make use of anonymous inner classes, but the code on the bottom uses lambdas to make the code more concise and easier to read. Instead of having to explicitly define the inner class, it makes use of syntax to cut down on the excess code. The lambda syntax can be broken down into three parts: the argument list, the arrow token, and the body. In the example above, the argument list is empty but implied to take in a new HjSuspendable() where the () is, the arrow token follows shortly after, and the body is defined within the curly brackets. Although this is how we will use lambdas in this course, there are other ways to use them, shown in the link below!

Refer to Oracle's official documentation regarding lambdas, abstract methods/classes, or a more detailed article on this wiki about lambdas for more information!

Generics

When creating classes and interfaces, it is often convenient to be able to say that said class can take in and represent the data of several different classes. For example, a LinkedList is its own class, but it can be a list of integers, doubles, longs, Strings, or whatever else. In Java, we define this using the pointy brackets <>. So, in order to define a list of integers, we might say LinkedList<Integer>. However, how does Java know that Integer is an acceptable input? This is where generics come in.

When Java created LinkedList, it did not specify what classes were acceptable inputs, but instead defined a generic class E. E is not a defined class in Java, but it represents the plethora of other classes in Java which LinkedList can represent. See their documentation to get a better picture of what we mean.

Refer to Oracle’s official documentation regarding LinkedLists for more information.

Pointer and Variables: Behind the Scenes

Whenever you create a variable, unless that variable is a primitive, Java is actually creating a pointer (or reference) to a place in your machine’s memory where the actual object is stored. So, when you say, Dog myDog = new Dog(“Fido”), myDog would represent a pointer to the Fido dog object, not the actual object itself. Thus, when you change myDog to say, myDog = new Dog(“Rex”), you are not altering the Fido dog object at all. Instead, you are saying that myDog should now act as a pointer to the Rex dog object instead of the Fido dog object. With primitives, the variable would represent the value itself and there is no need to worry about pointers.

So, why does this matter? Well, you are already probably used to copying primitives. For example, take the code below:

int num1 = 1;
int num2 = num1;
num2++;
System.out.println(num1 + ", " + num2);
// result will be “1, 2”

When not dealing with pointers, the variables behave exactly how you would expect them to. When num2 copies num1, it takes on that value and any modifications to num2 do not affect num1. However, take this other snippet of code below:

LinkedList<Integer> list1 = new LinkedList<Integer>();
list1.add(1);
LinkedList<Integer> list2 = list1;
list2.add(2);
System.out.println(list1 + ", " + list2);
// result will be “[1, 2], [1, 2]”

When dealing with pointers, the variables behave differently. When you set list2 equal to list1, it does not copy all of the list nodes within list1 into a new variable list2. Instead, list2 represents a pointer to the same object that list1 points to. Thus, whenever you make modifications to list2, this is also reflected in list1. That is why if you want to copy all of the list nodes within list1 into a new variable list2, you must use a method like clone().

Additionally, as objects work off of pointers, using the == operator will not have the results you expect. Although it will work exactly as you would expect with primitives, pointers prevent it from working as expected with most objects. Take the snippet of code below:

LinkedList<Integer> list1 = new LinkedList<Integer>();
list1.add(1);
LinkedList<Integer> list2 = (LinkedList<Integer>) list1.clone();
System.out.println(list1 == list2);
// result will be false
System.out.println(list1.equals(list2));
// result will be true

As you can see by the output, the == operator returns false while the .equals() method returns true. This is because the == object is checking to see if list1 and list2 are the same instances (AKA they represent the same pointer), while the .equals() method checks for equality in a more formally defined way. Although .equals() by default checks for equality with the == operator, any well-designed class comes with a custom .equals() methods that check for equality the way a human might reasonably define it. However, when creating your own custom classes, you must define this method yourself. With primitives, this whole issue becomes moot as they do not work off of pointers.

Refer to this helpful Stack Overflow article or this detailed article on the topic for more information.

Primitives

There is a total of 8 primitive data types in Java and you can tell they are primitives because they are the only lowercase data types. All uppercase data types, including Strings, are Java objects, not primitives. These 8 primitive data types are byte, short, int, long, float, double, boolean, and char . Primitives are unique in several ways. For one, unlike objects, they cannot take on a null value. Rather, primitives default to a specific value. int, for example, always defaults to 0 while boolean defaults to false.

Second, they do not work off of pointers like Java objects. This concept is further explained in Pointer and Variables: Behind the Scenes.

Third, unlike objects, primitives do not have a root superclass. In Java, every object, including arrays, has Object as the root of its class hierarchy. Object also contains several methods, the most noticeable of which are clone(), equals(), hashCode(), and toString().

Refer to Oracle's official documentation regarding primitives and Object for more information.

Boxing and Unboxing

Every primitive has an object equivalent, which you may have noticed already. For example, int has Integer and double has Double. Boxing and unboxing are Java’s ways of automatically converting from primitives to their object equivalents and vice versa. It serves no other purpose than to make our lives easier when working with the object versions of primitives. Thanks to this useful trick, we can do things like:

Integer i = 1; Rather than: Integer i = new Integer(1);

Refer to Oracle’s official documentation regarding the topic for more information.