We have gathered the most frequently asked questions on Piazza and the S&Q forms on this wiki for your convenience. Look through the table of contents to find a specific question.
  
=General=

===What is the difference between Parallel and Concurrent computing?===

''Parallel programming is about using additional computational resources to produce an answer faster.''

''Concurrent programming is about correctly and efficiently controlling access by multiple threads to shared resources.''

:– [https://homes.cs.washington.edu/~djg/ Prof. Grossman]
  
''Concurrency is about dealing with a lot of things at once and parallelism is about doing a lot of things at once.''

:– Rob Pike ('Concurrency Is Not Parallelism')
  
=Errors=

===How do I fix Maven when I encounter java.lang.NoClassDefFoundError and/or java.lang.ClassNotFoundException?===

====On Mac====

# Close Eclipse
# Launch Terminal
# Run <code>mv ~/.m2/repository ~/.m2/dead</code>
# Open Eclipse
# Press Alt+F5 (note: you may need to hold the function (fn) key to trigger F5)
# Project Menu -> Clean...

====On Windows====

# Close Eclipse
# Launch Command Prompt
# Run <code>cd .m2</code>
# Run <code>rename repository dead</code>
# Open Eclipse
# Press Alt+F5 (note: you may need to hold the function key to trigger F5)
# Project Menu -> Clean...
  
===What does this error mean and how do I fix it?===

First, look at the name or type of the error. Some errors will indicate what the issue is and where it occurs (e.g. java.lang.ArrayIndexOutOfBoundsException means one of the lines of code attempts to access an index that does not exist). If the cause of the error is harder to discern, try stepping through the code in the debugger: inspect the variables and compare their actual values to what you expected them to be. Finally, if you are still unsure about how to fix an error, ask the professor and TAs for help!
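For instance, the (hypothetical) snippet below triggers exactly that exception; the top of the stack trace names the file and line number, which is where to set a breakpoint:

<nowiki>int[] data = new int[3];

// Throws java.lang.ArrayIndexOutOfBoundsException (with a message like
// "Index 3 out of bounds for length 3") and a stack trace pointing here.
int last = data[3]; // off-by-one: the last valid index is data.length - 1</nowiki>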
=Async/Finish=

===For the demo, he uses a threshold to determine whether or not to use the parallel version. How does he decide that threshold? Is it machine dependent? Or algorithm dependent?===

The threshold is often specified as an input parameter to a parallel method. The user typically sets the value per algorithm, based on the overhead of starting a parallel task.

===Are async and finish features of the Java language, or are they specific to this class?===

Async and finish are features of Habanero Java, a tool built by Rice University. We have re-purposed it for our own needs in what we call V5. <b>Neither is native to Java</b>, but we will use them in this class for their simplicity. We do cover some parallel/concurrent tools that are native to Java later in the semester.

===When doing Quicksort, why is it more efficient to place the finish inside the main method?===

When performing recursive methods in parallel, it is most efficient to place the finish around the initial call, as you want the subsequent inner calls to run asynchronously. Thus, you should wrap the async around the inner recursive call and the finish around the initial outer call, as in the sketch below.
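A rough sketch of that structure, using the lambda-style async/finish helpers from the course library (treat the exact signatures, and the partition helper, as assumptions):

<nowiki>// The finish wraps only the initial, outermost call, so the caller
// blocks until every transitively spawned task has completed.
finish(() -> {
    quicksort(array, 0, array.length - 1);
});

static void quicksort(int[] array, int low, int high) {
    if (low < high) {
        int pivot = partition(array, low, high);
        // Each inner recursive call is async'd so the two halves
        // can be sorted in parallel; no nested finish is needed.
        async(() -> quicksort(array, low, pivot - 1));
        quicksort(array, pivot + 1, high);
    }
}</nowiki>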
===Are we assuming that the number of CPU cores is infinite?===

When dealing with theory, we will assume it is infinite unless otherwise specified. When coding, we will take hybrid approaches: we will sometimes assume it is infinite and sometimes code realistically to adjust for the finite number of cores.

===In the videos and quizzes, we assumed finish and asyncs added no work to the computation graph. In reality, is there some amount of work needed for them?===

Yes, that work is what we are referring to when we talk about overhead. To account for this, you should only run a program in parallel if the amount of work overshadows the overhead costs.

===How does concurrency affect timing with a single core?===

With a single core, it becomes impossible to launch programs in parallel, but concurrent programming is still possible. Under such a scenario, whatever you async'd will still execute as separate threads, but they will simply run on the single core. Thus, the order in which the threads execute is still not guaranteed, meaning the program will not run like sequential code. For a more detailed explanation, refer to [https://stackoverflow.com/questions/1897993/what-is-the-difference-between-concurrent-programming-and-parallel-programming this] article.
  
=Futures/Accumulators/Data Races/Memoization=

===Why is it recommended to use immutable objects as far as possible in order to avoid data races?===

By the very definition of a data race, immutable objects avoid data races, as they prevent threads from writing to them. Refer to the [http://classes.engineering.wustl.edu/cse231/core/index.php/Rundown_on_Everything_Java#Data_Races wiki post] on data races to get a better understanding of the definition of a data race. A minimal sketch of an immutable class follows.
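In plain Java (nothing course-specific here), an immutable class has only final fields and no setters, so there is simply no write for another thread's read to race with:

<nowiki>public final class Point {
    private final int x; // final fields can never be reassigned after construction
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // "Mutation" returns a new object instead of modifying this one.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}</nowiki>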
===How are you able to tell when there is a data race in your code?===

A good rule of thumb is to look for shared mutable data in your code. Whenever you write to data in an async block, make sure that the data you write to is adequately protected.

===Can we use the presence of functional and structural determinism to prove that a program is data race free? I understand that being DRF implies functional and structural determinism, but is the reverse also true?===

The reverse is not true; in fact, DRF does not even guarantee determinism. This is a complicated topic, but the constructs introduced in Unit 5 will break the "DRF implies determinism" guarantee. See the counter-example below.
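A sketch of such a counter-example, using the course's async helper and java.util.concurrent.atomic.AtomicIntegerArray (note there is deliberately no enclosing finish):

<nowiki>AtomicIntegerArray array = new AtomicIntegerArray(3);

async(() -> array.set(0, 1)); // every access is atomic, so this is DRF...
async(() -> array.set(1, 2));
array.set(2, 3);

// ...but nothing forces the asyncs to complete before this line runs,
// so you would expect [1, 2, 3] yet might see [0, 0, 3], [1, 0, 3], etc.
System.out.println(array.toString());</nowiki>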
===Can a program have functional determinism but not structural determinism and vice versa?===

Yes, and a program can also have neither.

===If a program lacks structural determinism but has functional determinism, is that "bad"? Should you always try to give it structural determinism?===

Not necessarily; sometimes it does not matter. [[Floodfill]] is a good example of this.

===When is async and finish better than futures?===

There are several cases where async/finish would be more appropriate than futures, but generally, any task that does not involve returning a value.

===What's the difference between memoization and dynamic programming?===

Memoization is the process of storing the value of an expensive calculation in order to avoid doing the calculation again. Dynamic programming, on the other hand, is the process of breaking up a large problem into smaller, more manageable subproblems (this can, but does not have to, include memoization). A sketch of memoization follows.
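A standard illustration of memoization (plain sequential Java):

<nowiki>import java.util.HashMap;
import java.util.Map;

public class Fib {
    private static final Map<Integer, Long> cache = new HashMap<>();

    static long fib(int n) {
        if (n < 2) return n;
        Long cached = cache.get(n);
        if (cached != null) return cached; // reuse the stored answer
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result); // store it so it is never recomputed
        return result;
    }
}</nowiki>

In parallel code you would reach for a thread-safe map such as ConcurrentHashMap instead of a plain HashMap.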
===Are there ever situations where it would be better to run a function twice rather than utilizing memoization?===

The purpose of memoization is to balance performance against storage. If the calculation is not taxing but produces a value that will eat up a lot of storage, it might be better not to store it, especially if it is only an intermediate value.
  
=Forall/Barriers=

===How can you determine if forall is better or finish-async, or are they functionally the same?===

A forall loop is functionally the same as putting a regular for loop in between a finish and an async. It is up to you which you want to use, but the forall loop serves the purpose of making your code more readable, as the sketch below shows.
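A sketch of the equivalence (again treating the exact forall/finish/async signatures from the course library as assumptions):

<nowiki>// forall version: one construct, reads clearly
forall(0, n - 1, i -> {
    out[i] = f(in[i]);
});

// finish + for + async version: functionally identical
finish(() -> {
    for (int i = 0; i < n; i++) {
        final int j = i; // lambdas can only capture (effectively) final locals
        async(() -> out[j] = f(in[j]));
    }
});</nowiki>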
===Can barriers be used in regular asyncs, or do they have to be in parallel loops (like forall)?===

Barriers can be used wherever you put them, but they are especially useful in parallel loops.

===Is a barrier simply analogous to a finish-block within a forall loop?===

No, it would be more accurate to think of a barrier as a type of phaser with an implicit wait command. The sketch below shows the idea.
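To see the "everyone waits here" behavior in standard Java (using java.util.concurrent.CyclicBarrier rather than the course's construct), note that no thread starts phase 2 until all threads have finished phase 1:

<nowiki>import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) {
        final int threads = 4;
        CyclicBarrier barrier = new CyclicBarrier(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                try {
                    System.out.println("phase 1, thread " + id);
                    barrier.await(); // implicit wait: blocks until all 4 arrive
                    System.out.println("phase 2, thread " + id);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}</nowiki>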
  
=Phasers/Deadlock=

===How would you recreate fuzzy barriers with asyncs and finishes?===

You cannot; fuzzy barriers create computation graphs that were previously impossible.

===Is it possible for two phasers to be dependent on each other, thereby breaking the program?===

Yes, this is called a deadlock. The sketch below shows the classic shape of the problem.
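Here is that circular dependency sketched with two plain Java intrinsic locks instead of phasers; each thread holds one resource while waiting forever for the other:

<nowiki>Object a = new Object();
Object b = new Object();

new Thread(() -> {
    synchronized (a) { // thread 1 takes a...
        try { Thread.sleep(100); } catch (InterruptedException e) { }
        synchronized (b) { } // ...then waits for b forever
    }
}).start();

new Thread(() -> {
    synchronized (b) { // thread 2 takes b...
        try { Thread.sleep(100); } catch (InterruptedException e) { }
        synchronized (a) { } // ...then waits for a forever
    }
}).start();</nowiki>

The sleeps just make the fatal interleaving overwhelmingly likely; without them the program would only deadlock some of the time.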
===What kind of error will be thrown if we enter deadlock? Or would the program never terminate?===

Unless you explicitly define an error that would appear after a set amount of time, the program would just never terminate.

===Why use await instead of normal futures?===

Either one works, but you would need to use await if you wanted to execute task B after task A without using the value generated by A, or if A does not return a value at all (futures return values).
  
=Isolated/Synchronized/Locks/Atomics=

===What are locks and semaphores?===

Locks prevent a protected piece of code from running in parallel, forcing threads to access data one after the other. Each object has its own intrinsic lock, and locks can also be created explicitly. Semaphores restrict threads from accessing a protected piece of data, but unlike locks, [https://stackoverflow.com/questions/34519/what-is-a-semaphore semaphores can allow more than one consumer access to a specific piece of data.] A side-by-side sketch follows.
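In standard Java (sharedCounter below is a hypothetical piece of shared data): the lock admits one thread at a time, while the semaphore admits up to a fixed number of permit holders.

<nowiki>import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class LockVsSemaphore {
    private static final ReentrantLock lock = new ReentrantLock(); // admits 1
    private static final Semaphore pool = new Semaphore(3);        // admits 3
    private static int sharedCounter = 0;

    static void withLock() {
        lock.lock();       // mutual exclusion: exactly one thread inside
        try {
            sharedCounter++;
        } finally {
            lock.unlock(); // always release, even if an exception is thrown
        }
    }

    static void withSemaphore() throws InterruptedException {
        pool.acquire();    // blocks once all 3 permits are taken
        try {
            // up to three threads can run this section at once
        } finally {
            pool.release();
        }
    }
}</nowiki>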
===What is the difference between isolated and synchronized?===

They serve the same functional purpose, but isolated has the added perk of being able to specify read/write access with object-based isolation. You can use whichever you prefer.

===Does isolation naturally take up more resources? That is to say, in cases where a data race is not a problem, would there be no reason to use isolation?===

Isolated is only used to protect shared mutable data, so it would be useless to call it when a program is already DRF.
===Are there such things as atomic strings or even data structures?===

Yes, there are several protected data structures built into Java. Check out [https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html java.util.concurrent] for some more examples!
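For instance, ConcurrentHashMap gives you atomic compound operations without any explicit locking on your part:

<nowiki>import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

// merge is a single atomic read-modify-write, safe to call from many threads
counts.merge("apple", 1, Integer::sum);
counts.merge("apple", 1, Integer::sum); // counts.get("apple") is now 2</nowiki>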
===Are atomic variables just a shorthand for object-based isolated constructs?===

You can think of them that way, but many of them are specifically optimized so that they often surpass simple object-based isolation in performance.
===Can we go over the difference between Atomic Integers and finish accumulators?===

You can choose to use them the exact same way if you would like, but atomic integers are slightly more versatile, as they have more options for operations; see the sketch below.
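A quick sketch of that extra versatility with java.util.concurrent.atomic.AtomicInteger:

<nowiki>import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger counter = new AtomicInteger(0);

counter.incrementAndGet();        // accumulator-style: add 1 atomically
counter.addAndGet(5);             // add an arbitrary amount
int seen = counter.get();         // readable mid-computation, not just at the end
counter.compareAndSet(6, 100);    // set to 100 only if the value is currently 6
counter.updateAndGet(x -> x * 2); // apply an arbitrary function atomically</nowiki>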
===How do we quantitatively define low, medium, and high contention?===

This is something you will just have to feel out. If you are only accessing one piece of shared mutable data, that might be low contention. If, however, you are reading and writing to multiple variables in parallel to the point that the parallel implementation performs worse than a sequential approach, this would be high contention. You will become better at judging this as the semester goes on.
 
