- Q1:
Why does the code for milestones A-C use a call to Uth_create
instead of uth_create?
A1:
Uth_create is there with the assumption that you are
using wrapper functions for error handling in the case that
uth_create fails.
It should be okay if you just want to use 'uth_create' as long
as you don't change critical parts of the milestone code.
- Q2:
For Mile A what exactly should uth_exit do since we haven't
implemented uth_join yet?
A2:
uth_exit should terminate the thread; i.e., remove it from the
run/ready queue, and reclaim whatever resources it is using
although you need to retain enough state so that uth_join will work.
- Q3:
I am trying to use a queue data structure to store the threads, but am
confused about the variable type. I think I am supposed to push the
fibArg[i] in the queue every time I create a new one, but the
"worker[nworkers]" array is the global variable, while fibArg[] is
local.
A3:
You are confused. A few comments:
- worker[i] is of type uth_tid_t which uth.h shows is:
typedef int uth_tid_t;
But you can change this if you wish.
worker[] holds the thread IDs; typically 0, 1, 2, ....
- fibArg[] is an array which is local to main().
fibArg[i] is used to hold the argument(s) passed to thread i.
- I am not sure which queue you are referring to.
Do you mean the run queue?
One approach to data structures is to have an array of
N thread descriptors (includes ucontext, run state, etc)
and the run/ready queue.
Typically, the thread descriptor table is indexed by the
thread id; i.e., thread i's thread descriptor is the ith
element of this array.
There are a variety of approaches to the run/ready queue,
but in its simplest form it is a double-ended queue (deque)
of thread IDs:
deque<int> runQ;
But a deque of thread descriptor pointers could be used,
and runQ will likely be an array by mileB.
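The layout described above could be sketched roughly as follows. The field names, the state values, and the helper functions are all hypothetical; only the idea of a table indexed by thread ID plus a deque of IDs comes from the answer.

```cpp
#include <cassert>
#include <deque>
#include <ucontext.h>

enum RunState { ST_FREE, ST_READY, ST_RUNNING, ST_DONE };  // hypothetical states

struct ThreadDesc {          // hypothetical field names
    ucontext_t ctx;          // saved context for swapcontext
    RunState   state;
};

const int MAX_THREADS = 4;
ThreadDesc thrd_tbl[MAX_THREADS];   // thread i's descriptor is thrd_tbl[i]
std::deque<int> runQ;               // ready threads, by thread ID

// Mark thread tid ready and append it to the run queue.
void make_ready(int tid) {
    thrd_tbl[tid].state = ST_READY;
    runQ.push_back(tid);
}

// Remove the thread at the front of the run queue and return its ID.
int pick_next() {
    int tid = runQ.front();
    runQ.pop_front();
    thrd_tbl[tid].state = ST_RUNNING;
    return tid;
}
```

Because the queue holds only IDs, there is exactly one descriptor per thread and every lookup goes through thrd_tbl[tid].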
- Q4:
Why do you declare "worker[nworkers]" as "uth_tid_t", when it is in
fact an "int"?
A4:
uth_tid_t is a user-defined type which uth.h defines as int.
You can change it if you wish. You are asking me
"why have typedefs?" Kernighan and Ritchie say:
"Besides aesthetic issues ... parameterize against portability
problems ... provide better documentation ...."
But the main reason I use it is so that you can easily change
its type if you want.
- Q5:
And what is the parameter n in fibArg?
A5:
It is a parameter of the for-loop in fibFiber.
- Q6:
I am having problems switching back to the main thread {tid = 1}.
I have looked at the example that we discussed in class and I
think I'm doing everything the same. I am able to switch to the
2 worker threads but I get a segmentation fault when I try to switch
back to the main.
A6:
Something is wrong with the context pointer of the 'to' argument to
swapcontext. Inside uth_init, print out the ucontext address of
all of your threads:
for (int i=0; ...) {
printf("+++ thread descr context addr i = %p\n",
&threadList[i].thread);
}
Then, print them out right before you call swapcontext inside
uth_yield() and you will see that they are different.
On another matter,
I do not understand your declaration of the run queue:
deque<ThreadDesc> Run_Q;
Did you really mean:
deque<struct ThreadDesc *> Run_Q;
a deque of pointers to thread descriptors? If you really do mean
what you wrote, then you have 2 copies of thread descriptors for
every thread: one that is part of the array:
ThreadDesc threadList[3];
and one that is part of Run_Q.
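The difference can be checked directly. This is a small sketch, assuming the run queue was declared to hold descriptors by value; the struct is reduced to the single ucontext field, and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <ucontext.h>

struct ThreadDesc { ucontext_t thread; };

ThreadDesc threadList[3];

// A deque of descriptors stores a COPY, so the queued context lives
// at a different address than the one in threadList.
bool by_value_shares_storage() {
    std::deque<ThreadDesc> q;
    q.push_back(threadList[0]);            // copies the whole descriptor
    return &q.front() == &threadList[0];
}

// A deque of pointers stores only the address, so both names refer
// to the same descriptor.
bool by_pointer_shares_storage() {
    std::deque<ThreadDesc *> q;
    q.push_back(&threadList[0]);
    return q.front() == &threadList[0];
}
```

With the by-value queue, swapcontext can end up saving into one copy and resuming from the other, which matches the differing addresses printed above.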
- Q7:
Say if we have 3 threads, A,B,C. A is running, B and C are in the
run queue(B in front of C). When A calls yield(), it should give
control to B. So what do we do about A? I suspect we should put
it at the end of run queue. But when B and C terminate in turns,
how do we know where to resume in A? Do we use another variable
to store the context? But a problem will arise if in MileB or C we
have more than one thread calling yield().
A7:
When A yields, control should go to the thread at the front of the
run queue; i.e., thread B, and A goes to the end of the run queue
as you suspect. Since yield calls "swapcontext(from, to)",
the current state of the yielding thread (A) will be saved in "from",
which should be the ucontext struct in A's thread descriptor.
Both arguments to swapcontext should point to the ucontext structs
that are part of the thread descriptors, so no extra variable is
needed. When B terminates
(calls uth_exit), the scheduler code should be called, which
eventually calls swapcontext to change context to C, which is at
the front of the run queue. When C terminates, the same thing
happens except now A resumes using whatever context was saved
by swapcontext when we first switched from A to B.
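The A, B, C scenario can be traced with a small sketch. Everything here (the descriptor layout, demo_yield, demo_exit, the stack size, and the trace string) is hypothetical and is not the library interface; it only illustrates that the context saved in A's descriptor is enough to resume A.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <ucontext.h>

struct ThreadDesc { ucontext_t ctx; char stack[64 * 1024]; };

static ThreadDesc td[3];          // 0 = A (the caller), 1 = B, 2 = C
static std::deque<int> runQ;      // ready threads, excluding the current one
static int cur = 0;
static std::string trace;         // order in which the threads ran

// The current thread goes to the back; the front thread runs next.
static void demo_yield() {
    int from = cur;
    runQ.push_back(from);
    cur = runQ.front();
    runQ.pop_front();
    swapcontext(&td[from].ctx, &td[cur].ctx);   // state saved in "from"
}

// The current thread is done; switch to the front thread, never return.
static void demo_exit() {
    cur = runQ.front();
    runQ.pop_front();
    setcontext(&td[cur].ctx);
}

static void worker(int id) {
    trace += char('A' + id);      // records "B" or "C"
    demo_exit();
}

std::string run_demo() {
    trace.clear();
    runQ.clear();
    cur = 0;
    for (int i = 1; i <= 2; ++i) {
        getcontext(&td[i].ctx);
        td[i].ctx.uc_stack.ss_sp = td[i].stack;
        td[i].ctx.uc_stack.ss_size = sizeof td[i].stack;
        td[i].ctx.uc_link = nullptr;            // worker never returns
        makecontext(&td[i].ctx, (void (*)())worker, 1, i);
        runQ.push_back(i);
    }
    trace += 'A';
    demo_yield();                 // A -> B; B exits -> C; C exits -> A
    trace += 'A';                 // A resumes right after its swapcontext
    return trace;
}
```

Note that A never stores its context anywhere special: swapcontext writes it into td[0].ctx, and the final setcontext in demo_exit picks it up from there.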
- Q8:
In mileC.c, will there be a problem if we call uth_yield from
within the SIGVTALRM signal handler?
A8:
In short, no, there should be no problem.
(But you may discover that you need to do more than just call
uth_yield; e.g., some thread's sleep period has expired.)
Consider the situation where main is in the join queue and
two threads (T0 and T1) are compute-bound threads that will
run for many preemption periods. The following diagram shows
the situation before either thread has started to run (PCi is
the Program Counter for thread i; i.e., it points to the next
instruction in thread Ti):
PC0 >>> T0 {            PC1 >>> T1 {
          ...                     ...
          ...                     ...
        }                       }
Suppose T0 runs until it gets preempted. We now have this
situation:
PC0 >>> sighndlr{...}
        T0 {            PC1 >>> T1 {
PC0'>>>   ...                     ...
          ...                     ...
        }                       }
where T0's PC is pointing into the signal handler. PC0' is the
point of interruption and indicates the return point from the
signal handler (which is in T0's stack).
When T1 runs and gets interrupted, we have this situation:
PC0 >>> sighndlr{...}   PC1 >>> sighndlr{...}
        T0 {                    T1 {
PC0'>>>   ...           PC1'>>>   ...
          ...                     ...
        }                       }
Now, control is returned to T0's context which means that
T0 will resume inside the signal handler and eventually
return back to point PC0' to resume its computation.
- Q9:
What do you think of my debug output?
I maintain an index that indicates which thread in the queue
is the next thread.
Run Queues (uth_curtid=1):
Hi:
Lo: 1 2 3
Idle:
Sleeping:
Joining:
Run Queues (uth_curtid = 2):
Hi:
Lo: 1 2 3
Idle:
Sleeping:
Joining:
...
A9:
Your output could be improved.
The main purpose for this output is to help you quickly verify that
your implementation is behaving properly.
Intuitively, I would expect to see the following properties:
- The thread IDs rotate with each output of the run queue
because mileA just has the threads yielding to each
other.
- The current thread ID is at the front of the queue.
So, I expect the second instance of the run queues to
show the ordering 2, 3, 1.
If you are using a deque, this should be a simple
matter of iterating through the structure.
There are other things you can output, but these things can
wait until mileC (e.g., the tick counter value).
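A small sketch of the expected output, assuming the low-priority queue is a deque of thread IDs with the current thread at the front. The function names show_queue and rotate are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Format the queue contents front to back, e.g. "Lo: 1 2 3".
std::string show_queue(const std::string &label, const std::deque<int> &q) {
    std::string out = label + ":";
    for (int tid : q) out += " " + std::to_string(tid);
    return out;
}

// Model of a yield: the thread at the front moves to the back.
void rotate(std::deque<int> &q) {
    q.push_back(q.front());
    q.pop_front();
}
```

Starting from "Lo: 1 2 3", one yield should produce "Lo: 2 3 1", so the ordering itself shows which thread is current.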
- Q10:
I am getting a segfault after I have called uth_create once but
before it calls it a second time. The strange thing is that it
looks like the loop variable i is lost ... not just its value,
but the variable itself.
A10:
This is a strange problem, but it is probably due to a bad pointer ...
You can run gdb, but the root cause of your segfault may be
nowhere near the point where you discovered that your
variable "is lost."
You should methodically check all of your pointer values to see
if their values make sense.
....
It looks like you are referencing an array element out of bounds.
You have "thrd_tbl[3] ... = ..." when you have the declaration
"uth_tdesc_t thrd_tbl[2]".
Alternatively, you could have used a vector template and inserted
using thrd_tbl.at(3) instead of thrd_tbl[3] which would have
thrown an exception at the point that you tried to execute
the statement if you allocated only 2 elements.
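A minimal sketch of the bounds-checked alternative. The descriptor is reduced to a single placeholder field, and set_state is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct uth_tdesc_t { int state; };       // placeholder descriptor

// Store into slot idx with bounds checking; report failure instead of
// silently writing past the end of the table.
bool set_state(std::vector<uth_tdesc_t> &tbl, std::size_t idx, int state) {
    try {
        tbl.at(idx).state = state;       // throws std::out_of_range if idx >= size()
        return true;
    } catch (const std::out_of_range &) {
        return false;
    }
}
```

With a table of 2 elements, set_state(tbl, 3, ...) fails at the offending statement, whereas tbl[3] quietly corrupts whatever happens to sit after the table.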
- Q11:
I understand generally what to do, but when I think about writing
all of the code to support mileA.c, it all looks too complicated.
What should I do?
A11:
The milestones have been constructed so that each one adds some
new features. I would take mileA.c and simplify it as I described
in class. Then focus on each function sequentially as you need them.
For example, I would insert a call to exit after the call to
uth_init and try to do uth_init. Start by thinking about what
the pre-assertions and post-assertions should be. What I mean
by a post-assertion is what should be true after you execute
uth_init.
Suppose that I make the following simplifications:
- Statically allocate the thread descriptor table to avoid
dynamic storage allocation.
- Have a descriptor be just a ucontext structure.
- Allow for the main thread and one fibFiber.
So, the pre-assertions:
The post-assertions:
- nxt_td = 1 because element 0 is used for the main thread.
- The run queue will contain the thread ID 0 corresponding to
the main thread; i.e., assume "deque<int> runq;".
Now, insert some printf or cout statements at the end of uth_init
that display the value of nxt_td and dump out the run queue.
Compile and run.
Do you see correct values?
Note that I didn't say anything about fixing up the ucontext
because the main thread is actually special and will get its
ucontext value when the first swapcontext is called.
And, the use of nxt_td is not going to work in general but is
a quick way to do allocation of thread descriptors.
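Under those simplifications, a first cut could look like the sketch below. The name uth_init_sketch and the table size are hypothetical; the real uth_init takes whatever arguments uth.h specifies.

```cpp
#include <cassert>
#include <cstdio>
#include <deque>
#include <ucontext.h>

const int MAX_THREADS = 2;             // main thread + one fibFiber
ucontext_t thrd_tbl[MAX_THREADS];      // a descriptor is just a ucontext
int nxt_td = 0;                        // next unused descriptor
std::deque<int> runq;                  // thread IDs

void uth_init_sketch() {
    nxt_td = 1;                        // element 0 is the main thread
    runq.clear();
    runq.push_back(0);                 // the main thread's ID

    // Dump the state so the post-assertions can be checked by eye.
    printf("nxt_td = %d, runq:", nxt_td);
    for (int tid : runq) printf(" %d", tid);
    printf("\n");
}
```

The post-assertions then translate directly into checks such as assert(nxt_td == 1) and assert(runq.front() == 0).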
Then, move the exit call to after the uth_create loop and repeat
the above exercise perhaps adding some more fields to the
thread descriptor structure.
There are no short cuts.
You make an initial stab at the software architecture.
Talk to someone about your ideas if possible or desired.
Try to implement part of it.
Reflect on your ideas, update the ideas and repeat for the
next feature.
- Q12:
My uth.c for Milestone A is only about 100 lines. Am I doing
something wrong?
A12:
A quick and dirty version of uth.c takes about 100 lines of code
give or take 10 lines. That's fine for this first submission.
But in the final submission of Project B,
uth.c will likely be about 400-500 lines because it should work
for an arbitrary application. For example, my uth_show_queues()
is 50 lines. Then, there are allocation routines and code that
check for error returns.
- Q13:
I am having a little trouble with milestone A. I am not sure how to
properly pass arguments to the new threads' functions. In mileA.c,
&fibArg[i] is passed to uth_create, and uth_create is responsible for
eventually calling makecontext with the new descriptor's context, a
function pointer, the number of arguments = 1, and the void *Arg,
where Arg refers to &fibArg[i]. I can access the parts of the void
*Arg from within uth_create, but when I am actually in that thread, I
get a segmentation fault when I try to access that memory address.
Is there some step that I am missing in order to get argument passing
to work correctly?
A13:
The crux of your problem is the function arg you passed into
makecontext.
To find that root cause, I did the following:
- Ran gdb with no breakpoints ==> got no useful info
since the error blew out the stack. I was hoping
that the gdb "where" command would locate exactly
the offending line of code.
- Noticed that the fault was due to a "bad instruction".
I should have suspected a bad entry point address.
- Inserted some calls to printf at key places and also
displayed the addresses of the 3 ucontext structs,
&fibFiber, and &Arg. Ran gdb with break points
==> deduced that the args to uth_XXX seemed ok.
- Inserted some printf calls into main and at the beginning
and end of every uth function
- ==> discovered that the problem occurred
after main called uth_yield
- ==> suspected that the swapcontext failed
when attempting to enter fibFiber
- ==> maybe makecontext was called incorrectly
- ==> looked carefully at how makecontext was called
and saw the "&f" which should be "f" (no ampersand)
since f is already an address.
- Corrected that one line and reran without faulting.
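The corrected call can be sketched as follows. The names launch and entry are hypothetical, and the argument is a plain int to keep the example small; the point is only that the function pointer f is passed as is.

```cpp
#include <cassert>
#include <ucontext.h>

static ucontext_t main_ctx, thr_ctx;
static char thr_stack[64 * 1024];
static int seen = -1;

// Entry point of the new context; returning resumes uc_link.
void entry(int n) { seen = n; }

// Run f(n) on its own stack and return what it recorded.
int launch(void (*f)(int), int n) {
    getcontext(&thr_ctx);
    thr_ctx.uc_stack.ss_sp = thr_stack;
    thr_ctx.uc_stack.ss_size = sizeof thr_stack;
    thr_ctx.uc_link = &main_ctx;          // where to go when f returns
    // f already holds the entry-point address. Writing &f here would
    // pass the address of this local parameter, not of the function.
    makecontext(&thr_ctx, (void (*)())f, 1, n);
    swapcontext(&main_ctx, &thr_ctx);
    return seen;
}
```

If the cast were applied to &f instead, the new context would start executing at an address on the stack, which is consistent with the "bad instruction" fault described above.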
- Q14:
I can yield and exit processes, init and create threads but I am still
having trouble accessing the data in a void pointer. I'll ask you
more about that today -- what you said fixed my segfault didn't work
for me.
I wonder if it has something to do with using ssh to get to grid
instead of actually sitting at a terminal.
A14:
Your problem is that you shouldn't use grid because
it is a 64-bit machine and ucontext is broken on 64-bit machines.
"uname -i" indicates it is 64 bits.
SSH to hive.cec, recompile and rerun ... things will probably
just magically work.
- Q15:
I understand that I need to block and unblock interrupts by
blocking the SIGVTALRM signal (e.g., sigprocmask(SIG_BLOCK, ...)),
but when do I do that?
A15:
You are trying to prevent race conditions. Where you really
need it is when you are updating shared variables (e.g., run queue).
But you can't just arbitrarily block at the beginning of a uth_xxx
function and unblock at the end of the function because if the
interrupt is blocked at the moment you context switch, that blocked
state is saved with the context, and the thread will have the
interrupt blocked when it is resumed.
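A sketch of blocking only around the shared update, assuming the run queue is a deque of thread IDs. The helper names are hypothetical; sigprocmask, sigemptyset, sigaddset, and sigismember are the standard POSIX calls.

```cpp
#include <cassert>
#include <deque>
#include <signal.h>

std::deque<int> runQ;

// True if SIGVTALRM is currently blocked.
bool vtalrm_blocked() {
    sigset_t cur;
    sigprocmask(SIG_BLOCK, nullptr, &cur);   // null set: query only
    return sigismember(&cur, SIGVTALRM) == 1;
}

// Append tid to the run queue with the timer signal held off.
void enqueue_ready(int tid) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGVTALRM);
    sigprocmask(SIG_BLOCK, &block, &old);    // enter critical section
    runQ.push_back(tid);
    sigprocmask(SIG_SETMASK, &old, nullptr); // restore the previous mask
}
```

The critical section here contains no context switch, so the mask is always restored by the same thread that changed it.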
- Q16:
For uth_join, when the requested Tid calls uth_exit, does the thread
that called uth_join immediately continue or is it placed on the
back of the Run_Q?
A16:
A joining thread is waiting for another thread to exit.
So, it should NOT be put on the run queue because it is
blocked until that thread exits.
- Q17:
uth_join also states that it is an error if any other thread has
already joined with thread Tid. Does this mean that only 1 thread
at a time can join with a particular Tid?
OR
Does it mean that a thread can only be joined with at most 1 time
(even if no thread is currently joining with it, but a thread has
joined with it in the past)?
A17:
The latter since the only way a thread can complete a join is
when the target thread calls uth_exit.
- Q18:
When we put the current thread to sleep do we need to switch to
the next thread in the run Q?
A18:
Yes, or else you waste CPU time.
- Q19:
Where in the run queue should a process be put that has called
uth_join(X) and is now ready to run (i.e., thread X has called
uth_exit())?
A19:
It is your choice: beginning or end of the run queue.
Just document your choice.
- Q20:
I think my preemption is working in mileC, but I still don't
understand why it should work ....
A20:
It looks like you took a brute force approach and disabled
interrupts upon entry to a uth_ function and reenabled them
before returning.
The key ideas are:
- If you setup the handler properly, SIGVTALRM will be
blocked when inside the signal handler.
Upon return, the action on SIGVTALRM is reset to the
original state before jumping into the handler; i.e.,
SIGVTALRM will be caught.
- When a thread is resumed, its program counter is pointing
inside the signal handler right after the swapcontext
call and SIGVTALRM will be blocked.
When the thread returns from the signal handler, it will
return to the point of interruption with SIGVTALRM
unblocked.
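The first point can be observed with a small experiment. This is a sketch, assuming the handler is installed with sigaction and without SA_NODEFER; on_tick and the other names are hypothetical.

```cpp
#include <cassert>
#include <signal.h>

volatile sig_atomic_t blocked_in_handler = -1;

// True if SIGVTALRM is currently blocked.
bool vtalrm_blocked() {
    sigset_t cur;
    sigprocmask(SIG_BLOCK, nullptr, &cur);   // null set: query only
    return sigismember(&cur, SIGVTALRM) == 1;
}

// Handler: record whether SIGVTALRM is blocked while we are in here.
void on_tick(int) {
    blocked_in_handler = vtalrm_blocked() ? 1 : 0;
}

void install_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;          // no SA_NODEFER: signal is blocked in the handler
    sigaction(SIGVTALRM, &sa, nullptr);
}
```

Calling raise(SIGVTALRM) after install_handler() should leave blocked_in_handler set to 1, while the signal is unblocked again once the handler has returned.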
- Q21:
Is the main thread a high or low priority thread?
A21:
Low.
- Q22:
When I keep shifting from one process to another, do I need to
*skip* the main process until all the children are finished?
A22:
I don't understand what this question is about.
Do you mean semantics or implementation? I suspect you are
confused because you have not made a distinction between the
semantics and the implementation. Semantically, if main calls
uth_yield, it is still runnable but only gets the CPU after all
other runnable threads have had a chance to run. Therefore,
it should be placed at the end of the run queue.
If it calls uth_join, it is blocked (no longer runnable) waiting
for a thread to call uth_exit. Although a crude implementation
could put main on the run queue but marked as not runnable,
it would make more sense to somehow avoid doing that.
- Q23:
What is the difference between an IDLE thread and SLEEPING thread?
I feel quite confused now.
I always thought an IDLE thread is a thread that is in the runQueue,
but according to the description those threads seem to be SLEEPING.
A23:
In Project B, I usually refer to THE idle thread to mean the
one thread that runs when no other thread is runnable.
In my implementation of the thread library for mileC, THE
idle thread has the following properties:
- It is created by uth_init and only runs if no other
thread can run
- It is an infinite loop.
- I use its CPU usage to compute CPU utilization in the
following way.
Let X be the accumulated CPU time used by this idle thread.
Then, the CPU utilization is given by:
U = (T - X)/T
where T is the total measurement time (#clock ticks).
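As a worked instance of the formula, here is a small sketch; the function name is hypothetical and both quantities are assumed to be counted in clock ticks.

```cpp
#include <cassert>

// U = (T - X) / T, with T = total ticks measured and X = ticks
// charged to the idle thread.
double cpu_utilization(long total_ticks, long idle_ticks) {
    return static_cast<double>(total_ticks - idle_ticks) / total_ticks;
}
```

For example, if the measurement covers 200 ticks and the idle thread was charged 50 of them, then U = (200 - 50)/200 = 0.75.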
- Q24:
Where is the Milestone D code?
A24:
There is none. You have to define your own mileD.
- Q25:
In mileC.cpp, the tid[ ] array is getting its values destroyed after
I call uth_yield.
Any ideas on why?
A25:
You overran the fib[ ] array; i.e., went out of array bounds.
Since the tid[ ] array was declared after the fib[ ] array, the
spillover blew out the contents of the tid[ ] array.
- Q26:
I am planning to use gettimeofday() to implement uth_sleep(x) and in
effect, say wake me up at time T+x where T is the current time.
Will that be OK?
A26:
No because gettimeofday() deals with real/actual time.
uth_sleep(20) means that you want to be woken up NO SOONER
than 20 msec of VIRTUAL TIME into the future.
If the clock period is 10 msec, your thread should not wake up
until 2 ticks into the future.
If the clock period is 100 msec, your thread should not wake up
until 1 tick into the future.
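One way to get both of those numbers is to round the sleep time up to a whole number of ticks. This is a sketch under that assumption; the function names are hypothetical.

```cpp
#include <cassert>

// Number of clock ticks needed so that at least sleep_msec of
// virtual time has passed (ceiling division).
long sleep_ticks(long sleep_msec, long period_msec) {
    return (sleep_msec + period_msec - 1) / period_msec;
}

// Tick count at which the sleeping thread may be made runnable.
long wake_tick(long now_tick, long sleep_msec, long period_msec) {
    return now_tick + sleep_ticks(sleep_msec, period_msec);
}
```

The thread then records wake_tick(...) when it goes to sleep, and the SIGVTALRM handler compares that value against the library's tick counter.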
- Q27:
I implemented the uth_exit() like this:
uth_exit(){
1. notify_all_the_joined_processes;
// Place A
2. switch to next available process;
// Place B
}
A process is supposed to free all its allocated resources after
it calls exit(). However, once a process has called exit(), it will
never get back to any code. Therefore, place B is not
reachable.
On the other hand, we also cannot free the resources in place A,
since we still need the resources to do the swapcontext().
As a result, we can free the resources at neither place A nor place B,
so where should I do the free() operations?
A27:
What resources are you trying to free?
The only place you can free resources is at Place A because you
will never return after you do the context switch. But the
only resource you can free is the thread descriptor ... and you
may or may not really be freeing it,
depending on how you implement the allocation of the thread
descriptors. Typically, you allocate N thread descriptors and
stack areas inside uth_init. uth_exit then should return these
to the library before context switching. So, freeing means
returning them to the thread library.
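A sketch of "returning a descriptor to the library", assuming a fixed pool of N descriptors identified by index. The pool functions and the free-list representation are hypothetical.

```cpp
#include <cassert>
#include <vector>

const int N = 3;                 // descriptors allocated up front
bool in_use[N];
std::vector<int> free_list;      // indices of unused descriptors

// Done once (e.g., from uth_init): every descriptor starts out free.
void pool_init() {
    free_list.clear();
    for (int i = N - 1; i >= 0; --i) {
        in_use[i] = false;
        free_list.push_back(i);
    }
}

// Hand out a descriptor index, or -1 if none is left.
int td_alloc() {
    if (free_list.empty()) return -1;
    int tid = free_list.back();
    free_list.pop_back();
    in_use[tid] = true;
    return tid;
}

// What "freeing" means at Place A: no free(), just mark it reusable.
void td_release(int tid) {
    in_use[tid] = false;
    free_list.push_back(tid);
}
```

Nothing is deallocated, so the exiting thread's stack and context remain valid for the context switch that follows.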
- Q28:
In uth_usage, are we supposed to increment usage counter for the
thread running when a signal interrupt occurs?
A28:
Yes.
- Q29:
We are supposed to supply the Milestone D test.
Can I just call the lock and unlock function as a test?
A29:
I would think you could do something more creative than that.
Think about what might be a good test ... maybe a variant of
a classic algorithm ... like producer-consumer, readers-writers,
etc. but where it matches our environment (e.g., make the threads
do some computation) and actually contend for a critical section.
- Q30:
I am thinking that all I have to do for lock/unlock is to block
the interrupt. What do you think?
A30:
I am not sure what you are saying.
You are wrong, if you are planning to leave the interrupt
blocked between lock and unlock.
You block/unblock while inside either lock or unlock, but
SIGVTALRM is not blocked between a lock and an unlock.
Then there is the issue of what to do when you unlock if
there are multiple threads waiting for the lock.
- Q31:
Also, I'm working on milestone B now, and I'm wondering when
implementing the uth_join, say we have thread y calling:
uth_join(x, &result)
does CPU control go to the next thread in front of the run queue, OR
goes directly to thread x? (If it goes directly to x, what happens to
all the threads in front of x in the run queue?)
A31:
The next thread in the run queue.
- Q32:
I'm getting really confused about milestone C. In the uth_init we are
initializing nworkers+4 = 6 threads. But we only create 2 low priority
fib threads, and 1 monitor thread. In addition to the main thread,
that's only 4 threads. So why are we initializing nworkers+4 but not
nworkers+2?
A32:
There are 2 extra thread slots ... it's overprovisioned.
- Q33:
Are we supposed to use the built in semaphore functions
within our mutex lock functions?
A33:
No.
You should implement the semantics of a mutex lock.
For example, calling uth_lock(&X) means:
- If X is locked, the caller should wait for X to be unlocked.
- Otherwise it grabs the lock.
- If there is more than one thread trying to grab the lock, then
uth_lock needs to ensure that only one thread gets the
lock.
But you are not allowed to use pthread library functions or
atomic hardware synchronization instructions.
So, in order to ensure atomicity you need to
block the interrupt (SIGVTALRM).
Then, you also need state information so
that you know who has the lock and who is waiting for the lock.
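The state information could be sketched as below. The struct layout, the function names, and the first-come-first-served hand-off are all assumptions; a real uth_lock would also block the caller and switch to another thread instead of returning false.

```cpp
#include <cassert>
#include <deque>
#include <signal.h>

struct uth_mutex_sketch {        // hypothetical layout
    int owner = -1;              // thread ID holding the lock, -1 if free
    std::deque<int> waiters;     // thread IDs waiting, in arrival order
};

// Block SIGVTALRM and return the previous mask.
static sigset_t block_vtalrm() {
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGVTALRM);
    sigprocmask(SIG_BLOCK, &set, &old);
    return old;
}

// Returns true if tid now holds the lock, false if tid was queued.
bool lock_sketch(uth_mutex_sketch &m, int tid) {
    sigset_t old = block_vtalrm();            // make test-and-set atomic
    bool got = (m.owner == -1);
    if (got) m.owner = tid;
    else     m.waiters.push_back(tid);
    sigprocmask(SIG_SETMASK, &old, nullptr);  // unblocked again on exit
    return got;
}

// Hand the lock to the longest waiter, if any; returns the new owner.
int unlock_sketch(uth_mutex_sketch &m) {
    sigset_t old = block_vtalrm();
    if (m.waiters.empty()) {
        m.owner = -1;
    } else {
        m.owner = m.waiters.front();
        m.waiters.pop_front();
    }
    int new_owner = m.owner;
    sigprocmask(SIG_SETMASK, &old, nullptr);
    return new_owner;
}
```

SIGVTALRM is blocked only inside the two functions, never between a lock and the matching unlock, so threads holding the lock can still be preempted.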
- Q34:
When I block interrupts at the beginning of uth_create and unblock
at the end, the threads don't preempt.
But if I don't do that mileC runs OK.
Why?
A34:
uth_create calls getcontext.
So, if signals are blocked at the time that you call getcontext,
they will be blocked when you enter the thread and therefore
you won't get any interrupts.
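This effect can be seen by inspecting the uc_sigmask field that getcontext fills in. The sketch below assumes a glibc-style ucontext implementation; the function name is hypothetical.

```cpp
#include <cassert>
#include <signal.h>
#include <ucontext.h>

// Capture a context with SIGVTALRM either blocked or unblocked and
// report whether the saved mask has SIGVTALRM blocked.
bool context_captured_blocked(bool block_first) {
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGVTALRM);
    sigprocmask(block_first ? SIG_BLOCK : SIG_UNBLOCK, &set, &old);

    ucontext_t uc;
    getcontext(&uc);                           // records the current mask

    sigprocmask(SIG_SETMASK, &old, nullptr);   // put the mask back
    return sigismember(&uc.uc_sigmask, SIGVTALRM) == 1;
}
```

A thread started from a context captured with block_first == true begins life with SIGVTALRM blocked, so it is never preempted unless something unblocks the signal.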
- Q35:
I am getting a glibc error whenever I uncomment a function definition
that I don't even call.
What should I do?
A35:
Probably due to a gross programming error that the compiler
can't catch.
Try recompiling and running on another host to see if the
same error occurs.
Hopefully, you will segfault and then you can run gdb to find
the location of the error and trace back to the root cause.