spatial and temporal partitioning control?

Jean-Christophe DUBOIS

spatial and temporal partitioning control?

Hi,

Some more questions on OKL4 principles:

As I understand it, OKL4 provides good spatial partitioning. On an SoC
device we have good granularity, and we can indeed assign each resource to
the cell we desire.

Now, I am a bit uneasy about how this would work on a non-SoC device using,
for example, a PCI bus or a USB bus. Would we be able to assign each PCI
card/USB device to a single cell? How should the bus enumeration/configuration
happen? Considering that PCI enumeration could change the devices' physical
addresses, how would it work? How about PCI hotplug? I am not sure ...
does anybody have a view on this? Or is this a field to explore?

That still leaves the temporal partitioning issue. OKL4 scheduling is mostly
based on priority. If a high-priority cell goes mad, we are stuck because it
will eat up all of our CPU without any possibility for other cells to try to
fix the problem. And even if we have an even higher-priority cell controlling
the whole thing, how could it find out that one specific cell has gone mad
and all other cells are starved of CPU?

Last, if we are considering cells with threads of equal priority, OKL4 will
run these threads in round-robin mode. But all threads will necessarily get
the same scheduling time. I can't find an easy way to instruct OKL4 that, for
2 threads of the same priority, one should get twice as much CPU time as the
other. Is there a way to allocate CPU resources to the various cells/threads
in a fine-grained way?

BTW, now that there is no more "privileged" cell (as opposed to OKL4 2.4), how
could one cell theoretically restart another cell that has crashed/gone
mad? We can certainly reset the whole system, but is there a finer-grained way
to do it?

Thanks

JC

_______________________________________________
Developer mailing list
[hidden email]
https://lists.okl4.org/mailman/listinfo/developer
Jean-Christophe DUBOIS

Re: spatial and temporal partitioning control?

So, no opinion on this?

JC

Stefan M. Petters-2

Re: spatial and temporal partitioning control?

Dear Jean Christophe,


Jean-Christophe Dubois wrote:

> On Sunday 30 August 2009 18:49:11 Jean-Christophe Dubois wrote:
>  
>> Now, I am a bit uneasy about how this would work on a non-SoC device using,
>> for example, a PCI bus or a USB bus. Would we be able to assign each PCI
>> card/USB device to a single cell? How should the bus
>> enumeration/configuration happen? Considering that PCI enumeration could
>> change the devices' physical addresses, how would it work? How about
>> PCI hotplug? I am not sure ... does anybody have a view on this? Or is this
>> a field to explore?
>>    
I have to pass on that one.

>> That still leaves the temporal partitioning issue. OKL4 scheduling is mostly
>> based on priority. If a high-priority cell goes mad, we are stuck because it
>> will eat up all of our CPU without any possibility for other cells to try to
>> fix the problem.
Yes, that's the very definition of priority. To fix this, you might want
a high-prio thread which periodically checks whether it has received an
"alive" IPC from a lower-prio task. If not, fix it. This is rather crude,
but if you don't trust your high-prio cells ...


>> And even if we have an even higher-priority cell controlling
>> the whole thing, how could it find out that one specific cell has gone mad
>> and all other cells are starved of CPU?
>>    
Well, you could hack up the apps to send async IPCs to the watchdog(s),
but as mentioned this seems rather crude.

>> Last, if we are considering cells with threads of equal priority, OKL4 will
>> run these threads in round-robin mode. But all threads will
>> necessarily get the same scheduling time. I can't find an easy way to
>> instruct OKL4 that, for 2 threads of the same priority, one should get twice
>> as much CPU time as the other. Is there a way to allocate CPU
>> resources to the various cells/threads in a fine-grained way?
>>    
AFAIK not in the current version. There have been efforts in that direction:
1. work done under Scott Brandt at UCSC;
2. work done at NICTA (supervised by me) on a 2.1 version of the kernel.
The code of the latter has not been released. I recently started redoing
it on OKL4 3.0, but that has not yet reached a state where it could
be released. However, the ideas and logic behind the 2.1 version, plus
one or two extensions, are published:
http://www.cister.isep.ipp.pt/docs/towards+real+multi-criticality+scheduling/474/
Note that this is done from an RT perspective, but the discussion is mostly
equally valid when talking about best-effort tasks only.
The main concern there is the performance loss. If one may ignore the
bursty release of stuff and reduce it to a simple runnability
requirement (i.e. deadline equal to period), then the RT proof aspect
is not that hard, but that still leaves the EDF scheduling core. If one
is *only* concerned with shares, this can be arranged easily.
Virtual-machine-type hierarchical scheduling comes to mind, and when certain
assumptions can be made, the EDF queue doesn't look as bad anymore.

Note that you somehow need to find shares which add up to not more than 100%
;-) which requires some sort of analysis, in particular for a real
system which communicates (which it almost invariably will). There are
a few more things to it than what is published, but I think that would be
too noisy for the mailing list. We can take that offline if you are still
curious.
Since it is a rather obvious problem in virtual machines, I would have
expected more work on this. Maybe it's been done but not published anywhere.
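
As an illustration of the "shares add up to not more than 100%" condition, here is a minimal sketch in Python. It is not OKL4 code and not the published scheme: the task set and the function names are hypothetical. It assumes independent periodic tasks with deadline equal to period on one CPU, where a total utilisation of at most 1 is the classic EDF feasibility condition. Communication between tasks, which is the hard part mentioned above, is deliberately ignored.

```python
from fractions import Fraction

def total_share(tasks):
    """Sum budget/period over (budget, period) pairs, as an exact fraction."""
    return sum(Fraction(budget, period) for budget, period in tasks)

def shares_fit(tasks):
    """True if the requested CPU shares add up to no more than 100%."""
    return total_share(tasks) <= 1

# Hypothetical task set: (budget, period) in milliseconds.
tasks = [(2, 10), (3, 15), (5, 20)]

print(float(total_share(tasks)))        # fraction of the CPU requested
print(shares_fit(tasks))                # does it fit?
print(shares_fit(tasks + [(4, 10)]))    # after adding a 40% task
```

Using exact fractions avoids rounding surprises when the total lands right at 100%.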
 
>> BTW, now that there is no more "privileged" cell (as opposed to OKL4 2.4),
>> how could one cell theoretically restart another cell that has crashed/
>> gone mad? We can certainly reset the whole system, but is there a
>> finer-grained way to do it?
>>    
Here I have to pass again.

Regards,
    Stefan.
--
Stefan M. Petters
CISTER Research Unit
 
ISEP - IPP | Rua Dr. António Bernardino de Almeida 431
4200-072 Porto | Portugal
T +351 22 83 40 529 | Homepage
<http://www.cister.isep.ipp.pt/people/stefan+m%2E+petters>

Josh Matthews

Re: spatial and temporal partitioning control?


On Sat, Sep 5, 2009 at 5:17 PM, Stefan M. Petters <[hidden email]> wrote:
> Dear Jean Christophe,
>
> Jean-Christophe Dubois wrote:
> On Sunday 30 August 2009 18:49:11 Jean-Christophe Dubois wrote:
>
>> Now, I am a bit uneasy about how this would work on a non-SoC device using,
>> for example, a PCI bus or a USB bus. Would we be able to assign each PCI
>> card/USB device to a single cell? How should the bus
>> enumeration/configuration happen? Considering that PCI enumeration could
>> change the devices' physical addresses, how would it work? How about
>> PCI hotplug? I am not sure ... does anybody have a view on this? Or is this
>> a field to explore?
>>

One solution to this would be to have a PCI or USB device cell, which has complete control over all devices hanging off the PCI or USB bus and hands out access (using capabilities) dynamically to clients.
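
To make the device-cell idea concrete, here is a toy model in Python. It only sketches the bookkeeping: one owner tracks every device on the bus and hands out exclusive, revocable access tokens. All names (`DeviceCell`, `Capability`, `grant`, `revoke`, `attach`) are hypothetical and do not correspond to any OKL4 API, and real capabilities would be enforced by the kernel rather than by a Python object.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """Stand-in for a kernel capability: names a device and its holder."""
    device: str
    holder: str

class DeviceCell:
    """Toy bus manager: owns all devices, grants each to at most one client."""

    def __init__(self, devices):
        self._owner = {dev: None for dev in devices}

    def attach(self, dev):
        """Hotplug: a newly enumerated device starts out unassigned."""
        self._owner.setdefault(dev, None)

    def grant(self, dev, client):
        """Return a capability, or None if the device is unknown or taken."""
        if dev not in self._owner or self._owner[dev] is not None:
            return None
        self._owner[dev] = client
        return Capability(dev, client)

    def revoke(self, dev):
        """Take the device back, e.g. before re-enumeration or on unplug."""
        if dev in self._owner:
            self._owner[dev] = None

    def may_access(self, cap):
        """A capability is valid only while its holder still owns the device."""
        return self._owner.get(cap.device) == cap.holder

bus = DeviceCell(["pci:00:1f.2", "usb:1-2"])
cap = bus.grant("pci:00:1f.2", "cell_a")
print(bus.may_access(cap))                       # cell_a holds the device
print(bus.grant("pci:00:1f.2", "cell_b"))        # second client is refused
```

Because clients address devices through the device cell rather than by physical address, re-enumeration would only need to be visible inside that one cell.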
 
>> That still leaves the temporal partitioning issue. OKL4 scheduling is mostly
>> based on priority. If a high-priority cell goes mad, we are stuck because it
>> will eat up all of our CPU without any possibility for other cells to try to
>> fix the problem.
> Yes, that's the very definition of priority. To fix this, you might want
> a high-prio thread which periodically checks whether it has received an
> "alive" IPC from a lower-prio task. If not, fix it. This is rather crude,
> but if you don't trust your high-prio cells ...

Indeed - and it makes me wonder why you don't trust your high prio cell; can you elaborate?
Depending on your use case, it may make me question your system design: are you sure your requirements aren't better met with a single cell with multiple protection domains, with a high priority cell manager acting as a watchdog over the processes within those protection domains?
 
>> And even if we have an even higher-priority cell controlling
>> the whole thing, how could it find out that one specific cell has gone mad
>> and all other cells are starved of CPU?
>>
<...>
>> BTW, now that there is no more "privileged" cell (as opposed to OKL4 2.4),
>> how could one cell theoretically restart another cell that has crashed/
>> gone mad? We can certainly reset the whole system, but is there a
>> finer-grained way to do it?
>>

The above concerns would again point me towards your requirements being met with a single-cell design with a cell manager that acts as your "privileged cell". If you can elaborate on your requirements, perhaps we can all comment further.

Kind regards,
Josh

Gernot Heiser

Re: spatial and temporal partitioning control?

In reply to this post by Jean-Christophe DUBOIS
>>>>> On Sat, 5 Sep 2009 14:33:42 +0200, "Jean-Christophe Dubois" <[hidden email]> said:
>> That still leaves the temporal partitioning issue. OKL4 scheduling is mostly
>> based on priority. If a high-priority cell goes mad, we are stuck because it
>> will eat up all of our CPU without any possibility for other cells to try to
>> fix the problem. And even if we have an even higher-priority cell controlling
>> the whole thing, how could it find out that one specific cell has gone mad
>> and all other cells are starved of CPU?

You have discovered the inherent limitations of priority-based
scheduling. Every priority scheme suffers from this in one form or
another.

>> Last, if we are considering cells with threads of equal priority, OKL4 will
>> run these threads in round-robin mode. But all threads will
>> necessarily get the same scheduling time. I can't find an easy way to
>> instruct OKL4 that, for 2 threads of the same priority, one should get twice
>> as much CPU time as the other. Is there a way to allocate CPU
>> resources to the various cells/threads in a fine-grained way?

If one has twice the time slice length of the other, then it should
get twice as much time (as long as they do not get
preempted). Standard prio stuff.
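
The time-slice argument can be checked with a small simulation. This is a sketch in Python, not OKL4 code: it assumes two CPU-bound threads of equal priority that never block and are never preempted by a higher-priority thread, and the thread names and slice lengths are made up for the example.

```python
def round_robin(slices, total_ticks):
    """Run equal-priority, CPU-bound threads round-robin.

    slices maps thread name -> time slice length in ticks.
    Returns the number of ticks each thread actually received.
    """
    used = {name: 0 for name in slices}
    names = list(slices)
    remaining = total_ticks
    turn = 0
    while remaining > 0:
        name = names[turn % len(names)]
        run = min(slices[name], remaining)  # a thread runs for its full slice
        used[name] += run
        remaining -= run
        turn += 1
    return used

# Thread A has twice the time slice of thread B.
usage = round_robin({"A": 20, "B": 10}, total_ticks=3000)
print(usage)  # {'A': 2000, 'B': 1000}
```

The 2:1 ratio only holds under the stated assumptions; a thread that blocks early gives up the rest of its slice.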

Priority-based scheduling is what is used almost exclusively in the
real world (except for domains like automotive where time-triggered
approaches are used).

As Stefan pointed out, there are alternatives (such as RBED) which are
nicer in many ways (and we're looking at implementing such schemes in
OKL4) but at this time, industry by and large expects to use prios.

Gernot

Sergio Ruocco

Re: spatial and temporal partitioning control?

In reply to this post by Jean-Christophe DUBOIS
Jean-Christophe Dubois wrote:
> Last, if we are considering cells with threads of equal priority, OKL4 will
> run these threads in round-robin mode. But all threads will necessarily get
> the same scheduling time. I can't find an easy way to instruct OKL4 that, for
> 2 threads of the same priority, one should get twice as much CPU time as the
> other. Is there a way to allocate CPU resources to the various cells/threads
> in a fine-grained way?

Modulo the OKL4 API, which may or may not allow this approach, a crude,
coarse-grained but simple and effective way to do this is to have a
master (scheduler) thread S running at a high(er) priority H, and a number
of slave (worker) threads A, B..., which implement the various parts of
your application and are ready to run at the same (lower) priority L.

The scheduler thread starts the worker threads, and then yield()s to
them in a controlled (programmed) way. Each worker thread A, B... will
run for the duration of its timeslice (or until it blocks). Then
control will (eventually) return to the scheduler thread, which can
yield() to another thread, or to the same thread again if that one should
get a higher chance to run.

The scheduler thread can easily keep an account of how many times it has
yielded to each thread, as a proxy for how much it (had a chance to) run.
In the long run, yielding to A twice as often as to B should approximate
giving A twice the amount of CPU time given to B (assuming that A, B...
are all CPU-intensive and don't block often).
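
The accounting described above could look roughly like the following Python sketch. It models only the decision S makes about whom to yield to next; the actual yield() call, the worker names, and the weights are assumptions for illustration and nothing here is an OKL4 API. S always picks the worker that is furthest behind its target share, with ties broken by name.

```python
def pick_next(yields, weights):
    """Choose the worker with the fewest yields relative to its weight."""
    return min(yields, key=lambda w: (yields[w] / weights[w], w))

def run_scheduler(weights, rounds):
    """Simulate S yielding one timeslice per round.

    weights maps worker name -> relative CPU share.
    Returns (yield counts per worker, order in which workers ran).
    """
    yields = {w: 0 for w in weights}
    order = []
    for _ in range(rounds):
        worker = pick_next(yields, weights)
        yields[worker] += 1   # stands in for S calling yield() to this worker
        order.append(worker)
    return yields, order

# A should get twice as much CPU time as B.
counts, order = run_scheduler({"A": 2, "B": 1}, rounds=9)
print(counts)           # {'A': 6, 'B': 3}
print("".join(order))   # ABAABAABA
```

Interleaving the yields this way, rather than giving A two slices back to back every time, keeps B's waiting time short while still converging on the 2:1 ratio.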

Note that having every worker thread run at the same priority allows
them to interact via IPC without penalising or favouring one over the
others. If A IPCs to B, B will run on A's timeslice (donated by S), but
eventually it will end, and control will return to S (this also implies
that until B gets to run, it can't finish its job for A, and A
is stuck, no matter how many times S yields to A... but that's priority
inversion, a classic RT scheduling can of worms...).

Moreover, no worker thread gone mad can starve other worker threads,
again because it will be preempted after a timeslice anyway, and the
control returned to the scheduler thread S.

>
> BTW, now that there is no more "privileged" cell (as opposed to OKL4 2.4), how
> could one cell theoretically restart another cell that has crashed/gone
> mad? We can certainly reset the whole system, but is there a finer-grained way
> to do it?

Worker threads gone mad can be stopped by requiring them to notify the
scheduler with a watchdog signal (e.g., an async notify), emitted at
critical points of the code, meaning "Hi S, I am working fine, please
yield to me again!".

If S does not receive a notify after yielding k(*) times to thread X, it
assumes that X went mad, stops yielding to X, and takes appropriate
recovery actions, such as restarting thread X.
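
The k-missed-notifies rule can be sketched as a small counter kept by S. This is Python for illustration only; the class and method names are hypothetical, and how notifies are delivered and how a thread is restarted depend on the OKL4 API, which is not modelled here.

```python
class Watchdog:
    """Counts yields to each worker since that worker's last notify."""

    def __init__(self, workers, k):
        self.k = k
        self.missed = {w: 0 for w in workers}

    def on_yield(self, worker):
        """S has just yielded one timeslice to this worker."""
        self.missed[worker] += 1

    def on_notify(self, worker):
        """The worker signalled that it is working fine."""
        self.missed[worker] = 0

    def is_mad(self, worker):
        """True once the worker got k yields without notifying."""
        return self.missed[worker] >= self.k

    def on_restart(self, worker):
        """After recovery, give the restarted worker a clean record."""
        self.missed[worker] = 0

dog = Watchdog(["X", "Y"], k=3)
for _ in range(3):
    dog.on_yield("X")   # X never notifies
    dog.on_yield("Y")
    dog.on_notify("Y")  # Y notifies after every slice
print(dog.is_mad("X"), dog.is_mad("Y"))  # True False
```

S would check `is_mad` before each yield and skip, then restart, any worker for which it returns True.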

Ciao,

  Sergio


(*) Of course the right value of k is application-dependent, and it is
left as an exercise to the reader :-)


--

Dr. Sergio Ruocco   Research Fellow    http://www.disco.unimib.it/ruocco
mailto:[hidden email] / [hidden email]      NOMADIS Lab
phone: +39-02-6448-7914               Mobile, embedded real-time systems
skype: 'sergioruocco'  Dip. di Informatica, Sistemistica e COmunicazione
Building U14, room 1003  Università degli Studi di Milano-Bicocca, Italy

