운영체제 4 - Scheduling [CPU Virtualization]

티스토리 뷰

코딩 관련/OS

운영체제 4 - Scheduling [CPU Virtualization]

susuhana 2021. 6. 11. 17:59

728x90

지난 시간에 배운 이야기

핵심: CPU를 어떻게 Virtualization을 하느냐?
방법: Time-Sharing을 한다.
조그만한 Time slot을 나눈다 → 각각의 프로세스에 Time slot을 하나씩 배정한다
Process가 Time Slot을 다 쓰고 안 내놓는 가능성이 있었다. ⇒ 초창기 Cooperative Approach 방식
이 방식은 Prcocess가 자신의 load를 다 끝날 때까지 CPU를 손에 잡고 있거나 처리 시간이 너무 오래 걸린다면 알아서 CPU를 내놓을 것이라고 기대했었다.

Non-Cooperative Approach: 하드웨어 타이머를 사용하기로함
하드웨어 타이머가 완료되었을 때 찾아가야하는 트랩핸들러를 부팅 과정에서 기록을 해놓는다. 그리고 타이머가 완료되면은 운영체제한테 제어권을 넘겨준다. 그러면 운영체제는 새로운 프로세스를 선정해서 CPU를 넘겨준다.
이 과정은 Context-Switching을 하는 기본 machism이 된다.

이 장에서는 Process를 어떻게 선정할지에 대해 배우게 된다.

Workload Assumption

Workload ← 일련의 프로세스들이 실행하는 상황

이 Assumption은 우리의 이해를 돕기 위해 설정 된 것이다.

Each job runs for the same amount of time.
All jobs arrive at the same time.
Once started, each job runs to completion.
All jobs only use the CPU(i.e., they perform no I/O
The run-time of each job is Known.

→ 앞으로 하나씩 가정을 완화하면서 실제 Scheduling에 다가간다.

Scheduling metrics

Turnaround time → 반환 시간 → 성능 측면의 평가 기준

Policies

First In, Firsst Out(FIFO)

FIFO scheduling or FCFS(First Come, First Served) scheduling.
It is simple and easy to implement.?

Example: Imagine three jobs(A,B,C) arrives in the system at roughly the same time.
but A arrived faster than B, and B arrived faster than C. Assume each job runs for 10 seconds.
What will the average turn-around time be for these jobs?

T_A =10, T_B = 20, T_C = 30; (A+B+C)/3 = (60)/3 = 20; Average Turnaround = 20;

Example: Let's again assume three jobs but this time A runs for 100 seconds while B and C run for 10 seconds. ← workload의 1번 Assumption을 완화시켰다. FIFO에서 성능이 구리다...!!!!
workload assumption 1 → Each job runs for the same amount of time.

A = 100, B = 110, C = 120, (A+B+C)/3 = 110 << - 😫 Convy effect 현상이 발생!! 어떻게 해결할까?

GeeksforGeeks Convey

Convoy effect: a number of relatively-short potential consumers of a resource get queued behind a heavyweight resource consumer.

Shortest job first(SJT)

FIFO의 문제(Convoy effect)를 해결해보자.
It runs the shortest job first, then the next shortest, and so on.

Q: What is the average turn-arround time for the above case?

B = 10, C = 20, A = 120 , 150/3 =50이다.

Example: Assume Arrives at t = 0 and needs to run for 100 seconds, whereas B and C arrive at t =10 and each needs to run for 10seconds. What is the average turn-around time?

↑ 이제 workload의 Assumption 2번을 완화한다. A가 B와 C보다 먼저 도착하면 성능이? 구리다.
workload assumption 2 → All jobs arrive at the same time.

A = 100, B= 110-10 = 100, C = 120-10 = 110, Avearge Turn-Arround time = 103.33; ← 😫

Shortest-Time-to-Completion First(STCF)

이제는 실행 시간이 긴 A가 먼저 도착했을 때 어떻게 해결할까? SJF의 문제를 해결해보자.

Workload Assumption 3번을 완하해보자.

workload assumption 3 → Once started, each job runs to completion.

Additional machinery: when another job arrives, the scheduler can preempt the currently running job and decide to run another job.
STCF or Preemptive Shortest Job Frist(PSJF): SJT + preemption.

막간 지식

Non-Preemptive Scheduling
한 프로세스가 CPU를 할당받아 실행 중이면 다른 프로세스들이 강제적으로 뺏을 수 없는 스케줄링 방식
예시: FIFO(FCFS), SJF, HRN 등

Preemptive Scheduling
한 프로세스가 CPU를 할당받아 실행중이라도 다른 프로세스가 현재 프로세스를 중지시키고 CPU를 강제적으로 뺏을 수 있는 스케줄링 방식
예시: RR, SRT, Multi Level Queue, Multi Level Feedback Queue 등

선점 기법, 비선점 기법

▲ 좀 더 알아보기: 두 개의 장단점 비교가 잘 되어있는 사이트

Any time a new job enters the system, the STCF scheduler determines which of the remaining jobs (including the new job) has the least time left, and schedules that one.

A = 120-0 B = 20-10 C = 30-10 150→ 50 평균 값

A new metric: response time

Response time 정의: the time form when a job arrives in a system to the first time it is scheduled.

작업의 길이를 미리 알고 있고, 작업이 오직 CPU만 사용하며 평가 기준이 Turnaround Time하나라면 STCF는 매우 훌륭한 정책이다.
Time-sharing computer의 등장하고 사용자는 terminal에서 작업하여 상호작용이 원활하기 위한 성능이 요구된다. 그래서 Respon Time이 등장한다.

American Robin

American Robin 울음 소리 듣기
https://upload.wikimedia.org/wikipedia/commons/transcoded/d/d7/Robin_sings.ogv/Robin_sings.ogv.240p.vp9.webm

Round Robin

기본 발상: Instead of running jobs to completion, RR runs a job for a time slice(or scheduling quantum) and then switches to the next job in the run queue.
RR is sometimes called Time-Slicing.

위의 예의 RR의 A,B,C가 시스템에 동시에 도착하고 각각 5초간 실행된다고 가정하자. RR의 average response time은 (0+1+2)/3 = 1이고, SJF average respone time은 (0+5+10)/3 = 5이다.

Time Slice는 짧아질 수록 Response time 기준으로 RR이 성능은 더 좋아진다. time slice가 짧아질 수록 The cost of switching이 전체 성능에 큰 영향을 미치게된다.

The cost of switching: OS actions of saving and restoring a few registers, state information in CPU caches, TLBs (translation lookaside buffer), branch predictors, and other on-chip hardware.
Switching to another job causes the current state to be flushed and new state relevant to the currently-running job to be brought in.
What is the turn-around time of RR?

앞의 예를 다시 보자. A, B, C 각 작업은 5초간 실행되고 동시에 도착하며, RR은 1초의 time slice를 가진다. A는 13, B는 14, C는 15에 종료하고 평균 14의 turn-around time을 가진다. 매우 안좋다.
RR은 각 작업을 잠깐 실행하고 다음 작업으로 넘어가고 하면서 가능한 한 각 작업을 늘리는 것이 목표다. turn-around time은 완료 시간만 고려하기 때문에 RR은 거의 최악이다. 단순한 FIFO보다 성능이 좋지 않게 된다.
Turnaround time vs response time. RR은 공정하다.
SJF, STCF은 Turn-around time에 최적화하지만 respone time은 나쁘다.
RR은 respone time을 최적화하지만 turn-around time은 나쁘다.
아직 workload assumption 4와 5가 남았다.

4 . All jobs only use the CPU(i.e., they perform no I/O )

5. The run-time of each job is Known.

Incorporatin I/O

workload assumption 4 → All jobs only use the CPU를 완화해보자.

A scheduler clearly has a decision to make when a job initiates an I/O request because the currently-running job won't be using the CPU during the I/O; it is waiting for I/O completion.
The scheduler also has to make a decision when the I/O completes. when that occurs, an interrupt is raised, and the OS runs and moves the process that issued the I/O from blocked back to the ready state. It could decide to run the job at that point.

Example: A와 B가 있다고 하자. 각 작업은 50msec의 CPU 시간을 필요로 한다. A는 10mesc동안 실행한 후 I/O 요청을 한다.(I/O 시간은 10msec라고 가정한다.) B는 I/O 작업이 없다.
Scheduler는 A를 먼저 실행시키고, B를 다음에 실행시킨다.

STCF Scheduler를 구축하려고 가정해보고 A가 5개의 10msec 작업으로 분할. B는 하나의 50msec의 CPU를 요청한다는 사실을 어떻게 활용할까?

일반적인 접근 방식은 A는 각 10msec 하위 작업을 독립적인 작업으로 다루는 것이다. 시스템이 시작할 때 10msec 작업들과 50msec를 스케줄하는 것이다. STCF의 경우 가장 짧은 작업을 선택한다. A다. A를 실행하고 B를 선점하여 10msec동안 실행한다. 입출력이 끝나기를 기다리는 동안 CPU는 다른 프로세스에 의해 사용되어 연산의 중첩이 가능해진다.

Overlap(중첩)은 높은 이용률을 가능하게 한다. 가능하면 시스템을 활용도를 극대화하기 위해 연산을 Overlap되게 시랭한다.

The fundamental problem

First, it would like to optimize turn-around time, which is done by running shorter jobs first(SJF); but, the OS doesn't generally know how long a job will run for, exactly the knowledge that algorithms like SJT(or STCF) require.
Second, MLFQ would like to make a system feel responsive to interactive users(i.e., users sitting and staring at the screen, waiting for a process to finish), and thus minimize response time; unfortunately, algorithms like Round Robin reduce response time but are terrible for turn-around time.

Question: How TO SCHEDULE WITHOUT RERFECT KNOWLEDGE?

How can we design a scheduler that both minimizes response time for interactive jobs while also minimizing turn-around time without a priority knowledge of job length?

MLFQ: Basic Rules

Assumption: The MLFQ has a number of distinct queues, each assigned a different priority level.
At any given time, a job that is ready to run is on a single queue.
MLFQ uses priorities to decide which job should run at a given time.

Basic rules for MLFQ:
Rule 1: if Priority(A) > Priority(B), A runs (B donsen't).
Rule 2: if Priority(A) = Priority(B), A&B run in RR.
The key to MLFQ scheduling: How the scheduler sets priorities?
MLFQ varies the priority of a job based on its observed behaviour.

Example

If a job repeatedly relinquishes the CPU while waiting for input from the keyboard, MLFQ its priority high.
if a job uses the CPU intensively for long periods of time, MLFQ will reduce its priority.
MLFQ will try to learn about processes as they run, and thus use the history of the job to predict its future behaviour.

Attempt #1: How to Change Priority

First attempt

Rule 3: When a job enters the system, it is placed at the highest priority(the topmost queue).
Rule 4a: If a job uses up an entire time slice while running, its priority is reduced(i.e., moves down one queue).
Rule 4b: If a job gives up the CPU before the time slice is up, it stays at the same priority level.

Ex 1) Long-running Job Over time

작업은 최고 우선 순위로 진입한다.→ Q2, 10msec timeslice가 하나 지나면 Schduler는 작업의 우선 순위를 한 단계 낮추어 해당 작업을 Q1으로 이동시킨다. 다시 하나의 Time Slice 동안 Q1에서 실행한 후 작업은 마침내 가장 낮은 우선순위를 가지게 되고 Q0으로 이동된다. 이후엔 Q0에 계속 머무른다.

Ex 2) Along Came A Short Job

좀 더 복잡한 예를 살펴보낟. MLFQ가 어떻게 SJF에 근접할 수 있는지 이해해보자.
이 예에서는 2개의 작업이 존재한다. A는 오래 실행되는 CPU 위주 작업, B는 짧은 대화형 작업.
A는 얼마동안 이미 실행온 상태이고, B는 이제막 도착했다고 가정한다.
→ CPU 위주 작업들처럼 A(검은색)는 가장 낮은 우선순위 큐에서 실행되고 있다. B(회색)는 T=100에 시스템에 도착하고 가장 높은 우선순위 큐에 놓여진다. 실행 시간이 짧기 때문에 (단지 2ms), 두 번의 time slice를 소모하면 B는 바닥의 큐에 도착하기 전에 종료한다. A는 낮은 우선 순위에서 실행을 재개한다.

이 예에서 이 알고리즘의 주 목표를 알 수 있다. Scheduler는 작업이 짧은지 긴지 모른다. 일단 짧은 작업이라고 가정하여 높은 우선순위를 부여한다. 진짜 짧으면 빨리 실행하고 바로 종료된다. 길다면 천천히 아래 큐로 이동하게 되고 스스로 긴 배치형 작업이라는 것을 증명한다. 이러한 방식을 MLFQ는 SJF를 근사할 수 있다.

Ex 3) What About I/O?

다음으로 I/O 작업을 수행하는 예를 살펴보자. Rule 4b가 말하는 것처럼, Process가 Time Slice를 소진하기 전에 Processeor를 양도하면 같은 Priority를 유지하게 한다. ← 이 규칙 의도는 간단하다.
대화형 작업이 키보드나 마우스로부터 사용자 입력을 대기하며 자주 입출력을 수행하면 Time Slice가 종료되기 전에 CPU를 양도하게 될 것이다. 그런 경우 동일한 Priority를 유지하게 되는 것이다.

위 예는 이 규칙이 동작하는 예를 보이고 있다. B(회색)는 대화형 작업으로서 I/O를 수행하기 전에 1msec동안만 실행된다. A(검정색)는 긴 배치형 작업으로 B와 CPU를 사용하기 위하여 경쟁한다. B는 CPU를 계속해서 양도하기 때문(대화형 작업이라 계속 CPU를 양도한다.)에 MLFQ방식은 B를 가장 높은 우선순위로 유지한다.

Problems With Our Current MLFQ

현재의 MLFQ는 단순하다. CPU를 긴 작업들과 짧은 작업들 사이에 공유하고, I/O 중심 대화형 작업을 빨리 실행시키기 때문에 잘 동작하는 것처럼 보이지만 심각한 결점을 가진다.

1. Starvation

시스템에 너무 많은 대화형 작업이 존재하면 그들이 모든 CPU 시간을 소모하게 될 것이고 따라서 긴 실행 시간 작업은 CPU 시간을 할당받지 못할 것이다.← Starvation

2. Smart (or wicked) user

Smart user는 Scheduler를 자신에게 유리하게 동작하도록 다시 작성할 수 있다. 일반적으로 Schduler를 속여서 지정된 몫보다 더 많은 시간을 할당하도록 만드는 것을 말한다. 현재까지의 MLFQ는 다음 공격에 취약하다. Time Slice가 끝나기 전에 아무 파일을 대상으로 I/O 요청을 내려 CPU를 양도한다. 제대로 된다면 Time Slice 99%를 실행하고 CPU를 양도하게 되면 CPU를 거의 독점할 수 있다.

3. Transition of behaviors

프로그램은 시간이 지남에 따라 동작이 변할 수 있다. CPU 위주 작업이 대화형 작업으로 바뀔 수 있다. 현재 구현 방식으로는 그런 작업은 다른 대화형 작업들과 같은 대우를 받을 수 없다.

Attempt #2: The Priority Boost

규칙을 보완하여 Starvation 문제를 방지할 수있는지 살펴보자. 간단한 아이디어는 주기적으로 모든 작업의 Priority를 boost(상향 조정)하는 것이다. 일단 간단한 방법: 모두 최상위 큐로 보내는 것이다.

Rule 5: After some time period S, move all the jobs in the system to the topmost queue.

새 규칙은 두가지 문제를 한번에 해결한다.

Processes are guaranteed not to starve: by sitting in the top queue, a job will share the CPU with other high-priority jobs in a RR fashion.
If a CPU-bound job has become interactive, the scheduler treats it properly once it has received the priority boost.

Attempt #3: Better Accounting

해결해야 할 문제가 하나 더 있다. Scheduler를 자신에게 유리하게 동작시키는 것을 어떻게 막을 수 있을까? 이러한 일을 가능하게 만든 주범은 Rule 4a, Rule 4b다.

Rule 4a: If a job uses up an entire time slice while running, its priority is reduced(i.e., moves down one queue).
Rule 4b: If a job gives up the CPU before the time slice is up, it stays at the same priority level.

두 규칙은 작업이 Time Slice가 끝나기 전에 CPU를 양보하여 Priority를 유지가 가능하게 한다.
여기서 해결책은 MLFQ의 각 단계에서 CPU 총 사용 시간을 측정하는 것.
Scheduler는 현재 단계에서 프로세스가 소진한 CPU 사용 시간을 저장한다. 프로세스가 Time Slice에 해당하는 시간을 모두 소진하면 다음 Priority Queue로 강등된다. Time Slice를 한번에 소진하든 짧게 여러번 소진하든 상관 없다. 규칙 4a와 4b를 합쳐 4로 재정의한다.

Rule 4: One job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced.

EX) Workload가 Scheduler를 자신에게 동작시키려고 할 때 예전 규칙 4a, 4b일 때 행동(왼쪽), 새로운 규칙 4(오른쪽)을 아래 그림을 통해 아 수 있다. 이런 자신에게 유리하도록 조작하는데 대한 방지책이 없다면 프로세스는 time slice가 끝나기 직전에 입출력 명령어를 내릴 수 있어 CPU 시간을 독점할 수 있다. 방지책이 마련되면 프로세스의 입출력 행동과 무관하게 아래 단계 큐로 천천히 이동하게 되어 CPU를 자기 몫 이상으로 사용할 수 없게 된다.

MLFQ Sheduling에는 여러 다른 쟁점이 남아있다.

필요한 변수들을 Scheduler가 어떻게 설정해야 하는가.

몇 개의 Queue가 존재하는가?
Queue당 Time slice의 크기는 얼마로 해야하는가?
Starvation을 피하고 행동을 반영하기 위하여 얼마나 자주 Priority가 Boost해야하는가?

이런 쟁점의 질문은 workload에 대해 충분히 경험하고 계속 조정해나가면서 균형점을 찾아야한다.

MLFQ 요약

이 알고리즘은 Multi level queue를 가지고 있으며, 지정된 작업의 priority를 정하기 위하여 feedback를 사용한다. 과거에 보여준 행동이 priority의 지침이 된다. 작업이 시간이 지남에 따라 어떻게 행동하고 그에 맞게 어떻게 처리하는지에 대해 주의를 기울여야한다.

지금까지 배운 MLFQ의 Rule이다.

Rule 1: if Priority(A) > Priority(B), A runs (B donsen't).
Rule 2: if Priority(A) = Priority(B), A&B run in RR.
Rule 3: When a job enters the system, it is placed at the highest priority(the topmost queue).
Rule 4: One job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced.
Rule 5: After some time period S, move all the jobs in the system to the topmost queue.

MLFQ는 작업의 특성에 따라 정보 없이, 작업의 실행을 관찰한 후 그에 따라 Priority를 지정한다.
MLFQ는 Turn-around time, respone time을 모두 최적화한다.
짧게 실행되는 대화형 작업에 대해서는 우수한 전반적인 성능을 제공한다.→ SJF, STCF와 유사
오래 실행되는 CPU-집중 workload에 대해서는 공정하게 실행하고 조금이라도 진행되도록 한다.

영문 요약 내용

MLFQ is interesting for the following reason: instead of demandinga prioriknowledge of the nature of a job, it observes the execution of ajob and prioritizes it accordingly. In this way, it manages to achieve thebest of both worlds: it can deliver excellent overall performance(similarto SJF/STCF) for short-running interactive jobs, and is fair and makesprogress for long-running CPU-intensive workloads. For this reason,many systems, including BSD UNIXderivatives [LM+89, B86], Solaris[M06], and Windows NT and subsequent Windows operating systems[CS97] use a form of MLFQ as their base scheduler.

🐌아이콘 제작자: Smashicons

Smashicons 디자인의 무료 벡터 아이콘

'코딩 관련 > OS' 카테고리의 다른 글

운영체제 7 - Interlude(Memory API) [Memory Virtualization] (0)	2021.06.11
운영체제 5 - Proportional Share [CPU Virtualization] (0)	2021.06.11
운영체제 3 - Limited Direct Execution [CPU Virtualization] (0)	2021.06.11
운영체제 2 - System Call [CPU Virtualization] (0)	2021.06.11
운영체제 1 - Process [CPU Virtualization] (0)	2021.06.11

공지사항

안녕하세요 처음 뵙겠습니다.

최근에 올라온 글

최근에 달린 댓글

Total

Today

Yesterday

링크

SimYoHyun

TAG more

« 2025/01 »
일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

글 보관함

언제쯔음

티스토리 뷰