UP | HOME

System Programming

Credits to: CS341 System Programming at UIUC

To pull starter code for assignments:

git pull release main --allow-unrelated-histories --no-rebase

May have to power on the server at https://vc.cs.illinois.edu/.

1. C Language

1.1. Function Parameter List

(void): Called with zero argument only.

(): Called with any number of arguments.

main(): A special function. Standards:

  • int main(void)
  • int main()
  • int main(int argc, char * argv [])

1.2. Preprocessor

Text substitution the compiler performs before actually compiling the program.

#define A B macro: Substitute A with B. Can substitute function together with arguments.

1.5. TODO File Descriptor-Level (System Call) Input/Output

Type man in the shell to access the man page. They are organized in sections. Section 2 are system calls, section 3 are C libraries, and section 7 are longer articles.

write()

unistd.h

3 arguments:

  • file descriptor (int):
    • standard output: 1 or STDOUT_FILENO. If redirecting the output to some file (e.g., ./a.out > output.txt), then the standard output will be in that file.
    • standard error: 2 or STDERR_FILENO. Will be on screen but not the file.
    • See the return value of open()
  • pointer to a character (void *, without specific type)
    • e.g., "Hello\n"
  • the number of characters to write (int)
    • e.g., 6 for entire "Hello\n"
int open(const char *pathname, int flags, mode_t mode);

<fcntl.h>

  • pathname: pointer of path name.
  • flags: bitwise OR of one or more flags
    • O_CREAT: create if the file does not exist
    • O_TRUNC: truncate to length 0 (overwrite). Requires that the access mode allows writing.
    • O_RDWR: request opening the file with read/write access.
  • mode (essentially an integer, <sys/types.h> and <sys/stat.h>): bitwise OR of one or more modes
    • S_IRUSR: user has read permission.
    • S_IWUSR: user has write permission.

Return value: a file descriptor (int). The lowest unused file descriptor. -1 if an error occurred.

close()

1 argument:

  • file descriptor (int).

E.g.,

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>


int main()
{
    mode_t mode = S_IRUSR | S_IWUSR;
    int fd = open("output.txt", O_CREAT | O_TRUNC | O_RDWR, mode);
    write(fd, "Hello\n", 6);
    close(fd);
    return 0;
}

E.g., if close(1) at the very beginning:

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main() {
    close(STDOUT_FILENO);
    mode_t mode = S_IRUSR | S_IWUSR;
    int fd = open("hello_world.txt", O_CREAT | O_TRUNC | O_RDWR, mode);
    printf("Hello\n");
    close(fd);
    return 0;
}

then open() will return 1 as the file descriptor. In this case, those outputs used to go to the standard output (e.g., printf) and those delibrately written to the file will all go to the file.

read()

ssize_t read(int fildes, void *buf, size_t nbyte);: <unistd.h>

  • read nbytes bytes from file descriptor fildes to buffer pointed to by buf.
  • return value: s(igned) size type.
    • if successful: the numbe rof byptes actually read
    • end-of-file: 0
    • error: -1

P.S.: May want to manually put a \0 at the end. E.g.,

char buffer[4096];
ssize_t len = read(0, buffer, sizeof(buffer) - 1);
buffer[len] = '\0';

1.6. C Library’s Input / Output

1.6.1. File stream, FILE

A file stream (or file descriptor in system calls) is the base interface to everything external to RAM, including:

  • files
  • pipes
  • sockets
  • standard streams (C library): stdin, stdout, stderr. They are of FILE* type, and a new FILE* could be created using fopen().

See ferror(), feof(), fgetc(), etc.

1.6.2. printf() family

<stdio.h>

Even if put before system-call IO codes, its output may be still after their outputs

C library has a buffer (for performance), printf() is only going to call write() when

  • that buffer is full
  • finish a line
  • close a file stream

It writes to the standard output. To write to some specific file descriptor, one can use the trick in the second exampe in close().

fprintf(): To print to other file stream

  • stderr is not buffered

asprintf(char **ret, const char *format, ...):

  • Takes in an address of a char pointer, allocate enough memory, and set the pointer (*ret) to be a pointer to that memory.

    char * ptr = NULL;
    asprintf(&ptr, "%d", 123);
    printf("%s\n", ptr);
    free(ptr);
    

    Free the pointer later on.


1.6.3. puts()

puts()
<stdio.> Taskes a char pointer and print it.

1.6.4. TODO scanf() family

int scanf(const char * format, ...);: <stdio.h>

  • format:

    d i u o x f c s p
    int hex … float char string (one word) pointer
  • return value: the number of input items assigned.

sscanf():

2+ arguments:

  • input string
  • format string
  • pointers to memory where the extracted output will be stored. E.g., to store in integer variable x, & x should be put there.

Return value: number of arguments it correctly parsed.

fscanf(): The same except that the first argument is a file descriptor. File descriptor stdin represents the standard input.

scanf(): Read from standard input. No file descriptor as its input.

1.6.5. gets()

gets(): Takes a char pointer and read the content into it. When the read content is longer than the char array, it will overflow to other variables.

E.g.,

char buffer[12];
gets(buffer);

1.6.6. getline()

ssize_t getline(char **lineptr, size_t *n, FILE * stream);:

  • stream: stdin or from fopen().
  • Prepare a C-string (as a char pointer) and a capacity variable. Pass these two by reference (as their addresses) in order to let getline() modify them.
  • getline
    1. reads from the file stream stream (newline at the end included),
    2. allocate memory (in heap) for the content,
    3. modify the C-string pointed to by lineptr to the content,

      lineptr must be initialized to NULL before the first call.

    4. and update the capacity pointed to by n to capacity of the allocated memory.

      n must be initialized to 0 before the first call and retain the correct value (modified by getline()) in order to let getline properly allocate memory.

  • Free the buffer to avoid memory leak.

    P.S., when calling getline() to read longer text, it may allocate a new starting address while freeing the old address. Only need to handle freeing the last address returned.

Return value: number of characters read, excluding \0, in type ssize_t (signed size). -1 if it fails to read a line or encounters end-of-file.

E.g.,

#include <stdio.h>
#include <stdlib.h>

int main() {
    char * buffer = NULL;
    size_t capacity = 0;

    ssize_t num_char = getline(& buffer, & capacity, stdin);
    printf("%d \"%s\" %d\n", num_char, buffer, capacity);
}
abcd abcd abcd
15 "abcd abcd abcd
" 16
abcd abcd abcd abcd
20 "abcd abcd abcd abcd
" 32

To strip off the newline at the end (if any)

if (num_char > 0 && buffer[num_char - 1] == '\n') buffer[num_char - 1] = '\0';

1.7. Detecting error

assert(): <assert.h>. Assert an expression that is supposed to be true. If the expression is actually false, the process will be aborted.

1.8. Sizes, Pointers

1.8.1. Integer

INT_MIN and INT_MAX: <limits.h>, the range of int on a particular machine.

The size of int is dependent on architecture and compiler. However, C promises that an int will be at least 16 bits.

1.8.2. Character

A char is always one byte. However, for C, a byte does not mean 8 bits, but at least 8 bits. (On modern machines, 1 byte typically = 8 bits)

CHAR_BITS: <limits.h>, the number of bits for a char on a particular machine.

1.8.3. sizeof()

1 argument: a type / variable

Return value: the size of the type/variable in bytes.

Evaluated at compile-time (as opposed to link time and run time).

  • Not link time: E.g., sizeof() a struct must be after the full declaration of the struct. Claiming the struct, followed by sizeof() and then its full declaration will not compile.
  • E.g.,

    int a = 0;
    size_t a_size = sizeof(a ++);
    printf("a: %d\n", a);
    

1.8.4. NULL

NULL pointer has a address of 0.

Dereferencing a NULL pointer will cause segmentation fault. To avoid that, check if the pointer is NULL.

1.8.5. int Arrays/Pointers

When declaring an array, the variable is a pointer to the address of the first element of that array.

To declare a pointer to the whole array, do int (*ptr)[<same_length>].

E.g.,

#include <stdio.h>

int main() {
    int a[8];
    printf("a is at %p, a+1 is at %p, a+8 is at %p\n", a, a+1, a+8);
    int *p1 = a;
    printf("p1 is at %p, p1+1 is at %p, p1+8 is at %p\n", p1, p1+1, p1+8);
    int (*p2)[8] = &a;
    printf("p2 is at %p, p2+1 is at %p\n", p2, p2+1);
    return 0;
}
a is at 0x16f5d3308, a+1 is at 0x16f5d330c, a+8 is at 0x16f5d3328
p1 is at 0x16f5d3308, p1+1 is at 0x16f5d330c, p1+8 is at 0x16f5d3328
p2 is at 0x16f5d3308, p2+1 is at 0x16f5d3328

To read/write the address of a pointer, add an asterisk (*(a)). For int array, we can alternatively use square bracket (a[0]). Additionally, we can do 0[a].

E.g.,

*(a) = 100;
*(a+1) = 101;
printf("a[0] is %d, a[1] is %d\n", a[0], a[1]);
a[0] is 100, a[1] is 101

1.8.6. char Arrays/Pointers

C string has a special character \0 at the end.

sizeof() a string returns the length of the string +1, because each character is one byte. It also counts \0 inside the string.

E.g.,

printf("sizeof(\"Hello\\0World\") = %d\n", sizeof("Hello\0World"));
sizeof("Hello\0World") = 12

A character pointer char *ptr = <string> is a character array, and ptr will point to the first character. If deferencing the last character (\0), we will get a 0. sizeof(ptr) is the size of the pointer itself.

E.g.,

char *ptr = "hello";
printf("ptr[4] = '%c', ptr[5] = '%c' or %d\n", ptr[4], ptr[5], ptr[5]);
ptr[4] = 'o', ptr[5] = '' or 0

Alternatively, we can declare it in the syntax for array, char arr[] = <string>. In this case, sizeof(arr) is the size of the whole array (including the \0 at the end).

E.g.,

char arr[] = "hello";
printf("%d\n", sizeof(arr));
6

strlen(): <string.h>. Pass a character array to get its length (until but does not include the first \0).

E.g.,

char *ptr = "Hello\0World";
printf("strlen(ptr) = %d, strlen(\"Hello\\0World\") = %d\n", strlen(ptr), strlen("Hello\0World"));
strlen(ptr) = 5, strlen("Hello\0World") = 5

If a character pointer (char * ptr) is initialized with a constant string, then modifying individual characters of it will cause segmentation fault. This is because the memory will be read-only.

E.g.,

char *ptr = "hello";
ptr[0] = 'H';

will give an error.

However, if a character array (char arr []) is initialized with a string, then the memory is in the stack and we can modify the array.

E.g.,

char arr[] = "hello";
*arr = 'H';
printf("%s\n", arr);
Hello

strcpy()

<string.h>

Two arguments:

  • destination: The character pointer that points to the memory to which the string will be copied.
  • string.

E.g.,

char ptr[] = "hello";
printf("%s\n", ptr);
strcpy(ptr, "world");
printf("%s\n", ptr);
hello
world

strdup(): Duplicate the string pointed to by the input pointer and return a pointer to the duplicate. Free the pointer to avoid memory leak.


Pointer to pointer to char / 2D char array: Use char **ptr or char * ptr [].

E.g.,

char * p1 [] = {"ab", "xyz"};
char ** p2 = p1;
printf("p1[0]=%s, p1[1]=%s\n", p1[0], p1[1]);
printf("p2[0]=%s, p2[1]=%s\n", p2[0], p2[1]);
p1[0]=ab, p1[1]=xyz
p2[0]=ab, p2[1]=xyz

1.9. Function Pointer

A pointer that stores the address of a function’s code.

With this, we can pass a function as an argument

This allows us to reuse code

Function pointer
Pointers to functions. Allow you to pass functions as argumetns to other functions.
  • Declaration:

    returnType (*pointerName) (paramTypes, ...);
    
  • Initilization: returnType (*pointerName) (paramTypes, ...) = &funcName or = funcname.
  • Invocation: dereference the pointer (*pointerName)(arguments) or simply pointerName(arguments)
  • Passing it as arguments:

    anotherType anotherFunc(returnType (*pointerName)(paramTypes, ...));
    
  • Typedef:

    typedef returnType (*Type)(paramTypes, ...);
    Type pointerName;
    

    E.g.,

    #include <stdio.h>
    
    typedef int (*funcType)(int, int);
    
    int add(int a, int b) {
        return a + b;
    }
    
    int subtract(int a, int b) {
        return a - b;
    }
    
    int main() {
        funcType operation1 = add;
        funcType operation2 = subtract;
        printf("%d\n", operation1(1, 2));
        printf("%d\n", operation2(1, 2));
        return 0;
    }
    

1.10. Structs

1.10.1. TODO typedef

typedef: Make an alias of a type.

E.g.,

typedef long long ll;
ll n;

Function type: todo

1.10.2. struct

struct: struct <name> to define a structure called <name>. struct <name> var~ to declare a variable of that structure. <var>.<attr> to access an attribute. If ptr is a pointer to the variable, then we can use ptr-><attr> to access an attribute.

1.11. time, ctime

time()

<time.h>

  • pointer to where the returned time will be stored, or a null pointer.

Return value: number of seconds since 1970, in type time_t. Use & to get its address.

ctime()

Converts the given time value in time_t to a human readable format. Note that it stores the result in a static storage and will overwrite the storage if called again later on.

  • pointer to the time_t time.

Return value: A character pointer to the output string.

1.12. TODO Memory Use

char *t1 = "abcd";: The “abcd” is in the text segment, which is read-only, and t1 points to the address of it.

char t2[] = "abcd";: The “abcd” is first stored in the text segment. Then, C creates enough space for the array on the stack and copies the “abcd” into the array.

*t2 = 'A';: modify an element of the array t2 on the stack.

Stack value is unclean.

Global memory: defiend globally

1.13. main(), arguments, return value

main():

  • int argc: the number of arguments
  • chat * argv[]: an array of character pointers. The first ([0]) character pointer points to the name of the command (e.g., ./a.out). Words are separated by whitespaces. To join several words, use double quotes to wrap them.

E.g.,

#include <stdio.h>

int main(int argc, char* argv[]) {
    printf("argc is %d\n", argc);
    for (int i = 0; i < argc; ++ i) {
        printf("argv[%d]=%s\n", i, argv[i]);
    }
    return 0;
}
% ./a.out abc "xyz 123"
argc is 3
argv[0]=./a.out
argv[1]=abc
argv[2]=xyz 123

Alternatively, we can iterate through argv and count until a null pointer is encountered (there is a nulll pointer at the end).

Return value of main():

  • Standard: 0
  • echo $? to print the execute value of the last command (e.g., ./a.out)

atoi(): <stdlib.h>. Converts a string to a integer. If the string does not represent an integer or the integer is out of the range of int, it will return 0.

1.14. Environment Variables, environ, getenv()

extern char **environ;

(global) To print all the environment variables, iterate through environ (stop criterion is doable because the last element is null)

extern char ** environ;
int main() {
    char ** ptr = environ;
    while (*ptr) {
        printf("%s\n", *ptr);
        ptr ++;
    }
    return 0;
}

E.g., HOME and USER can be found in the output.

In shell
  • Can add a new variable by export <NAME>=<value>.
  • Print a variable: echo $<NAME> or echo ${<NAME>}.
getenv()
<stdlib.h>
  • Pass the variable name to get the value. If the variable does not exist, will return a null pointer.
  • also see setenv() and putenv().

E.g.,

char * home = getenv("HOME");

1.15. Valgrind

To detect memory leak:

valgrind --leak-check=full --show-leak-kinds=all myprogram arg1 arg2

2. Processes

A process is the base computation container on Linux.

2.1. Forking

pid_t fork(void)

<unistd.h>.

  • Clone the current process to create a child process (Very expensive).
  • Return value: type pid_t. returns a value of 0 to the child process and returns the process ID of the child process to the parent process.

Not shared data (Duplicated, independent):

  • Heap memory: Duplicated for each process.
  • Non-read-only data segment: duplicated for each process.
  • File descriptor tables (mapping from numbers to file): Duplicated for each process, meaning that closing a file descriptor does not affect the file descriptor referring to the same object in the other process.
  • Signal handlers: Signal handlers installed by the parent process are inherited by the child process.

Shared data between parent and child process:

  • Text segment: Executable Code.
  • Read-only data segment: Read-only data, such as constants and static variables, is shared between the parent and child processes
  • Open file descriptors: Meaning that these descriptors reference the same underlying objects. (Otherwise how can they be not shared? XD)
  • Output buffer:
    • Issues: If not flushed manually, these problems may arise: (1) The output from the parent process is mixed with that of the child process. (2) If the child process exec another program and then exit, the buffer will be deallocated, causing the output from the parent to be lost.
    • Resolution:

      fflush(some_file); // e.g., stdout
      pid_t pid = fork();
      
getpid()
<unistd.h> the process ID of the calling process.
getppid()
the parent’s pid.

What happends to child processes if their parents die first: They will be re-parented to pid 1.

2.1.1. ulimit

ulimit
To prevent fork bomb, we can set a limit for how many times you can fork via ulimit.
  • It is terminal session specific. To configure the setting to be applied everytime we start a Bash shell session, add

    ulimit -n 400
    

    into ./bashrc.

  • The recommended threshold is 400.
  • To apply the changes to your current shell session, you can either start a new shell session or run source ~/.bashrc.
  • ulimit -a and check “open files” for the setting.

2.2. Waiting

pid_t waitpid(pid_t pid, int *status, int options);
<sys/wait.h> Wait for process pid with option (of action) options and store the status in status.
  • pid: Which process the caller wants to wait for. If pid is -1, waitpid waits for any child process (see wait()).
  • status: A pointer to an integer where the exit status or termination information of the child process will be stored.
  • options: Controlls the behavior of waitpid.
    • 0 (default), the caller will wait (block) until the child (1) terminates or (2) is stopped by a signal (3) is resumed by a signal.
    • WNOHANG, the caller will not block.
      • If pid has not been stopped/exited chlildren (still running), return 0.
      • If pid has changed state (terminated, stopped, etc.), waitpid will return its process ID immediately.
    • The exit status is stored in *status.
  • Return value (in general):
    • If a child process is stopped/terminated, returns its pid
    • If on error, return -1.
  • Exit status:
    • WEXITSTATUS(status) to get the exit status
pid_t wait(int *wstatus);

Wait one of the children. Equivalent to waitpid(-1, &wstatus, 0);.

E.g., to wait for all children:

while ((wpid = wait(&wstatus)) > 0);
Zombie Process
A zombie process is a process that has terminated but still has an entry in the process table because its parent process has not yet collected its termination status via a wait system call. Zombie processes consume system resources (such as a process table slot and some memory for the process control block), but they do not consume CPU time.
Orphan Process
An orphan process is a process whose parent process has terminated or ended before it has finished. The orphaned child process will become a child of the init process, which has process ID 1.

2.3. exec()

exec series
Replace any lines of code after exec() with the program executed by exec(). The exec process will replace the existing place.
execl(const char *path, const char *arg0, ..., /*, (char *)0, */))
“l” stands for “list”.
  • arg0, arg1, …: The zeroth argument points to the process name, which is by convention the file name, but it can be actually an arbitrary name.
  • This is a list of points to arguments. After that, we must pass a NULL pointer (char *)0.
execlp(const char *file, const char *arg0, ..., /*, (char *)0, */);
“p” stands for “PATH”. Use the “PATH” environment variable as the search path for file.
execv(const char *path, char *const argv[]);
“v” stands for “vector” (array). Arguments are passed as an array of pointers.
  • Return value: It returns only if an error has occurred and the return value is -1.

2.4. fork-exec-wait

int main() {
    pid_t pid = fork();
    if (pid < 0) { // fork failure
        perror("fork");
        exit(1);
    }
    else if (pid > 0) { // in parent
        int status;
        int waitResult = waitpid(pid, &status, 0);
        // If on error waiting
        if (waitResult == -1) {
            perror("waitpid");
            exit(1);
        }
        // if existed but the exist status is non-zero
        if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
          exit(1);
        }
    }
    else { // in child
        /* obtain args */
        if (execvp(args[0], args) == -1) {
            perror("execvp");
            exit(1);
        }
    }
}

3. Signals

POSIX signals are software interrupts.

Two sources of signals:

  • Kernel
  • User processes

Signal names: of type int

  • SIGINT: Ctrl C. Interrupt program.
  • SIGKILL: kill program.
  • SIGSEGV: segmentation violation.
  • SIGALRM: terminate program. commonly used in scenarios where you need to enforce timeouts. See alarm() below.
  • SIGSTOP and SIGCONT: suspend the execution of a process without terminating it. Once a process is stopped, it retains its state and resource allocation. The process remains stopped until it receives a SIGCONT signal.
unsigned alarm(unsigned seconds);
<unistd.h>. After seconds seconds, send a SIGALRM signal to the current process.

3.1. Terminology

Generated
Signals are generated by the operating system or by specific actions within a program. For example, when a user presses Ctrl+C to interrupt a program, the operating system generates a SIGINT signal and sends it to the process running in the terminal.
Pending
If a signal is generated for a process but that process is currently handling another signal, the new signal is said to be pending until the process finishes handling the current signal.
Blocked
Processes can block signals, meaning they won’t be delivered or handled until the process unblocks them. This is useful for critical sections of code where you don’t want signals to interrupt execution.
Delivered
When a signal is sent to a process and it’s not blocked, the signal is delivered to the process and can be handled by a signal handler.
Caught
When a signal is delivered to a process and the process has registered a signal handler for that signal, the signal is considered caught. The signal handler is a function that gets executed in response to the signal.
Disposition
The disposition of a signal refers to what action the operating system takes when a particular signal is caught by a process. The disposition can be set to default (the default action for the signal), ignore (the signal is ignored), or a specific signal handler function.

Signals in child processes/threads:

  • Signals can be inherited by child processes in Unix-like systems. When a parent process creates a child process, the child inherits the signal disposition of the parent.
  • Threads within a process share the same signal handlers, meaning if one thread catches a signal, it affects the entire process.

3.2. Sending Signals:

Signals can be generated by users, system events, or other processes, enabling the initiation of actions or responses. Examples include user-triggered signals (e.g., pressing CTRL-C) and system-generated signals in response to events like memory access violations. Preferably, signals should not be relied upon as the primary driver of program logic due to inherent drawbacks and limitations.

kill (man -S1 kill)
Send signal in shell.
kill(pid_t pid, int sig) (man -S2 kill)
<signal.h> Send the signal sig to pid. The signal could be one of the signal names above. The kill() function is not only capable of sending SIGKILL.
raise(int sig);
Sending sig to the process itself. Same as calling kill() with getpid().

3.3. Pending Signals

int sigpending(sigset_t *set);
The sigpending function returns a mask of the signals pending for delivery to the calling process in the location indicated by set.

3.4. Handling Signals:

Signal handlers execute in response to signals, but there are strict limitations on the executable code within them. Most library and system calls are considered async-signal-unsafe, necessitating caution and adherence to reentrant safety principles. Ensuring signal handler safety involves meticulous consideration of shared resources, multithreading, and synchronization.

void (* signal(int sig, void (*func)(int)))(int);
Pass the funtion name or the address of the function.
  • sig: Specifies the signal for which the handler is to be set.
  • func: Pointer to the function that will handle the signal. It must have the following signature void func(int sig).
  • It calls the function func when the specified signal sig is received.
int sigaction(int sig, const struct sigaction *restrict act, struct sigaction *restrict oact);

allows programs to change the action taken by a process upon receipt of a specific signal. It provides a more flexible and reliable way to handle signals compared to the older signal function.

act: A pointer to a struct sigaction object that specifies the new action to be taken for the signal.

oldact: (Optional) A pointer to a struct sigaction object where the previous action for the signal will be stored.

  • void (*sa_handler)(int): A pointer to the signal handler function. Alternatively, sa_sigaction can be used for more complex handlers.
  • sigset_t sa_mask: A set of signals to be blocked (masked) while the signal handler is executing.
  • int sa_flags: Flags that modify the behavior of signal handling, such as SA_RESTART, which specifies whether interrupted system calls should be automatically restarted.

3.5. Blocking Signals:

Sigprocmask enables blocking, unblocking, or setting signal masks, affecting the delivery of signals to a process or thread. Proper initialization and handling of signal sets are crucial to prevent errors and ensure desired signal blocking behavior.

Signal mask
a set of signals that are currently blocked for the process.
Signal set
Create a signal set by sigset_t set.
  • int sigemptysetsigaddset(sigset_t *set, int signo);: initializes a signal set to be empty.
  • int sigaddset(sigset_t *set, int signo);: Adds the specified signal signo to the signal set.
  • int sigfillset(sigset_t *set);: initializes a signal set to contain all signals.
int sigprocmask(int how, const sigset_t *restrict set, sigset_t *restrict oset);

allows a process to manipulate its signal mask.

how:

  • SIG_BLOCK: The new mask is the union of the current mask and the specified set.
  • SIG_UNBLOCK: The new mask is the intersection of the current mask and the complement of the specified set.
  • SIG_SETMASK: The current mask is replaced by the specified set.

oset: If oset is not null, it is set to the previous value of the signal mask. Otherwise, do nothing.

Usage:

sigset_t set;
sigfillset(&set);
sigprocmask(SIG_SETMASK, &set, NULL); // Block all the signals which can be blocked

3.6. Signals in Child Processes and Threads:

Child processes inherit signal dispositions and masks from their parent processes, affecting their handling of signals. Threads within a process have their own signal masks, but the process itself is treated as a collection of threads by the kernel. Blocking signals in multi-threaded programs requires setting signal masks at the thread level to control signal delivery.

3.7. Topics:

  • Signal Handler Safety
  • Signal Disposition
  • Signal States
  • Pending Signals when Forking/Exec
  • Signal Disposition when Forking/Exec
  • Raising Signals in C
  • Raising Signals in a multithreaded program

4. Memory Allocators

4.1. Memory Layout

Text segment:

  • Readable, executable. Not writable.
  • Bottom. Next to data.

Data segment:

  • Next to text/heap.

Heap:

  • Readable, writable. Not executable.
  • Next to data.

Stack:

  • Readable, writable. Not executable.
  • Goes down

Kernel:

  • Top

static: Add static before variable declaration to make the lifetime of the variable the length of the process. static variables are stored in data segment (also called static memory). Whether putting it globally or inside a function does not affect its lifetime, but only affects the scope of the variable name.

4.2. Memory Allocation API

Heap memory can vary in size.

void * sbrk(int n)

Increases the process’s data segment by n bytes. Returns a pointer to the base of the new storage (top of the increased heap) if successful; otherwise -1.

Use sbrk(0) to get the top of the heap.

The internal implementation of malloc() etc. may call sbrk() when additional heap memory is required.

malloc()
<stdlib.h>. Allocate memory of the given size (in bytes) in the heap without initializing it. Return the pointer to the first byte. If it fails to allocate the memory, it will return a null pointer.
free()

<stdlib.h>. Free the allocated memory pointed to by the given pointer. Set the freed pointer to NULL to avoid dangling pointer a (pointer to invalid memory). free(NULL) is a null operation.

E.g., to return a pointer pointing to some memory allocated within a function

char * func() {
    char * ptr = malloc(128);
    if (!ptr) return ptr;
    strcpy(ptr, "hello");
    return ptr;
}

int main() {
    char * ptr = func();
    printf("%s\n", ptr);
    free(ptr);
    return 0;
}
hello

Double free: Once the pointer is freed, the memory it points to may be allocated for other use (e.g., used by the heap itself). Thus, double free will cause unexpected behaviour.

void * memset(void *b, int c, size_t len);
Write len bytes of value c (converted into a char) to the string b.
void * realloc(void *ptr, size_t size);
Tries to change the size of the allocation pointed to by ptr to size, and returns the pointer to the memory (NULL if there is an error).
  • If there is enough space starting from ptr, simply expand the allocation and do not copy old data.
  • If there is not, find a new location, copy the old data to there, and free the old allocation.
void * calloc(size_t count, size_t size);
Allocates memory for count elements that are size bytes each, initializes the memory to zero, and returns a pointer to that memory.

4.3. Placement Strategies

Elements in a struct is stored continuously (increasing for both stack and heap).

Placement Strategies
  • Best fit: Smallest location that is sufficienty large to hold the memory.
    • Drawbacks: (1) Need to look through the entire memory space. (2) When realloc later, less likely to be able to expand it, has to realloc elsewhere and copy the data.
    • Creates a lot of small fragments.
  • Worst fit: The largest location.
    • Drawbacks: Need to check the entire heap.
    • Creates a lot of medium fragments.
  • First fit: The first location that is big enough.
    • Drawback: Fragamentation.
False Fragmentation

free memory blocks are available for allocation but are not contiguous, leading to wasted memory space that is not effectively utilized.

There are two types of false fragmentation. Be careful to differentiate available memory, allocated memory, and required memory:

  • Internal fragmentation occurs when allocated memory is larger than required, resulting in wasted space within allocated memory blocks. This wasted space is internal to the allocated block.
  • External fragmentation occurs when there is enough total memory space to satisfy a memory request, but the available space is fragmented into small, non-contiguous blocks, making it impossible to allocate a contiguous block of memory to satisfy the request.

5. Threads

Thread

A thread (short for a thread of execution) is the smallest unit of execution within a process. It represents an independent sequence of instructions that can be scheduled to run by the operating system. Threads share the same memory space within a process and can communicate and synchronize with each other more efficiently than separate processes.

Memory sharing of threads within the same process:

  • Independent (Duplicated): registers, stack pointer, and stack segment.
  • Shared: heap segment, data segment, address space, file descriptors. This means they can access the same variables and data structures of the process.

A thread is much lighter to create compared to a process.

Re-entrant code: If

5.1. Pthreads

pthread is short for POSIX-thread.

<pthreads.h>.

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
Usually don’t need to worry about attr.
  • Will write the thread id to thread
  • The thread executes start_routine with argument arg. Functione pointer start_routine takes in a void * and return a void *.

    Return value: exit value.

int pthread_join(pthread_t thread, void **retval);

suspends execution of the calling thread until thread terminates.

  • Store the return value (void *) to *retval.

If the thread is already terminated, the return value will be 0.

#include <pthread.h>

void* do_massive_work(void* payload){
    /* Doing massive work */
    return NULL;
}

int main(){
    pthread_t threads[10];
    for(int i = 0; i < 10; ++i){
        pthread_create(threads+i, NULL, do_massive_work, NULL);
    }

    for(int i = 0; i < 10; ++i){
        pthread_join(threads[i], NULL);
    }
    return 0;
}
pthread_exit(...)
same as return ... in a start_routine function.
pthread_cancel()

Tell the thread to stop, but there is no guarantee that how fast it will stop.

Better practice: Use a global flag variable, e.g., pls_stop, to set whether to stop or not. Then, occasionally check in the thread the value of this variable.

pthread_self()
Return the pthread id of the caller.

5.2. Exit from Threads

  • exit() / return ... from main() : terminates the process with all threads.
  • pthread_exit() / return ... from the starting function: terminates the thread. If that thread is the last thread, the process will finish.

    A lasy way to wait for all thread to finish: E.g.,

    int main() {
        pthread_t tid1, tid2;
        pthread_create(&tid1, ...);
        pthread_create(&tid2, ...);
    
        pthread_exit(NULL); // exit this thread
        return 0; // never happen, the whole process wait until the last pthread ends
    }
    

5.3. Race Condition

Race conditions are whenever the outcome of a program is determined by its sequence of events determined by the processor. This means that the execution of the code is non-deterministic.

5.4. Reduce

In functional programming, there is an operation called reduce. reduce takes three parameters: an input list, a function to apply to the elements of that list, and an initial value (or base case). The function takes two inputs and returns a value. It is first applied to the base case and the first element, and from then on, the function is repeatedly applied to the cumulative result and the next element. reduce then returns the “reduced” value of the entire list.

int reduce(int *list, size_t length, reducer reduce_func,
           int base_case) {

    int result = base_case;

    for (size_t i = 0; i < length; ++i) {
            result = reduce_func(result, list[i]);
    }

    return result;
}

Given that the reduce function and the reduce operation are commutative and associative, we can accelerate this procedure using parallelism

6. Thread-Based Parallelism

Recall the meomory sharing policies of threads and processes: Threads global variables (part of data segment) and heap segment, while processes do not. Therefore, we have to carefully use synchronization mechanisms to manage threads within the context of shared memory. On the other hand, making good use of threads can accelerate computation.

Critical section

Part of code that should be executed by one thread at a time. This usually involves updating shared data

E.g., count ++. This code actually first reads the count value, and then adds it, and then writes it back. When two threads run this code at the same time, the actual execution order might be that both of them read the value before the other write it back, so it may ends up adding count only once.

Atomic Operation
An atomic operation is an operation that is performed as a single, indivisible unit of execution, without interruption from other threads or processes.
Thread safe
A piece of code is thread safe if it can be used with multi-thread.

6.1. Mutex (Mutual Exclusion)

#include <pthread.h>

pthread_mutex_init(&lock, NULL);

where lock is a pthread_mutex_t.

Alternatively, static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; (global)

pthread_mutex_lock(&lock), pthread_mutex_unlock(&lock)
In between, do critical section stuff. Only one (but arbitrary) thread will do the critical section and others will be blocked.
pthread_mutex_destroy()
destroy.

How to make it thread safe:

  • Identify critical sections.
  • Wrap them pthread mutex lock/unlock.
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int x;

int main() {
    pthread_mutex_lock(&m);
    x = x + 1;
    pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return 0;
}

6.2. Lock Contention

Lock Contention : One thread is forced to wait for another

6.3. Dead lock

#include <pthread.h>

pthread_t tid1, tid2;
pthread_mutex_t m = PTHREAD_MUTEX_INITiALIZER;

void * func(void * param) {
    for (int i = 0; i < 100; ++ i) {
        pthread_mutex_lock(&m); // lock itself and the other thread
    }
    return NULL;
}

int main() {
    pthread_create(&tid1, 0, func, NULL);
    pthread_create(&tid2, 0, func, NULL);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}

6.4. Semaphore

A semaphore is a non-negative integer variable. It maintains a count that represents the number of available resources or permits.

<semaphore.h>

int sem_init(sem_t *sem, int pshared, unsigned int value);
Initialize
sem_wait(sem_t *sem)
Decrements the value of the semaphore by 1. If the value becomes negative, the calling thread blocks until the semaphore value becomes positive.
sem_post(sem_t *sem)
Increments the value of the semaphore by 1. If there are threads blocked on the semaphore due to a wait operation, one of them is woken up.
sem_destroy(sem_t *sem)
destroy
#include <semaphore.h>

sem_t s;
int main() {
    sem_init(&s, 0, 10);
    sem_wait(&s); // Could do this 10 times without blocking
    sem_post(&s); // Announce that we've finished (and one more resource item is available; increment count)
    sem_destroy(&s); // release resources of the semaphore
}

6.5. Condition Variables

Condition variables allow threads to block until a specific condition is met, at which point they can be woken up by other threads.

pthread_cond_t cv = PTHREAD_COND_INITIALIZER
to initialize a cv.
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);

Unlock the mutex, then block the current thread until another thread signals the condition variable cond using pthread_cond_signal or pthread_cond_broadcast. After the thread is woken up, lock the mutex.

cond is accosicated with mutex and the corresponding shared data. mutex protects the shared data. Before calling pthread_cond_wait, the calling thread must acquire mutex (pthread_mutex_lock).

Atomicity: It release the mutex lock while waiting on condition variable allowing other threads to access the shared data concurrently.

Spurious Wakeups: Threads waiting on a condition variable can sometimes wake up without the condition being signaled. This is known as a spurious wakeup and can occur due to system interrupts or other reasons (not because of the code).

pthread_cond_signal(&cv)
Signals the condition to wake up a (at least one) thread. If multiple threads are waiting, which thread gets woken up is not specified.
pthread_cond_broadast(&cv)

Wake up all threads that are waiting on the condition variable.

pthread_cond_signal and pthread_cond_broadcast can be called by a thread wether or not it owns the mutex associated with the condition variable.

Usage pattern

Wrap pthread_cond_wai by while loop with a manually defined condition. The pthread_cond_signal is called somewhere this condition is satisfied.

pthread_mutex_lock(&mutex);  // Lock the mutex

while (!condition) {
    pthread_cond_wait(&cv, &mutex);  // Wait on the condition variable
}

// Do something after the condition is signaled

pthread_mutex_unlock(&mutex);  // Unlock the mutex
// immediately after the condition is met
pthread_cond_signal(&cv);

There are two sets of control:

  • Manually maintaing a condition: For example, maintain a global variable that monitors the condition. while makes sure that if a thread spuriously wakes up, it re-enter wait if this manual condition check fails.
  • Wait & Signal: Pthread’s utilities. They maintain the pthread condition variable based on their positions in the code.

6.6. Barrier

Barrier

A barrier ensures that all participating threads reach a certain point (i.e., the barrier) in their execution before any of them can proceed further.

This can be achieved by:

  • Waiting: When a thread reaches a barrier, it will wait at the barrier until all the threads reach the barrier.
  • Release: Once all threads have arrived at the barrier, the barrier is released, allowing all threads to proceed with their execution.
pthread_barrier_t
pthread_barrier_init()
pthread_barrier_wait()
pthread_barrier_init()
pthread_barrier_destroy()

6.7. Thread Pool

Problem description: A sequence of a same type of tasks, meaning that each task can be solved by a call to the same function. How to accelerate solving all the tasks?

Thread pool
Create a pool of fixed number of threads at the very beginning. Tasks are solved using these threads instead of new threads.

There are different ways to use the thread pool:

  • Parallelize the task sequence: Each task is solved by one thread and multiple tasks can be solved simultaneously.

    We can use a thread-safe queue (whether or not it has finite capacity) to maintain the reading and solving of tasks: Everytime a task is read,

    1. Push it to the queue (if there is a capacity, we have to implement the waiting scheme).
    2. “Assign” the task to a thread: This is actually done by each thread actively keeping pulling a task from the queue while the queue is non-empty and the reading has not finished.

      In practice, this can be implemented by a condition variable. The conditions are just the two above. Inside the thread, it waits until the condition is met:

      while (1) {
              pthread_mutex_lock(&m1);
              while (tasksCompleted == 0 && queueCount == 0) {
                  pthread_cond_wait(&cv, &m1);  // Wait on the condition variable
              }
              // printf("[%d] After cond var: tasksCompleted: %d, queueCount: %ld\n", tid, tasksCompleted, queueCount);
              if (tasksCompleted == 1 && queueCount == 0) {
                  pthread_mutex_unlock(&m1);
                  return NULL;
              }
              task_t *task = (task_t *)queue_pull(tasks);
              queueCount -= 1;
              pthread_mutex_unlock(&m1);
      
              solve(task);
      }
      

      Outside the thread, send the release signal when one of the condition changes:

      // Condition 1
      while (read task != EOF) {
          pthread_mutex_lock(&m1);
          queue_push(tasks, task);
          queueCount += 1;
          pthread_cond_broadcast(&cv);
          pthread_mutex_unlock(&m1);
      }
      
      // Condition 2
      pthread_mutex_lock(&m1);
      tasksCompleted = 1;
      pthread_cond_broadcast(&cv);
      pthread_mutex_unlock(&m1);
      
  • Parallelize each task: Tasks are solved one-by-one, each divided into subtasks that are solved using all the threads simultaneously.

    We can use a barrier to block all the threads until the all the subtasks of the current task are solved. Everytime a task is read,

    1. Reset all relevant variables.
    2. Divide the task into subtasks.
    3. “Assign” the subtasks to threads:

      1. Similarly, each thread actively tries to grab a subtask while there is a subtask remaining and the reading has not finished.
      2. After the thread solved a subtask, set a barrier.
      3. If this is the last subtask, signal the main thread that this task is completed (See below).

      This can be implemented using a condition variable:

      while (1) {
          pthread_mutex_lock(&m1);
          while (tasksCompleted == 0 && remainingSubtasks != glo_thread_count) {
              pthread_cond_wait(&cv1, &m1);
          }
      
          if (tasksCompleted == 1 && remainingSubtasks == 0) {
              pthread_mutex_unlock(&m1);
              return NULL;
          }
          pthread_mutex_unlock(&m1);
      
          solve the task;
      
          pthread_barrier_wait(&b);
      
          pthread_mutex_lock(&m1);
          remainingSubtasks -= 1;
          if (remainingSubtasks == 0)
              pthread_cond_signal(&cv2);
          pthread_mutex_unlock(&m1);
      }
      
    4. In the main thread, after the subtasks are sent, set a barrier, wait until all subtasks are solved, and then collect the solutions.

      while (read task != EOF) {
          // Reset relevant variables
          pthread_mutex_lock(&m1);
          remainingSubtasks = thread_count;
          pthread_cond_broadcast(&cv1);
          pthread_mutex_unlock(&m1);
      
          pthread_barrier_wait(&b);
      
          pthread_mutex_lock(&m1);
          while (remainingSubtasks != 0) {
              pthread_cond_wait(&cv2, &m1);
          }
          pthread_mutex_unlock(&m1);
      
          collect the solutions
      }
      
      pthread_mutex_lock(&m1);
      tasksCompleted = 1;
      pthread_cond_broadcast(&cv1);
      pthread_mutex_unlock(&m1);
      

7. TODO Deadlock

7.1. Resource Allocation Graph

Deadlock
defined as when a system (a set of rules by which a set of processes can move move from one state to another) cannot make any forward progress (at least one progress can change state).
Resource Allocation Graph
Bipartite graph. Processes and resources are nodes. If a process is using (holding) a resource, an arrow is drawn from the resource node to the process node. If a process is requesting (waiting) a resource, an arrow is drawn from the process node to the resource node.
Coffman conditions

Including:

  • Mutual exclusion: A resource can only be obtained by one process at a time.
  • Hold and wait: Once a resource is obtained by a process, the process keeps the resource locked, until the process itself unlocks it.
  • No pre-emption: Nothing can force the process to give up a resource, unless the process itself unlocks it.
  • Circular wait: There exists a cycle in the Resource Allocation Graph, or there exists a set of processes {P1, P2,…} such that P1 is waiting for resources held by P2, which is waiting for P3,…, which is waiting for P1.

Theorem: A system is deadlocked if and only if all the Coffman conditions holds.

Remark: The first 3 conditions are more like the (natural) rules of the system, while the 4th is the triggering condition for deadlock.

8. Virtual Memory

Virtual memory
Virtual memory provides a layer of abstraction over physical memory, allowing programs to operate as if they have access to a large, contiguous memory space.
  • Protection: Virtual memory systems provide memory protection mechanisms that prevent one program from accessing the memory of another program. This enhances system stability and security by isolating processes from each other.

    Each process is isolated and has its own virtual memory space, in which addresses look continuous to the process. However, different processes may share physical memory (e.g., parent and children processes).

  • Efficient Use of Memory: Virtual memory allows for efficient use of physical memory by swapping less frequently used pages to secondary storage. This enables systems to run larger programs than would be possible with only physical memory.
  • Ease of Programming: Programmers can develop applications without needing to worry about the physical memory limitations of the underlying hardware. They can write code assuming a large and contiguous memory space, which simplifies application development.

8.1. Translating Address

Memory Management Unit (MMU)

A hardware unit that is part of the CPU. It converts a virtual memory address into a physical address.

A \(X\)-bit machine, meaning pointers are \(X\)-bit, can address \(2 ^{X}\) different locations where one location is one byte.

Naive lookup scheme: A \(2 ^{X}\)-entry (# of different physical addresses) table, where each entry records a \(X\)-bit physical address (which is \(X /8\) bytes). This consumes \(2 ^{X} \cdot X /8\) bytes.

The above lookup scheme is to inefficient. Instead, we can all addresses into pages and build a look up table for each page.

Page
A block of (continuous) virtual memory. A typical block size (# of addresses in a page)on Linux is \(2 ^{12}\) addresses. For a \(X\)-bit machine, assuming a block size of \(2 ^{P}\), there are \(2 ^{X - P}\) pages.
Frame
A block of (continous) physical memory. A frame has the same number of addresses as a virtual page.
Page Table

The highest \((X-P)\) bits are the page number and the lowest \(P\) bits are the offset within the frame.

address_split.png
Figure 1: From http://cs341.cs.illinois.edu/coursebook/Ipc

A page table is a map from a page number (physical) to a frame number (virtual). The table is in the order of page number.

frame_table.png
Figure 2: From http://cs341.cs.illinois.edu/coursebook/Ipc

In practice, the table only contains the second column. The page table itself consumes \(2 ^{X - P} \cdot (X - P) / 8\) bytes (usually rounded up \((X - P) / 8\) to some power of 2).

Memory lookup: First lookup the correspdoning frame number to the page number in the page table. And then append the offset to the frame number to get the physical address.

page_table_lookup.png
Figure 3: From http://cs341.cs.illinois.edu/coursebook/Ipc
Multi-level page tales

The lowest \(P\) bits are the offset. Suppose there are \(L\) levels. The higher bits \((X-P)\) are evenly divided into \(L\) segments, in which the \(i\)-th segment is the index used to lookup in the \(i\)-th subtable.

three_address_split.png
Figure 4: From http://cs341.cs.illinois.edu/coursebook/Ipc

Memory lookup: The \(i\)-th table is a map from a level-\(i\) index to a \((i-1)\)-th subtable number. The lowest table is a map from a level-1 index to the offset in physical frame. Sometimes the higher-level (parent) subtables are called directories, while the lower-level subtables are called table. E.g., an entry in the directory maps to a table.

multi_level_page_table_lookup.png
Figure 5: From http://cs341.cs.illinois.edu/coursebook/Ipc

Table size: There is one top-level table and \(2 ^{(X-P) /L}\) \(i\)-th level subtables for \(i = L-1, \dots , 1\). Each table has \(2 ^{(X-P) / L}\) entries, each consuming \((X - P) / L / 8\) byptes (usually rounded up to power of 2). Thus, in total it consumes \([1 + (L-1) \cdot 2 ^{(X-P)/L}] \cdot 2 ^{(X-P)/L} \cdot (X-P) /L /8\) bytes.

8.2. Translation Lookaside Buffer

The more levels of tables, the slower the memory lookup.

Translation Looksaide Buffer (TLB)
A cache of recently-used virtual-page-to-frame lookups. Every time a virtual address needs to be translated into a physical address, TLB is queried in parallel to the page table.

Address per page.

Number of page: / address per page

MMU:

pages / frams does not corres only to ram

8.3. Frames and Page Protections

Page in: Read disk memory into physical frames.

Page out: Copy a physical frame into disk.

Frames and Sharing:

  • Physical memory frames can be shared between multiple processes (no delay in modification on the memory).
  • The page table entries, in addition to storing the frame number, can also store access permissions (read, write, execute) for each mapped page.
  • Read-only frames can be safely shared across processes, allowing for efficient sharing of code segments (e.g., C library code).
  • If a process attempts to write to a read-only page, it will trigger a segmentation fault (SIGSEGV) because the hardware enforces the read-only protection.

Page Bits and Protections: A bit in the page table entry (Not in the virtual address) that record the following information:

Read-only Bit
Marks a page as read-only, causing write attempts to generate a page fault. Used for sharing code segments and implementing copy-on-write.
Execution Bit
Indicates whether the bytes in a page can be executed as CPU instructions. Prevents code injection attacks by marking data pages as non-executable.
Dirty Bit
Tracks whether a page has been modified since it was loaded from disk (set the dirty bit if the page has been writtern to). This requires ??? that the backing store retain a copy of the page after it is paged into memory. If the page is clean, then it is uncessary to write it back to the backing store, vice versa.
Other Bits
Different architectures may have additional page-level bits for various purposes, such as tracking accessed pages, enabling memory protection keys, or implementing hardware-assisted virtualization.

8.4. Page Faults

Minor

This occurs when a process tries to access a valid virtual memory address that is not currently mapped to any page in physical memory.

It typically happens when a process requests more memory (e.g., via sbrk(2)) but has not written to that memory yet.

The operating system can handle this by allocating a new page frame, loading it into memory, and mapping it to the virtual address requested by the process.

Major

This occurs when a process tries to access a virtual memory address that is mapped to a page that is not currently in physical memory (i.e., the page is swapped out to disk).

The operating system must read the page from disk into a free page frame in physical memory. This is a more expensive operation than a minor page fault, as it involves disk I/O.

Invalid

This occurs when a process tries to access a virtual memory address that is not valid or not accessible to the process (e.g., trying to write to a read-only memory area or accessing memory outside the process’s address space).

The operating system typically responds to an invalid page fault by terminating the process and generating a segmentation fault signal (SIGSEGV).

9. Pipes and Files

9.1. TODO Pipes

Pipes were invented to take the output of one program and feed it to the input of another program.

Capacity of pipes depends on the system.

#include <unistd.h>

Pipes in terminal
cmd1 | cmd2 | ... | cmdn. From left to right, the output of the previous command is fed into the next command.
int pipe(int fildes[2]);

<unist.h>. Creates a pipe and allocates a pair of file descriptors. fildes[0] connects to the read end of the pipe; fildes[1] connects to the write end. Data written to fildes[1] can be read from fildes[0].

If all file descriptors referring to the read end are closed, then the writing process to receive a SIGPIPE signal. The writer may catch this signal.

Closing all the file descriptors referring to the write end is the only way to deliver end-of-file to a reader.

Usage:

int fildes[2];
pipe(fildes);

pit_t p = fork();
if (p > 0) { /* Parent */
    ... // E.g., read from fildes[0]
}
else { /* Children */
    ... // E.g., write to fildes[1]
}

Note that fork() duplicates file descriptors referring to the same objects for the children process, so we will have two open fd’s referring to the read end and the write end, respectively. It is common practice to close the unnecessary ends in the child and parent process to meet the criterion above (all fds are closed).

pipe2()
TODO
Named pipes, mkfifo

int mkfifo(const char *path, mode_t mode); creates a new fifo file with name path. The actual contents of the pipe aren’t printed to the file and read from the file. The OS simply create an unnamed pipe that refers to the named pipe.

In terminal, mkfifo path. Then, echo Hello > fifo will hang until cat fifo is run on another terminal to read the content.

int dup2(int oldfd, int newfd);

Duplicates an existing object descriptor oldfd using the file descriptor number speicified by newfd. If newfd is already in use, close() is called first to deallocated it.

file1      file2
  A          A
  |  \       |
  |     \    X
  |        \ |
oldfd      newfd

9.2. File

long fseek(FIle *stream, long offset, int whence)

Seek to a new position in a stream. The new position, measured in bytes, is obtained by adding offset bytes to the position specified by whence. If whence is set to SEEK_SET, SEEK_CUR, or SEEK_END, the offset is relative to the start of the file, the current position indicator, or end-of-file, respectively.

May have to fflush() to update the position on time.

long ftell(FILE *stream);
Tell current position in a stream;
void rewind(FILE *stream)
sets the file position indicator to the beginning of the file.
ssize_t pwrite(int fildes, const void *buf, size_t nbyte, off_t offset);
Write to the specified position in the file without modifying the file pointer.

How the C library warp a file descriptor (simplified):

  • buffer: to reduce the times of read/write.
  • capacity: maximal capacity of the buffer
  • size: the current usage of the buffer
  • buffering method: No buffering, line buffering, full buffering (only flush when the buffer is completely full)

Reading and Writing binary data:

size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
Reads nitems items, each size bytes long a certain, from stream to ptr.

10. Network

Octet
A group of 8 bits.
Port
A 16-bit number.
  • 0 to 1024: privileged ports. E.g., 80 for HTTP

Only processes with super-user access can listen to ports < 1024.

User Datagram Protocol (UDP) : Connectionless. Low latency is more important than reliability (receiving all of the data).

  • data packets may be dropped and not recieved by the listener
  • data packets may be duplicated
  • data packets may arrive out of order
  • do not build pipe
  • Use case: game, streaming video.

Transmission Control Protocol (TCP) : Connection-based, requiring a handshake at the beginning. Reliability is more important than low latenct. Flow control and retransmission. Stream-based. (Simple) Error checking.

HTTP:

UDP over TCP:

bind associates an abstract network socket created with socket to an actual networking interface/address.

10.1. TCP Client

Client: (1) Writes user input to the server and (2) writes bytes from server to user.

int getaddrinfo(const char *hostname, const char *servname, const struct addrinfo *hints, struct addrinfo **res);
Perform address resolution for a given host and service. Creates a linked-list of addrinfo structs and sets the given pointer res to point to the first one.
  • hostname: Either a host name or a numeric host address string. NULL for server bind (bind to any available network interface on the host).
  • servname: Either a decimal port name as a string or a service name.
  • hints: A struct addrinfo that indicate the type of socket to use.

    struct addrinfo {
        int ai_flags;           /* input flags */
        int ai_family;          /* protocol family for socket */
        int ai_socktype;        /* socket type */
        int ai_protocol;        /* protocol for socket */
        socklen_t ai_addrlen;   /* length of socket-address */
        struct sockaddr *ai_addr; /* socket-address for socket */
        char *ai_canonname;     /* canonical name for service location */
        struct addrinfo *ai_next; /* pointer to next in list */
    };
    
    • ai_family: The protocol family that should be used. AF_INET for IPv4, AF_INET6 for IPv6.
    • ai_socktype: Denotes the type of socket that is wanted. SOCK_STREAM for TCP, SOCK_DGRAM for UDP. When ai_socktype is zero the caller will accept any socket type.
  • Return value: 0 if successfull and non-zero as error code. Call gai_strerror() on the error code to get the error information.
int socket(int domain, int type, int protocol);
#include <sys/socket.h>. Create a network socket and returns a descriptor that can be used with read and write.
int connect(int socket, const struct sockaddr *address, socklen_t address_len);
Attempts the connection to the remote machine.
void freeaddrinfo(struct addrinfo *res);
Frees the memory that was allocated for res.

Setting up a client:

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    struct addrinfo hints, * result;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET; // for IPv4, and AF_INET6 fr IPv6
    hints.ai_socktype = SOCK_STREAM; // TCP
    int s = getaddrinfo("illinois.edu", "80", &hints, &result);
    if (s != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
        exit(1);
    }

    int sock_fd = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
    if (sock_fd == -1) {
        perror("socket failed");
        exit(1);
    }

    int connect_result = connect(sock_fd, result->ai_addr, result->ai_addrlen);
    if (connect_result == -1) {
        perror("connect failed");
        exit(1);
    }

    printf("Connected\n");


    // do something


    freeaddrinfo(result);
    return 0;
}
Reading

It’s possible that the data arrives in smaller chunks than the total amount requested. And the read function may be interrupted by signals or other events, resulting in a partial read. Use a while loop to keep reading.

ssize_t read_all(int sockfd, void *buf, size_t count) {
    size_t total_read = 0;
    char *ptr = (char *)buf;

    while (total_read < count) {
            ssize_t read_bytes = read(sockfd, ptr + total_read, count - total_read);
        if (read_bytes == -1) {
            if (errno == EINTR) {
                continue;
            } else {
                return -1;
            }
        } else if (read_bytes == 0) {
            break;
        }
            total_read += read_bytes;
    }

    return total_read;
}
Writing

Symmetric. Here is a simple version.

ssize_t write_all(int sockfd, const void *buf, size_t count) {
    size_t total_written = 0;
    const char *ptr = (const char *)buf;

    while (total_written < count) {
        ssize_t written = write(sockfd, ptr + total_written, count - total_written);
        if (written == -1) {
            if (errno == EINTR) {
                continue;
            } else {
                return -1;
            }
        }
        total_written += written;
    }

    return total_written;
}

10.2. TCP Server

int bind(int socket, const struct sockaddr *address, socklen_t address_len);

Associates (binds) an abstract socket with an actual network interface and port.

By default, when a server is closed, the port enters a “timed-wait” state and is released after some time. To be able to immediately reuse a port, specify SO_REUSEPORT before bind:

int optval = 1;
setsockopt(socket, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

bind(socket, ...);
int listen(int socket, int backlog);
Start to receive data. Set the maximum number (backlog) of pending connections that can be queued up on socket.
int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len);
Accept a pending connection. Block until there is a customer for it to serve.
int shutdown(int socket, int how);
Shutdown the connection associated with socket according to how.
  • how: SHUT_RD for reading/receiving. SHUT_WR for writing/sending. SHUT_RDWR for reading and writing.
uint16_t ntohs(uint16_t netshort);
#include <arpa/inet.h> Network to host short (= 16 bits). Convert representations from network byte order and host byte order.
Setting up a server
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    struct addrinfo hints, * result;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    int s = getaddrinfo(NULL, "1234", &hints, &result);
    if (s != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
        exit(1);
    }

    int sock_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (bind(sock_fd, result->ai_addr, result->ai_addrlen) != 0) {
        perror("bind()");
        exit(1);
    }

    if (listen(sock_fd, 10) != 0) {
        perror("listen()");
        exit(1);
    }

    struct sockaddr_in * result_addr = (struct sockaddr_in *) result->ai_addr;
    printf("Listening on file descriptor %d, port %d\n", sock_fd, noths(result_addr->sin_port));

    while(1) {
        int client_fd = accept(sock_fd, NULL, NULL);
        if (client_fd < 0) {
            perror(NULL);
            exit(1);
        }

        char buffer[1024];
        ssize_t len = read(client_fd, buffer, 999);
        if (len > 0) {
            buffer[len] = '\0';
        }
        write(client_fd, "Hello\n", 6);

        shutdown(client_fd, SHUT_RDWR);
        close(client_fd); // that number can be reused
    }
}

10.3. Non-Blocking / Asynchrous IO

The default IO is blocking, meaning that it will wait until the data is ready before the function returns.

In non-blocking mode:

  • read() will read whatever bytes are available (despite the specified number of bytes to read) and return immediately. If you tried to read any number of bytes but there are no bytes ready, it would return -1 and set errno to either EAGAIN or EWOULDBLOCK.
  • write() will only send bytes it was able to send immediately and return the number of the bytes. Similarly, if you tried to write something but there are no bytes ready, the return value and errno would be the same as read().

Set a non-blocking flag on a file descriptor:

  • For a file:

    int fd = open("file.txt", O_NONBLOCK);
    
  • For a socket:

    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    
  • Via fcntl (for those sockets that are not created by the above methods):

    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    

Ways to check that the IO has arrived:

10.3.1. Select

int select(int nfds, fd_set *restrict readfds, fd_set *restrict writefds, fd_set *restrict errorfds, struct timeval *restrict timeout);

Wait for any of the three sets of file descriptors to become ready. Only return the total number of ready file descriptors. Then, the caller need to iterate throught the file descriptors to see which ones are ready.

Advantages: cross-platform.

10.3.2. Epoll

#include <sys/epoll.h>

Tell exactly which descriptors are ready.

int epoll_create(int size);
Create a new epoll instance (a file descriptor). The size argument is ignored but required to be a positive integer. Call close() on it the close it at the end.
struct epoll_event

Then set a epoll_event with the file descriptor of interest:

// fd is the file descriptor of interest
struct epoll_event ev = {.events = EPOLLIN, .data.fd = fd};
  • events member: EPOLLIN for read, EPOLLOUT for write.
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Control interface.
  • op:
    • EPOLL_CTL_ADD: Add an entry (fd and event) to the interest list of epfd. This operation can be called any times.
    • EPOLL_CTL_DEL: Remove an entry from the interest list.
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Waits for up to maxevents events on (interest list of) the epoll instance referred to by epfd. The timeout argument specifies the number of milliseconds that epoll_wait() will block.

Return the number of file descriptors ready for the request IO. The ready events will be store as an array starting from events. events[i] points to the i-th event.

Return -1 on error.

Level-triggered mode

epoll_wait() returns whenever the underlying file descriptor has data available for reading or writing.

If you do not read all available data from the file descriptor, the event will continue to trigger subsequent calls to epoll_wait.

Edge-triggered mode

epoll_wait returns only when there is a change in the status of the file descriptor. This means that it returns when new data arrives (no data -> new data) or when space becomes available (unavailable -> available) for writing, but it does not return if there is still data available to read or space to write.

If you do not read all available data or write all available data space, subsequent calls to epoll_wait will not return until there is a change in the status of the file descriptor.

To use the edge-triggered mode, add the flag EPOLLET to the events member using bitwise OR.

Usage: The following example is from man 7 epoll. Assume we have a prepared TCP server socket int listen_sock.

  1. Create an epoll file descriptor

    epollfd = epoll_create1(0);
    if (epollfd == -1) {
      perror("epoll_create1");
      exit(EXIT_FAILURE);
    }
    
  2. Create an event that monitors the server socket listen_sock in level-triggered mode.

    struct epoll_event ev;
    // This file object will be `read` from (connect is technically a read operation)
    ev.events = EPOLLIN;
    ev.data.fd = listen_sock;
    
    // Add the socket in with all the other fds. Everything is a file descriptor
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
      perror("epoll_ctl: listen_sock");
      exit(EXIT_FAILURE);
    }
    
  3. In a loop:
    1. Wait for any events specified in epollfd

      struct epoll_event events[MAX_EVENTS];
      n = epoll_wait(epollfd, events, MAX_EVENTS, -1);
      if (n == -1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
      }
      
    2. Iterate the first nfds elements of events.

      If we get an event on the server socket (events[i].data.fd == listen_sock), that means that there is a new client, so we add an event for the client in edge-triggered mode to epollfd.

      Otherwise, that means that the event is on some client socket, which has data ready to be read. In this case, we read from that socket.

      for (int i = 0; i < n; i++) {
        if (events[i].data.fd == listen_sockfd) {
          // New incoming connection
          struct sockaddr_in addr;
          socklen_t addrlen = sizeof(addr);
          int fd = accept(listen_sockfd, (struct sockaddr *)&addr, &addrlen);
          if (fd == -1) {
            perror("accept");
            exit(EXIT_FAILURE);
          }
      
          // Non-blocking mode
          int flags = fcntl(fd, F_GETFL, 0);
          fcntl(fd, F_SETFL, flags | O_NONBLOCK);
      
          // create some variable for this client fd (stores fd, etc.)
          connection_state_t *state = create_connection_state(fd);
      
          struct epoll_event ev;
          ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
          ev.data.ptr = state; // pointer to the variable
      
          if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
            perror("epoll_ctl: add");
            close(fd);
            free(state);
            exit(EXIT_FAILURE);
          }
        } else {
          // Get back the variable
          connection_state_t *state = (connection_state_t *)events[i].data.ptr;
          // process state
        }
       }
      

11. File System

11.1. Pathing

char * realpath(const char *restrict file_name, char *restrict resolved_name);

#include <stdlib.h>. Converts the input file_name to the corresponding absolute canonical path if that path exists.

If resolved_name is NULL, memory will be dynamically allocated and the address will be returned; otherwise, the path will be stored at resolved_name.

11.2. TODO File Contents

Disk block
A portion of the disk that is reserved for storing the contents a file or a directory. Every disk block has the same size, say \(2 ^{B}\) bytes.
inode
Represents a file or directory, containing the metadata about the file and pointers to disk block that hold the file contents.
Superblock
Contains metadata about the inodes and disk blocks.

11.3. Inode

An inode has

Direct block
A disk block that directly store the file content. A direct block can store \(2 ^{B}\) bytes of the file content.
Indirect block
A disk block that holds pointers to direct disk blocks rather the file content. Suppose a pointer consumes \(P\) byptes (\(4P\) bits), then an indirect block holds \(\frac{2 ^{B}}{P}\) pointers and indirectly points to \(\frac{2 ^{B}}{P} \times 2 ^{B}\) bytes of the file content.
Double/triple/k-th indirect block
A \(k\)-th indirect block is a disk block that holds pointers to \((k-1)\)-th indirect disk blocks. It indirectly points to \[ \underbrace{\frac{2 ^{B}}{P}}_{\text{number of pointers}} \times \underbrace{\left( \frac{2 ^{B}}{P} \right) ^{k-1} \times 2 ^{B}}_{\text{bytes a (k-1)-th indirect block points to}} \] byptes of file content.

11.4. Directory

Directory (folder)
A special file whose contents are inode numbers and names of its entries. Some special bits are set in its inode to identity that this is a directory.
Directory entry (dirent)
A entry under a directory. It is also a file, other than
ls -i
List the inode numbers and the entries in the current directory.
opendir()
Open a directory.
readdir()
  • d_name:
  • d_ino:
(no term)
closedir()

11.5. Meta File Information

stat
  • st_mode: Read/Write/Execute. File/directory type.
fstat
lstat

11.6. TODO Linking

Hard link, ln source_file target_file
A hard link is simply an entry in a directory that points to an inode that represents an existing file.
  • Changes made to the any hard link are reflected in all hard links, as they all point to the same data blocks.
  • Removing the original file does not affect hard links, as they maintain a reference to the underlying data until all links are removed.
  • To check the number of hard links to a file, you can also use ls -l and observe the second field. The number of hard links is displayed as the second field of the output.
Symbolic link, ln -s source_file target_file
A symbolic link, also called a soft link, is a file with a special bit set and stores a path to another file.
  • Can read the content of the source file by reading the symbolic link.
  • Read the path the symbolic link contains, use readlink.
  • Deleting or moving the original file/directory does not affect symlinks, but accessing a broken symlink (pointing to a non-existent target) results in an error.
  • To check if a file is a symbolic link, you can use ls -l. Symbolic links will be identified by an l at the beginning of the permissions field.

12. Scheduling

CPU Scheduling
CPU Scheduling involves efficiently selecting processes to run on CPU cores. In busy systems, there are more ready-to-run processes than CPU cores, requiring the system kernel to evaluate which processes to schedule for execution and which to defer.
Pre-emption

the act of temporarily halting the execution of a currently running process to allow the CPU to immediately start executing another process with higher priority.

Preemption ensures that processes with higher urgency or shorter execution times can be promptly attended to, preventing situations like starvation and improving system responsiveness.

12.1. Measures of Efficiency

Arrival time
is the time the process enters the scheduler (CPU may start working on it)
Start time
is the wall-clock start time of the process (CPU starts working on it)
End time
is the end wall-clock of the process (CPU finishes the process)
Run time
is the total amount of CPU time required
Response Time
start time - arrival time. Latency from arrival to CPU start.
Turnaround Time
end time - arrival time. Total time from process arrival to completion.
Wait Time
Turnaround time - run time. Total time a process spends on the ready queue.
Convoy effect
when a CPU-intensive task monopolizes the CPU, delaying I/O-bound processes and degrading overall system performance.

12.2. Scheduling Algorithms

Shortest Job First (SJF)
Selects the shortest total CPU time process first, minimizing wait and response times. Requires knowledge of process runtimes.
Pre-emptive SJF (PSJF)
Similar to SJF but allows pre-emption. Ensures shorter jobs are prioritized but requires frequent context switching.
First Come First Served (FCFS)
Processes executed in arrival order. Simple but susceptible to the Convoy Effect.
Round Robin (RR)

Processes executed in order with a fixed time quantum, ensuring fairness but potentially leading to frequent context switching.

round-robin.jpeg
Priority Scheduling
Processes executed based on priority value, allowing for the prioritization of critical tasks.

13. Error

errno
#include <errno.h>. A global variable that is set by system calls and library functions in the event of an error. It holds an error code (integer) that provides information about the type of error.
perror()
Prints a descriptive error message corresponding to the current value of errno. It takes a string argument (custom message) and prints it followed by the (built-in) error message.
stderror()
Provides a string representation of an error code. It takes an integer argument representing an error code and returns a pointer to a string describing the error.

14. Makefile

  • Compile: make.
  • Clean up the assignment directory: make clean
  • Compile a debugable version that can use gdb on (e.g., show source code)
  • Compile a release version

15. Regrade

malloc_pt1: 66.67 -> 100.00

shell_pt1: 81.48 -> 92.59

  • Quizzes: 15 / 15
  • MP: 43.40 / 45
  • Lab: 16.60 / 17
  • Final: ? / 21
    • 65% -> 90.45
    • 75% -> 92.6
    • 85% -> 94.7