Basic Data Structures in C
This note is intended as a short point of reference for basic data structures in C. It is not intended as a learning material.
The Basic Struct
struct Planet {
char name[40];
int position;
float axis;
};
This creates a blueprint for a Planet structure, that contains a few basic facts a planet might indeed have. Once this blueprint is defined, it’s possible to create variables of this structure type:
struct Planet planet001;
To access/modify structure members, use the dot (.) operator:
#include <stdio.h>
#include <string.h>
struct Planet {
char name[40];
int position;
float axis;
};
void main() {
struct Planet planet001;
strcpy(planet001.name, "Mars");
planet001.position = 4;
planet001.axis = 25.2;
printf("Name, %s\n", planet001.name);
printf("Position, %d\n", planet001.position);
printf("Axis, %.1f\n", planet001.axis);
}
Output:
Name, Mars
Position, 4
Axis, 25.2
A structure can contain arrays:
#include <stdio.h>
#include <string.h>
struct Planet {
char name[40];
int position;
float axis;
};
void main() {
struct Planet planets[3] = {
{"Mars", 4, 25.2},
{"Mercury", 1, 2},
{"Venus", 2, 177}
};
for(int i = 0; i < 3; i++) {
printf("\nPlanet %d:\n", i+1);
printf("Name: %s\n", planets[i].name);
printf("Position: %d\n", planets[i].position);
printf("Axis: %.1f\n", planets[i].axis);
}
}
Output:
Planet 1:
Name: Mars
Position: 4
Axis: 25.2
Planet 2:
Name: Mercury
Position: 1
Axis: 2.0
Planet 3:
Name: Venus
Position: 2
Axis: 177.0
It’s also possible to nest structures inside other structures:
#include <stdio.h>
#include <string.h>
struct Rotation {
char direction[10];
float period;
};
struct Planet {
char name[40];
int position;
float axis;
struct Rotation rotation;
};
void main() {
struct Planet planets[3] = {
{"Mars", 4, 25.2, {"ccw", 24.6}},
{"Mercury", 1, 2, {"ccw", 58.6}},
{"Venus", 2, 177, {"cw", 243}}
};
for(int i = 0; i < 3; i++) {
printf("\nPlanet %d:\n", i+1);
printf("Name: %s\n", planets[i].name);
printf("Position: %d\n", planets[i].position);
printf("Axis: %.1f\n", planets[i].axis);
printf("Rotation (direction, period): (%s, %.1f)\n",
planets[i].rotation.direction,
planets[i].rotation.period);
}
}
Output:
Planet 1:
Name: Mars
Position: 4
Axis: 25.2
Rotation (direction, period): (ccw, 24.6)
Planet 2:
Name: Mercury
Position: 1
Axis: 2.0
Rotation (direction, period): (ccw, 58.6)
Planet 3:
Name: Venus
Position: 2
Axis: 177.0
Rotation (direction, period): (cw, 243.0)
The important note here, is that a structure cannot be nested within itself, as that would be infinitely large. For creating self-referential structures, the way is to use a pointer to the same type of structure.
Basically:
struct A {
struct A inner;
};
Will fail at compile time, because the compiler cannot determine the size of struct A, since it contains a full copy of itself, infinitely nested.
struct A {
struct A *ptr;
};
The compiler is OK with this, because a pointer always has a fixed, known size, so the compiler has no problems with calculating the size of the pointer structure member.
Linked Lists
A linked list is a dynamic data structure, where each element (often called node) contains a piece of data, and also a reference to the next node.
While this sounds somewhat similar to arrays, the big difference is how the data is stored in memory. Arrays store their data in a contiguous block, meaning that any time an element that is not the last element is removed or added, the array will need to move the rest of the elements to fill the gaps, and keep everything sequentially packaged. This operation becomes slower as the size of the array grows.
Linked lists do not store their elements contiguously, meaning that each element is an individual unit, having a record of what the next (and/or previous) element is. This means that when elements are added or removed, only the reference to the next (and/or previous) element havs to be updated. If the linked list has millions of records, this will be just one quick operation. Fast for write heavy operations.
The main disadvantage of plain linked lists are that they tend to be a bit slower to read, since the individual elements can be anywhere in the memory, lowering the chances for a sequential read and cache optimizations.
Linked list are usually better if there will be more writes than reads, and arrays are usually better if there will be more reads than writes.
Very basic implementation of a forward linked list:
#include <stdio.h>
#include <stdlib.h>
struct Node {
int value;
struct Node* next;
};
struct Node* createNode(int value) {
struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
if(newNode == NULL) {
printf("Failed to allocate memory for next node!\n");
exit(1);
}
newNode->value = value;
newNode->next = NULL;
return newNode;
}
void printLL(struct Node* origin) {
struct Node* current = origin;
printf("Linked List: ");
while(current != NULL) {
printf("%d -> ", current->value);
current = current->next;
}
printf("NULL\n");
}
int main() {
struct Node* origin = createNode(10);
struct Node* second = createNode(20);
struct Node* third = createNode(30);
//Connect the nodes
origin->next = second;
second->next = third;
printLL(origin);
//Try to add a new node before the origin
struct Node* newOrigin = createNode(5);
newOrigin->next = origin;
origin = newOrigin;
printLL(origin);
//Free allocated memory
struct Node* current = origin;
struct Node* next;
while(current != NULL) {
next = current->next;
free(current);
current = next;
}
return 0;
}
Output:
Linked List: 10 -> 20 -> 30 -> NULL
Linked List: 5 -> 10 -> 20 -> 30 -> NULL
There’s quite a lot happening here, so let’s break it down
-> define a
Nodestructure, which containsvalue(some integer), andnext, which is a pointer to the nextNodestructure of the same type->
createNode(), allocates memory for a new node, setsvalueto the provided input, setsnextpointer to NULL, returns the newly created node->
printLL(), goes to the origin of the list, traverses from there till the end, while printing outvaluefor each node->
main(), creates three nodes with values 10,20 and 30, connects the nodes to form a linked list, prints it, tries adding a new node at the beginning, and prints again. At the end, it frees the allocated memory.
newOrigin->next = origin;
origin = newOrigin;
This here is basically the main selling point of the linked list. Inserting a new node at the beginning took only a simple update of a pointer. There was no need to otherwise shuffle any data around.
Inserting in the Middle of a Linked List
-> find the node before the point of insertion
-> update two pointers
Adding this function into the above code:
struct Node* insertAfter(struct Node* node, int value) {
if (node == NULL) {
return NULL;
}
struct Node* newNode = createNode(value);
newNode->next = node->next;
node->next = newNode;
return newNode;
}
Call it in main():
insertAfter(second, 25);
printLL(origin);
Output:
Linked List: 10 -> 20 -> 30 -> NULL
Linked List: 5 -> 10 -> 20 -> 30 -> NULL
Linked List: 5 -> 10 -> 20 -> 25 -> 30 -> NULL
Once the insertion point is known, this is a fast O(1) time complexity operation.
Deleting a Node
-> keep track of the previous node
-> bypass the node being deleted
-> free it
Adding this function into the above code:
void deleteAfter(struct Node* node) {
if (node == NULL || node->next == NULL) {
return;
}
struct Node* toDelete = node->next;
node->next = toDelete->next;
free(toDelete);
}
Call it in main():
deleteAfter(origin);
printLL(origin);
Output:
Linked List: 10 -> 20 -> 30 -> NULL
Linked List: 5 -> 10 -> 20 -> 30 -> NULL
Linked List: 5 -> 10 -> 20 -> 25 -> 30 -> NULL
Linked List: 5 -> 20 -> 25 -> 30 -> NULL
Deleting the origin of the linked list would require some special consideration, since it requires updating the list’s origin pointer.
This is just a simple linked list, they often also have a link to the previous node, but that is out of scope of this note.
Stacks
Stacks can usually be implemented with other, more general data structures, but if assurance is needed that nothing will be inserted in, or removed from the middle of the data set, it still has merit. An example of such a situation could be for various “undo” mechanisms, such as the ones found in various text editors.
The idea is that the order in which things are unwound is the reverse order in which they were inserted: if you add 1, 2, 3, 4, 5 into a stack, the order in which you can retrieve them back is 5, 4, 3, 2, 1.
#include <stdio.h>
#include <stdlib.h>
#define MAX_SIZE 5
struct Stack {
int items[MAX_SIZE];
int top;
};
void initializeStack(struct Stack* stack) {
stack->top = -1;
}
int isFull(struct Stack* stack) {
return stack->top == MAX_SIZE - 1;
}
int isEmpty(struct Stack* stack) {
return stack->top == -1;
}
void push(struct Stack* stack, int value) {
if(isFull(stack)) {
printf("Stack Overflow! Cannot push %d\n", value);
return;
}
stack->items[++stack->top] = value;
printf("%d pushed to stack\n", value);
}
int pop(struct Stack* stack) {
if(isEmpty(stack)) {
printf("Stack Underflow! Cannot pop from an empty stack\n");
return -1;
}
return stack->items[stack->top--];
}
void display(struct Stack* stack) {
if(isEmpty(stack)) {
printf("Stack is empty\n");
return;
}
printf("Stack contents: ");
for(int i = 0; i <= stack->top; i++) {
printf("%d ", stack->items[i]);
}
printf("\n");
}
int main() {
struct Stack stack;
initializeStack(&stack);
push(&stack, 1);
push(&stack, 2);
push(&stack, 3);
display(&stack);
printf("Removing element: %d\n", pop(&stack));
printf("Removing element: %d\n", pop(&stack));
display(&stack);
push(&stack, 4);
push(&stack, 5);
push(&stack, 6);
push(&stack, 7);
//Try to push into a full stack
push(&stack, 8);
display(&stack);
return 0;
}
Output:
1 pushed to stack
2 pushed to stack
3 pushed to stack
Stack contents: 1 2 3
Removing element: 3
Removing element: 2
Stack contents: 1
4 pushed to stack
5 pushed to stack
6 pushed to stack
7 pushed to stack
Stack Overflow! Cannot push 8
Stack contents: 1 4 5 6 7
Breaking it down:
struct Stack {
int items[MAX_SIZE];
int top;
};
This stack is basically an array that also has a special marker for the “top”. This special marker is what the code will use later, for identifying the most recently added element in this array. It is initially set to -1 which is just a specific value to say that the stack is empty.
When something gets pushed into the stack, top will be set to the value representative of the position of that something, keeping in mind that arrays begin from a zero index, so max stack position = array size - 1.
When something gets popped (removed) from the stack, top will be decremented by one. This also explains why -1 makes sense. The last (oldest) element in the stack would have a position of 0, so decrementing by one when removing this element gives -1.
Displaying the stack is a normal for loop, where the maximum value to loop to is top.
The main takeaway is that whenever something is removed or added from/to a stack, a knowledge of, and a check against the “top” (latest added element) is the key to making it work.
The Queue
The queue is probably the most intuitive of algorithms, it comes up naturally enough that it gets implemented without needing to give it any special thought, or consideration. For example, deployment automation will want to deploy things in exactly the same order as things were merged in the repository. In simpler words, if 1 2 3 4 5 is added to the queue, then those entries should be retrieved back in exactly the same order.
A queue can also be represented as an array, but in this case, with two special markers, one representing the beginning, and another representing the end.
#include <stdio.h>
#include <stdlib.h>
#define MAX_SIZE 5
struct Queue {
int items[MAX_SIZE];
int first;
int last;
};
// Initialize queue
void initializeQueue(struct Queue* queue) {
queue->first = -1;
queue->last = -1;
}
int isFull(struct Queue* queue) {
return (queue->last + 1) % MAX_SIZE == queue->first;
}
int isEmpty(struct Queue* queue) {
return queue->first == -1;
}
void add(struct Queue* queue, int value) {
if(isFull(queue)) {
printf("Queue is full! Cannot add %d\n", value);
return;
}
if(isEmpty(queue)) {
queue->first = 0;
}
queue->last = (queue->last + 1) % MAX_SIZE;
queue->items[queue->last] = value;
printf("%d added to queue\n", value);
}
int del(struct Queue* queue) {
int item;
if(isEmpty(queue)) {
printf("Queue is already empty!\n");
return -1;
}
item = queue->items[queue->first];
if(queue->first == queue->last) {
// Last element in queue
queue->first = -1;
queue->last = -1;
} else {
queue->first = (queue->first + 1) % MAX_SIZE;
}
return item;
}
void display(struct Queue* queue) {
int i;
if(isEmpty(queue)) {
printf("Queue is empty\n");
return;
}
printf("Queue elements: ");
i = queue->first;
if(queue->first <= queue->last) {
while(i <= queue->last) {
printf("%d ", queue->items[i]);
i++;
}
} else {
while(i < MAX_SIZE) {
printf("%d ", queue->items[i]);
i++;
}
i = 0;
while(i <= queue->last) {
printf("%d ", queue->items[i]);
i++;
}
}
printf("\n");
}
int main() {
struct Queue queue;
initializeQueue(&queue);
add(&queue, 10);
add(&queue, 20);
add(&queue, 30);
display(&queue);
printf("Removed: %d\n", del(&queue));
printf("Removed: %d\n", del(&queue));
display(&queue);
add(&queue, 40);
add(&queue, 50);
add(&queue, 60);
add(&queue, 70);
//Should not be able to add
add(&queue, 80);
display(&queue);
return 0;
}
Output:
10 added to queue
20 added to queue
30 added to queue
Queue elements: 10 20 30
Removed: 10
Removed: 20
Queue elements: 30
40 added to queue
50 added to queue
60 added to queue
70 added to queue
Queue is full! Cannot add 80
Queue elements: 30 40 50 60 70
Breaking down the important points:
struct Queue {
int items[MAX_SIZE];
int first;
int last;
};
With a queue, it’s useful to be able to keep track of what exactly is the first element, as well as the last element. We can only remove the first element, but we’re only able to add on top of the last element, so the expressions needed for manipulating a queue are slightly more tricky than it is for a stack.
int isFull(struct Queue* queue) {
return (queue->last + 1) % MAX_SIZE == queue->first;
}
Checking if the queue is full requires a small trick, to do with wrapping around the end of the queue. If the queue is full, (queue->last + 1) becomes (4 + 1), which is 5.
5 % MAX_SIZE becomes 5 % 5, which evaluates to 0. The expression becomes 0 == queue->first. Since queue->first is 0, the condition 0 == 0 evaluates to true (the queue is full).
Otherwise, if e.g. the queue has one element, (queue->last + 1) becomes (0 + 1), which is 1.
1 % MAX_SIZE becomes 1 % 5, which evaluates to 1. The expression becomes 1 == queue->first. Since queue->first is 0, the condition 1 ==0 evaluates to false (the queue is not full).
int isEmpty(struct Queue* queue) {
return queue->first == -1;
}
This works similarly to how it worked for a stack, except that we now need the first element to have -1 marker
queue->first = (queue->first + 1) % MAX_SIZE;
Again, when it comes to dealing with checks on the first element of the queue, the trick about wrapping around the array comes back. If queue->first is at the very last index of the underlying array (e.g., index 4 when MAX_SIZE is 5), incrementing it by 1 wraps the pointer back around to index 0. In other cases, it grabs the next available element (e.g. if queue->first = 3, the result will be 4).
Final Notes
In practice, it is unlikely that these structures will be useful in their simplest forms as presented here, and almost all practical applications will require a more involved/advanced variety. Moreover, there’s many many types of structures that are not covered in this simple set of notes.