Pointers and Arrays in C
These are some shorthand notes on C pointers and arrays, meant as a quick reference. For a proper introduction, K&R is still the best place to start.
What a Pointer Looks Like
Pointers are variables that store the memory addresses of other variables. A pointer is declared with the * symbol and should only point to data of its declared type (e.g. an int pointer points to an int).
When a pointer is assigned, the result is a memory address:
#include <stdio.h>
int main(void) {
    int someValue = 1;
    int *ptr = &someValue;
    // %p expects a void pointer, hence the cast
    printf("%p\n", (void *)ptr);
    // to get the value the pointer points to, it must be dereferenced:
    printf("%d\n", *ptr);
}
Running it:
root@debian-test:~/# ./prog
0x7ffcdeb7ba5c
1
Memory Addresses and Basic Pointer Operators
When a variable is defined in C, the compiler reserves a location in memory for it and attaches a label (the name of the variable) to that memory address. This means there is no need to remember memory addresses; remembering the label is enough. In the example above, it was not necessary to know the address 0x7ffcdeb7ba5c, only the label someValue.
int someValue = 1;
This variable was given a memory address, but that address is not visible by default; an operator is needed to find out what it is.
With a lot of oversimplification, memory can be visualized as one big array in which each element has a unique address. With C, we can reference those memory addresses directly.
The “Address of” Operator (&)
To know the memory address of where a C variable is stored, the “address of” operator & is available. To use it, place it before a variable name, and it returns the memory address of that variable.
In essence, this operator is necessary in order to allow for pointer values to be conveniently assigned.
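As a small illustration of the operator, the sketch below stores the result of & in a pointer and checks that both refer to the same address. The function name address_matches is made up for this example and is not part of any library.

```c
#include <stdio.h>

/* Hypothetical demo: returns 1 if ptr holds the address reported by &. */
int address_matches(void) {
    int someValue = 1;
    int *ptr = &someValue;   /* & yields the address of someValue */

    /* %p expects a void pointer, hence the casts */
    printf("&someValue = %p\n", (void *)&someValue);
    printf("ptr        = %p\n", (void *)ptr);

    return ptr == &someValue;
}
```

The two printed addresses should be identical, although the actual value will differ from run to run and from machine to machine.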
Defining Pointer Variables
As can be seen in the example, pointer variables are defined by including an asterisk * between the data type and the name of the variable.
A pointer must be associated with a specific type, even though it actually stores a memory address.
This is because:
-> Different data types occupy different amounts of memory
-> The compiler needs to know how to interpret the data at the address the pointer points to
-> Helps reduce type-related errors
A pointer can be defined without an initial value (“uninitialized”), but such a newly defined pointer doesn’t point to anything useful. It’ll contain some garbage value until a value is given. A pointer that is defined but not assigned to a value can be dangerous to use.
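To make the point about types concrete, here is a minimal sketch that reports how many bytes each kind of pointer refers to. The helper name pointee_sizes is an assumption for illustration, and the sizes of int and double depend on the platform.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical demo: stores the size of what each pointer type points to. */
void pointee_sizes(size_t out[3]) {
    char c = 'x';
    int i = 1;
    double d = 1.0;

    char *pc = &c;
    int *pi = &i;
    double *pd = &d;

    out[0] = sizeof *pc;   /* always 1 */
    out[1] = sizeof *pi;   /* commonly 4, but platform-dependent */
    out[2] = sizeof *pd;   /* commonly 8, but platform-dependent */

    printf("char: %zu, int: %zu, double: %zu\n", out[0], out[1], out[2]);
}
```

All three pointers store an address, but the declared type tells the compiler how many bytes to read at that address and how to interpret them.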
The Dereferencing Operator (*)
Basically, this operator says “give me the value at the address stored in this pointer”.
This operation is simple, but also a potential source of application crashes:
-> Dereferencing a pointer with no value assigned (“uninitialized”)
-> Dereferencing with an invalid type cast
-> Dereferencing a pointer to a variable that has since gone out of scope
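Dereferencing works for writing as well as reading. A classic sketch is a swap function; swap_ints is a hypothetical name used only for this example.

```c
/* Hypothetical helper: swaps two ints through pointers. */
void swap_ints(int *a, int *b) {
    int tmp = *a;   /* read the value a points to */
    *a = *b;        /* write through a */
    *b = tmp;       /* write through b */
}
```

A caller would pass addresses, for example swap_ints(&x, &y), and both pointers must refer to valid ints for the reasons listed above.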
Special Pointer Types
NULL Pointer
-> Does not point to any memory location.
-> Created by assigning the NULL value to the pointer, which can be of any data type.
-> Useful for checking if a pointer is pointing to a valid memory address by checking against NULL
#include <stdio.h>
int main(void) {
    int *ptr = NULL;
}
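The check against NULL mentioned above could look roughly like this. The helper read_or_default and its fallback parameter are assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical helper: returns *ptr, or fallback when ptr is NULL. */
int read_or_default(const int *ptr, int fallback) {
    if (ptr == NULL) {
        return fallback;   /* nothing to dereference */
    }
    return *ptr;
}
```

Note that this only guards against NULL. A non-NULL pointer can still be invalid, as the wild and dangling pointer cases show.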
Void Pointer
-> Has no associated data type
-> Can be cast to a pointer of any other data type
-> Cannot be dereferenced directly
#include <stdio.h>
int main(void) {
    int someValue = 1;
    void *ptr = &someValue;
    //printf("%d", *ptr); <- compiler error!
    //must first cast to the real type, and then dereference
    printf("%d", *(int *)ptr);
}
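Void pointers are mostly useful for code that should work with any data type. As a rough sketch, the hypothetical swap_generic below swaps two objects of any type by copying raw bytes; the 64-byte scratch buffer is an arbitrary limit chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: swaps two objects of `size` bytes each.
   Returns 0 on success, -1 if size exceeds the scratch buffer. */
int swap_generic(void *a, void *b, size_t size) {
    unsigned char tmp[64];   /* arbitrary limit chosen for this sketch */

    if (size > sizeof tmp) {
        return -1;
    }
    if (a == b) {
        return 0;            /* nothing to do, and memcpy must not overlap */
    }
    memcpy(tmp, a, size);
    memcpy(a, b, size);
    memcpy(b, tmp, size);
    return 0;
}
```

The function never dereferences the void pointers itself; it hands them to memcpy, which works on bytes. The caller is responsible for passing the correct size, e.g. swap_generic(&x, &y, sizeof x).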
Wild Pointer
Pointers that have not been assigned a value yet (“uninitialized”). These can cause application crashes, for example by accessing a part of the memory space that the application is not allowed to touch.
#include <stdio.h>
int main(void) {
    int *ptr; // holds an indeterminate address
}
Trying to use this pointer = problem.
Dangling Pointer
-> A pointer which points to a memory address that has been deleted or freed, or the variable that the pointer is pointing to has gone out of scope
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int *ptr;
    //using an anonymous block to create a new scope
    {
        int someValue = 1;
        ptr = &someValue;
    }
    //ptr is now a dangling pointer: dereferencing it is undefined behavior,
    //so this printf may print 1, garbage, or crash
    printf("%d", *ptr);
}
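The same problem appears with freed heap memory. One common defensive habit, sketched here with the hypothetical helpers make_value and release_value, is to reset a pointer to NULL right after freeing it.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates one int on the heap; returns NULL on failure. */
int *make_value(int value) {
    int *ptr = malloc(sizeof *ptr);
    if (ptr != NULL) {
        *ptr = value;
    }
    return ptr;
}

/* Hypothetical helper: frees *ptr and resets it so it cannot dangle. */
void release_value(int **ptr) {
    free(*ptr);
    *ptr = NULL;
}
```

This only protects the one pointer that is passed in. Any other copies of the same address still dangle after the free.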
Arrays
So far, everything has been about pointers as standalone variables. But in real C programs, pointers show up most often when working with arrays, because array manipulation and pointers go hand in hand. Understanding how arrays and pointers relate requires its own explanation.
On a basic level, a C array is a container that holds items of the same data type, stored in contiguous memory locations. Since pointers are variables that store memory addresses, it’s not hard to see why the two concepts fit together so naturally.
To create an array, specify its type and size:
int main(void) {
    int stuff[5];
    stuff[0] = 1;
    stuff[1] = 2;
    stuff[2] = 3;
    stuff[3] = 4;
    stuff[4] = 5;
    // or... add values during declaration
    int otherstuff[5] = {1, 2, 3, 4, 5};
}
C also supports multi-dimensional arrays:
int main(void) {
    int interestingStuff[4][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9},
        {10, 11, 12}
    };
}
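Elements of a multi-dimensional array are accessed with one index per dimension. As a quick sketch, the hypothetical function grid_total walks the same 4x3 array row by row and adds everything up.

```c
/* Hypothetical demo: sums every element of a 4x3 array. */
int grid_total(void) {
    int interestingStuff[4][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9},
        {10, 11, 12}
    };
    int total = 0;

    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 3; col++) {
            total += interestingStuff[row][col];   /* [row][column] */
        }
    }
    return total;   /* 1 + 2 + ... + 12 = 78 */
}
```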
How Pointers and Arrays are Related
In C, the name of an array functions like a pointer to the first element of the array.
#include <stdio.h>
int main(void) {
    int arr[3] = {1, 2, 3};
    printf("Printing the NAME of an array gives us a memory address: %p\n", (void *)arr);
    printf("Using the \"address of\" operator on the first element, also gives us a memory address: %p\n", (void *)&arr[0]);
    printf("We can use array notation to print values: %d\n", arr[0]);
    printf("Or pointer notation if using the dereferencing operator: %d\n", *arr);
}
Output:
Printing the NAME of an array gives us a memory address: 0x7fffe6d7064c
Using the "address of" operator on the first element, also gives us a memory address: 0x7fffe6d7064c
We can use array notation to print values: 1
Or pointer notation if using the dereferencing operator: 1
Using Pointers to Access Arrays
It’s possible to traverse through the elements of an array using pointer arithmetic:
#include <stdio.h>
int main(void) {
    int arr[3] = {1, 2, 3};
    //Traverse by array notation
    printf("Array notation:\n");
    for (int i = 0; i < 3; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }
    //Traverse by pointer notation
    printf("\nPointer notation:\n");
    for (int i = 0; i < 3; i++) {
        printf("*(arr + %d) = %d\n", i, *(arr + i));
    }
}
Outputs:
Array notation:
arr[0] = 1
arr[1] = 2
arr[2] = 3
Pointer notation:
*(arr + 0) = 1
*(arr + 1) = 2
*(arr + 2) = 3
The important thing to note is that an array name behaves like a constant pointer: there is no way to change where it points. (Why would you want to, anyway?)
Basically:
int arr[3] = {1, 2, 3};
int *ptr = arr; //correct
arr = ptr; //BAD: compile error, an array name cannot be assigned to
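One more difference worth knowing: although an array name usually behaves like a pointer, sizeof still sees the whole array. A minimal sketch, with element_count as a made-up name for this example:

```c
#include <stddef.h>

/* Hypothetical demo: element count computed from the array itself. */
size_t element_count(void) {
    int arr[3] = {1, 2, 3};
    /* total bytes in the array divided by bytes per element */
    return sizeof arr / sizeof arr[0];
}
```

This trick only works where the array itself is in scope. Applied to an int *ptr, sizeof would give the size of the pointer, not of the array it points into.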
Pointer Arithmetic
When performing arithmetic on pointers in C, the operation is scaled by the size of the pointed-to type, so ptr++ on an int pointer moves to the next int rather than to the next byte:
#include <stdio.h>
int main(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int *ptr = arr;
    printf("ptr points to: %d\n", *ptr);
    ptr++;
    printf("ptr++: ptr points to %d\n", *ptr);
    ptr += 2;
    printf("ptr += 2: ptr points to %d\n", *ptr);
}
Output:
ptr points to: 1
ptr++: ptr points to 2
ptr += 2: ptr points to 4
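The scaling can be observed directly by measuring how far the address moves. The two functions below are hypothetical demos: the first measures one step in bytes, the second shows that subtracting two pointers gives a count of elements.

```c
#include <stddef.h>

/* Hypothetical demo: number of bytes the address moves for ptr + 1. */
size_t bytes_per_step(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int *ptr = arr;
    int *next = ptr + 1;

    /* casting to char * measures the distance in bytes */
    return (size_t)((char *)next - (char *)ptr);
}

/* Hypothetical demo: subtracting pointers yields a count of elements. */
ptrdiff_t elements_between(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int *first = arr;
    int *fourth = arr + 3;

    return fourth - first;   /* 3 elements, not 3 * sizeof(int) bytes */
}
```

On a platform where int is 4 bytes, bytes_per_step returns 4; in general it returns sizeof(int).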
This can be used to perform various operations on arrays, e.g. a program that sums up the elements of an array:
#include <stdio.h>
int main(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int sum = 0;
    int *ptr = arr;
    for (int i = 0; i < 5; i++) {
        sum += *ptr;
        ptr++;
    }
    printf("%d\n", sum);
}
Output:
15
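A variation on the same idea drops the index entirely and loops until the pointer reaches the end of the array. This is only a sketch, and sum_range is a made-up helper name.

```c
/* Hypothetical helper: sums the ints in [begin, end). */
int sum_range(const int *begin, const int *end) {
    int sum = 0;
    /* end points one past the last element and is never dereferenced */
    for (const int *p = begin; p < end; p++) {
        sum += *p;
    }
    return sum;
}
```

For the array above, the call would be sum_range(arr, arr + 5). Forming a pointer one past the last element is allowed in C, as long as it is not dereferenced.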
Character Arrays (“strings”)
In C, strings are represented as arrays of characters. This is a big distinction between C and many higher-level languages, where strings are usually their own primitives.
#include <stdio.h>
int main(void) {
    char str[10] = "something";
    printf("%s\n", str);
    printf("%c\n", str[0]);
    printf("%c\n", *str);
}
Output:
something
s
s
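A C string ends with a terminating '\0' character, which is why "something" (9 characters) needs a char array of at least 10. The sketch below walks a pointer along the string until it finds that terminator. count_chars is a hypothetical name; the standard library function strlen does the same job.

```c
#include <stddef.h>

/* Hypothetical helper: counts characters up to, not including, the '\0'. */
size_t count_chars(const char *str) {
    const char *p = str;
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - str);   /* pointer difference = number of chars */
}
```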
Strings can also be stored in regular arrays, as an array of char pointers:
#include <stdio.h>
int main(void) {
    const char *arr[] = {
        "thing1",
        "thing2",
        "thing3"
    };
    for (int i = 0; i < 3; i++) {
        printf("%d, %s\n", i+1, *(arr + i));
    }
}
Output:
1, thing1
2, thing2
3, thing3
Important Note:
A string literal cannot be modified through a pointer to it. The code below compiles, but writing to the literal is undefined behavior and often crashes:
char *str = "Test";
str[0] = 'B'; //BAD: undefined behavior
Instead think of char *str as if it said const char *str.
For a string that can be modified, initialize it as an array:
char str[] = "Test";
str[0] = 'B'; // OK
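The same rule applies when a string is handed to a function that writes to it. A minimal sketch, using the made-up helper replace_first:

```c
/* Hypothetical helper: overwrites the first character of a writable string. */
void replace_first(char *str, char c) {
    if (str[0] != '\0') {   /* leave empty strings untouched */
        str[0] = c;
    }
}
```

Calling it with an array such as char str[] = "Test" is fine. If pointers to literals are declared as const char *, the compiler should complain when one is passed here, which catches the mistake before the program runs.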
Pointer to Array
Pointers can point to an entire array:
#include <stdio.h>
int main(void) {
    int arr[3] = {1, 2, 3};
    int (*ptr)[3] = &arr;
    printf("%d, %d, %d\n", (*ptr)[0], (*ptr)[1], (*ptr)[2]);
}
Output:
1, 2, 3
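Pointers to arrays show up most naturally with two-dimensional arrays, where each row is itself an array. As a sketch, the hypothetical row_sum takes a pointer to a row of three ints:

```c
/* Hypothetical helper: sums one row of three ints via a pointer to array. */
int row_sum(int (*row)[3]) {
    int sum = 0;
    for (int i = 0; i < 3; i++) {
        sum += (*row)[i];   /* dereference to get the row, then index it */
    }
    return sum;
}
```

Given int grid[2][3], the name grid converts to a pointer to its first row, so row_sum(grid) sums the first row and row_sum(grid + 1) the second. Here pointer arithmetic steps over a whole row at a time.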
Usage
Passing Arrays to Functions
#include <stdio.h>
void printArray(int *arr, int size) {
    // int *arr and int arr[] are equivalent in a parameter list
    // either way the function receives a pointer to the first element,
    // so it works on the caller's array rather than on a copy
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}
int main(void) {
    int numbers[3] = {1, 2, 3};
    printArray(numbers, 3);
}
Output:
1 2 3
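Because the function receives a pointer to the caller's array, any writes it makes are visible to the caller. A small sketch, with double_all as a made-up helper name:

```c
/* Hypothetical helper: doubles every element in place. */
void double_all(int *arr, int size) {
    for (int i = 0; i < size; i++) {
        arr[i] *= 2;   /* writes to the caller's array */
    }
}
```

The size has to be passed separately, since inside the function arr is just a pointer and no longer carries the length of the array.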
Dynamic Allocation of Memory
In all of the previous examples, arrays were created with a fixed size known in advance. This is not always practical:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    // size may be 0, in which case malloc is allowed to return NULL
    int size = (rand() % 10);
    int *arr = (int*)malloc(size * sizeof(int));
    if (arr == NULL) {
        printf("Failed to allocate memory...\n");
        exit(1);
    }
    for (int i = 0; i < size; i++) {
        *(arr + i) = i; // or... arr[i] = i, like below
        printf("%d ", arr[i]);
    }
    printf("\n");
    //Don't forget to free the allocated memory, otherwise you get a memory leak!
    free(arr);
}
Output:
0 1 2
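Dynamically allocated arrays can also be returned from functions, which is not possible with local arrays. The sketch below wraps the allocation in a hypothetical helper, make_sequence; the rule that the caller frees the result is a convention chosen for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap array holding 0..size-1,
   or NULL if size is not positive or the allocation fails. */
int *make_sequence(int size) {
    if (size <= 0) {
        return NULL;
    }
    int *arr = malloc((size_t)size * sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    for (int i = 0; i < size; i++) {
        arr[i] = i;
    }
    return arr;   /* the caller must free() this */
}
```

A caller would check the result against NULL, use it like any other array, and call free on it when done.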
Final Notes
-> arr[i] <-> *(arr + i)
-> &arr[i] <-> (arr + i)
-> char *str != char str[]
-> function(int arr[]) <-> function(int* arr)