Posted 2023-07-09 | Software Exploitation

Exploring the fundamentals of RISC-V: Assembly and Shellcode Series - Part 1

In the ever-evolving landscape of computer architecture, RISC-V has emerged as a promising and disruptive force. It is an open instruction set architecture (ISA) with an elegant design philosophy, and unlike proprietary architectures its specifications are freely available. This openness has spurred innovation, encouraging a flourishing ecosystem of developers, researchers, and companies to contribute to its development, and it has garnered significant attention from both academia and industry. Adoption is growing quickly: according to industry reports, shipments of RISC-V-based devices reached 1 billion units in 2022 alone, a significant milestone for this emerging technology.

Given the growing popularity of RISC-V in the embedded systems market, it becomes crucial to address the security risks that come with the increasing number of devices. This blog post series aims to provide a comprehensive exploration of RISC-V assembly language fundamentals, enabling readers to understand its core concepts and functionality. We will begin with the basics of RISC-V assembly, laying a solid foundation for subsequent discussions. In future blog posts, we will set up the development environment and tools required for writing and compiling assembly code. We will then work through practical examples such as crafting basic shellcode and testing it on simple buffer overflow vulnerabilities, and build several shellcode variations, including shell spawning and reverse TCP shells.

#Brief overview of RISC-V architecture

RISC-V is an open-source instruction set architecture (ISA) that is designed to be simple, modular, and extensible. It was developed at the University of California, Berkeley, and has gained significant attention and adoption in both academia and industry.

Key features of RISC-V architecture include:

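Instruction Set: RISC-V follows the Reduced Instruction Set Computing (RISC) philosophy, which means it has a minimalistic and streamlined set of instructions. The ISA consists of a small base integer instruction set (RV32I or RV64I, with the reduced RV32E variant for very small embedded cores) and optional standard extensions, such as M for integer multiplication and division, A for atomic operations, F and D for floating point, and C for compressed 16-bit instructions. Implementations range from a bare base ISA suitable for embedded systems to richer combinations for general-purpose computing.
Bit Widths: RISC-V defines both 32-bit (RV32) and 64-bit (RV64) variants. The width refers to the integer registers and the address space (XLEN); base instructions are 32 bits long in both. The 32-bit variant is commonly used in resource-constrained embedded systems, while the 64-bit variant provides a larger address space and wider arithmetic for general-purpose computing.
Register File: The base integer ISA has 32 general-purpose registers, x0 through x31, in both the 32-bit and 64-bit variants; only the register width changes (32 or 64 bits). Register x0 is hardwired to zero, and the embedded RV32E variant reduces the file to 16 registers. These registers hold data during program execution and serve as the operands for arithmetic and logical operations.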
Memory Model: RISC-V employs a simple and flexible memory model. It supports a flat memory space where data and instructions are stored. Memory accesses are performed through load and store instructions, which transfer data between registers and memory.
Exception Handling: RISC-V provides mechanisms for handling exceptions, such as interrupts, system calls, and other events that require special attention. The architecture defines a set of exception codes and specifies how the processor should respond to these events.
Privilege Levels: RISC-V supports multiple privilege levels to enforce different levels of access and protection. These levels include user mode, supervisor mode, and machine mode. Higher privilege levels can use additional instructions and control and status registers (CSRs) that are unavailable to less privileged code.
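
To make the register file concrete, here is a minimal C sketch of the integer registers as the base ISA defines them: 32 registers, with x0 hardwired to zero. The names regfile_t, reg_read, and reg_write are hypothetical and chosen for illustration, and the 64-bit width assumes an RV64 target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the RV64I integer register file:
   32 registers (x0..x31), each 64 bits wide. */
typedef struct {
    uint64_t x[32];
} regfile_t;

/* Read register idx; x0 always reads as zero. */
uint64_t reg_read(const regfile_t *rf, unsigned idx) {
    assert(idx < 32);
    return idx == 0 ? 0 : rf->x[idx];
}

/* Write register idx; writes to x0 are silently discarded. */
void reg_write(regfile_t *rf, unsigned idx, uint64_t value) {
    assert(idx < 32);
    if (idx != 0) {
        rf->x[idx] = value;
    }
}
```

Because x0 discards writes, instructions can use it as a destination when a result is not needed, and as a source whenever a constant zero is required.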

#Instruction Format/Encoding

RISC-V instructions are divided into different formats based on their structure and operand types. The most common formats include:

R-type: Used for arithmetic and logical operations, which involve two source registers and one destination register.
I-type: Used for operations with a 12-bit immediate, where one operand is a register and the other is an immediate value. Loads and JALR also use this format.
S-type: Used for store operations, which store data from a register into memory.
B-type: Used for branch operations, which perform conditional jumps based on comparison results.
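U-type: Used for instructions that carry a 20-bit upper immediate, namely LUI and AUIPC, which build large constants and PC-relative addresses.
J-type: Used for the unconditional jump JAL, which takes a signed, PC-relative immediate offset.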
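
As a rough illustration of how two of these formats pack their fields into a 32-bit instruction word, the C sketch below encodes R-type and I-type instructions. The function names are hypothetical; the field positions and the opcode values used in the usage note (0x33 for register-register arithmetic, 0x13 for register-immediate arithmetic) follow the base ISA encoding.

```c
#include <assert.h>
#include <stdint.h>

/* R-type layout:
   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                       uint32_t funct3, uint32_t rd, uint32_t opcode) {
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* I-type layout:
   imm[11:0] in bits 31:20, then rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t encode_i_type(int32_t imm, uint32_t rs1, uint32_t funct3,
                       uint32_t rd, uint32_t opcode) {
    assert(imm >= -2048 && imm <= 2047);   /* 12-bit signed immediate */
    assert(rd < 32 && rs1 < 32);
    return (((uint32_t)imm & 0xFFFu) << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}
```

For example, encode_r_type(0, 2, 1, 0, 3, 0x33) yields 0x002081B3, the encoding of ADD x3, x1, x2, and encode_i_type(5, 0, 0, 1, 0x13) yields 0x00500093, the encoding of ADDI x1, x0, 5.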

#Essential RISC-V Assembly Instructions

#Arithmetic and Logical instructions

Basic RISC-V instructions cover a range of operations such as arithmetic, logical, memory access, and control flow. Here are some examples of common RISC-V instructions along with their syntax:

Arithmetic Instructions:
1. ADD: Adds two registers and stores the result in a destination register.
  Syntax: ADD rd, rs1, rs2
2. SUB: Subtracts one register from another and stores the result in a destination register.
  Syntax: SUB rd, rs1, rs2
3. ADDI: Adds an immediate value to a register and stores the result in a destination register.
  Syntax: ADDI rd, rs1, imm
Logical Instructions:
1. AND: Performs bitwise AND between two registers and stores the result in a destination register.
  Syntax: AND rd, rs1, rs2
2. OR: Performs bitwise OR between two registers and stores the result in a destination register.
  Syntax: OR rd, rs1, rs2
3. XOR: Performs bitwise XOR between two registers and stores the result in a destination register.
  Syntax: XOR rd, rs1, rs2
Memory Access Instructions:
1. LW: Loads a word from memory into a register.
  Syntax: LW rd, offset(rs1)
2. SW: Stores a word from a register into memory.
  Syntax: SW rs2, offset(rs1)
Control Transfer Instructions:
1. JAL: Jumps to a target address and stores the return address in a register.
  Syntax: JAL rd, target
2. JALR: Jumps to the address formed by adding an immediate offset to a register and stores the return address in a register.
  Syntax: JALR rd, rs1, offset
3. BEQ: Branches to a target address if two registers are equal.
  Syntax: BEQ rs1, rs2, target
4. BNE: Branches to a target address if two registers are not equal.
  Syntax: BNE rs1, rs2, target

These examples represent just a subset of the basic instructions available in RISC-V. The syntax follows a common pattern where rd represents the destination register, rs1 and rs2 are the source registers, imm denotes an immediate value, and offset specifies an offset from a base register.

It’s important to consult the official RISC-V documentation and specific implementation’s instruction set reference for a comprehensive list of instructions and their precise syntax, as it may vary based on the specific RISC-V variant or extension being used.
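
One detail worth seeing in code is how the immediate in ADDI behaves: it is a 12-bit value that the processor sign-extends before adding, and the addition wraps around without raising an overflow exception. The sketch below models this for a 32-bit register; sign_extend_12 and exec_addi are hypothetical helper names, not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 12 bits of raw, as the hardware does for
   I-type immediates. */
int32_t sign_extend_12(uint32_t raw) {
    raw &= 0xFFFu;
    return (raw & 0x800u) ? (int32_t)raw - 4096 : (int32_t)raw;
}

/* RV32 semantics of ADDI rd, rs1, imm: the result wraps modulo 2^32. */
uint32_t exec_addi(uint32_t rs1_value, uint32_t raw_imm12) {
    assert(raw_imm12 <= 0xFFFu);   /* only 12 bits are encoded */
    return rs1_value + (uint32_t)sign_extend_12(raw_imm12);
}
```

Since the immediate can be negative (from -2048 to 2047), subtracting a constant is written as ADDI with a negative immediate; the base ISA has no separate subtract-immediate instruction.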

#Load and Store Instructions

Load and store instructions in RISC-V are used to transfer data between the processor’s registers and memory. These instructions play a vital role in reading and writing data, facilitating data manipulation and program execution. Here’s an explanation of load and store instructions in RISC-V:

Load Instructions: Load instructions fetch data from memory and store it in a register. They allow the processor to access data stored in memory for subsequent processing or use. Common load instructions in RISC-V include:
1. LW (Load Word): Loads a 32-bit word from memory into a register.
  Syntax: LW rd, offset(rs1)
  Example: LW x1, 0(x2)
2. LH (Load Halfword): Loads a 16-bit halfword from memory and sign-extends it to the register width. LHU is the zero-extending counterpart.
  Syntax: LH rd, offset(rs1)
  Example: LH x3, 4(x4)
3. LB (Load Byte): Loads an 8-bit byte from memory and sign-extends it to the register width. LBU is the zero-extending counterpart.
  Syntax: LB rd, offset(rs1)
  Example: LB x5, -8(x6)
  Load instructions take the destination register (rd) and a memory address formed by adding an immediate offset to the base register (offset(rs1)).
Store Instructions: Store instructions transfer data from registers to memory. They allow the processor to write data back to memory for storage or output purposes. Common store instructions in RISC-V include:
1. SW (Store Word): Stores a 32-bit word from a register into memory.
  Syntax: SW rs2, offset(rs1)
  Example: SW x7, 16(x8)
2. SH (Store Halfword): Stores a 16-bit halfword from a register into memory.
  Syntax: SH rs2, offset(rs1)
  Example: SH x9, -4(x10)
3. SB (Store Byte): Stores an 8-bit byte from a register into memory.
  Syntax: SB rs2, offset(rs1)
  Example: SB x11, 12(x12)
  Store instructions take the source register (rs2) and a memory address formed by adding an immediate offset to the base register (offset(rs1)).

In load and store instructions, ‘rd’ is the destination register of a load, ‘rs2’ is the register containing the data to be stored, ‘rs1’ is the base register that holds the memory address, and ‘offset’ specifies the immediate offset from that base address.

It’s important to note that memory accesses in RISC-V are expected to be naturally aligned, meaning that each access falls on a boundary matching its size: 4-byte alignment for words, 2-byte alignment for halfwords, and no restriction for bytes. Unaligned access may lead to performance penalties or even exceptions on certain RISC-V implementations.
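
The difference between the load variants is easiest to see side by side. The following C sketch models byte, halfword, and word loads from a byte array, assuming the usual little-endian memory layout and a 32-bit register. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LB: load one byte and sign-extend it. */
int32_t load_lb(const uint8_t *mem, uint32_t addr) {
    return (int32_t)(mem[addr] ^ 0x80) - 0x80;
}

/* LBU: load one byte and zero-extend it. */
uint32_t load_lbu(const uint8_t *mem, uint32_t addr) {
    return mem[addr];
}

/* LHU: load two bytes (little-endian) and zero-extend them. */
uint32_t load_lhu(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* LH: load two bytes (little-endian) and sign-extend them. */
int32_t load_lh(const uint8_t *mem, uint32_t addr) {
    return (int32_t)(load_lhu(mem, addr) ^ 0x8000u) - 0x8000;
}

/* LW: load four bytes (little-endian). */
uint32_t load_lw(const uint8_t *mem, uint32_t addr) {
    return load_lhu(mem, addr) | (load_lhu(mem, addr + 2) << 16);
}

/* Natural alignment: the address is a multiple of the access size. */
int is_naturally_aligned(uint32_t addr, uint32_t size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return (addr & (size - 1)) == 0;
}
```

With the bytes FF 80 34 12 in memory, a byte load at address 0 gives -1 with LB but 255 with LBU, which is exactly the kind of difference that matters when shellcode handles lengths or character data.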

Load and store instructions are fundamental for manipulating data in RISC-V programs and are used extensively in various applications, including data processing, data storage, and communication with external devices.

#Control transfer instructions

Control transfer instructions in RISC-V are used to alter the flow of program execution by changing the order of instructions or redirecting the program to a different location. These instructions enable branching, looping, and subroutine calls. Here’s an explanation of control transfer instructions in RISC-V:

Unconditional Jump Instructions: Unconditional jump instructions transfer control unconditionally to a target address. The most commonly used unconditional jump instruction in RISC-V is:
1. JAL (Jump and Link): Jumps to a target address and stores the return address (address of the instruction following the JAL) into a register.
  Syntax: JAL rd, target
  Example: JAL x1, target_label
Conditional Branch Instructions: Conditional branch instructions allow branching based on a specific condition. They evaluate the condition and decide whether to perform a jump or continue with the next sequential instruction. Some common conditional branch instructions in RISC-V are:
1. BEQ (Branch if Equal): Branches to a target address if two registers are equal.
  Syntax: BEQ rs1, rs2, target
  Example: BEQ x2, x3, target_label
2. BNE (Branch if Not Equal): Branches to a target address if two registers are not equal.
  Syntax: BNE rs1, rs2, target
  Example: BNE x4, x5, target_label
3. BLT (Branch if Less Than): Branches to a target address if one register is less than another, comparing them as signed values (BLTU compares them as unsigned).
  Syntax: BLT rs1, rs2, target
  Example: BLT x6, x7, target_label
4. BGE (Branch if Greater Than or Equal): Branches to a target address if one register is greater than or equal to another, comparing them as signed values (BGEU compares them as unsigned).
  Syntax: BGE rs1, rs2, target
  Example: BGE x8, x9, target_label
Jump and Link Register Instructions: Jump and link register instructions jump to an address held in a register (plus an immediate offset) and store the return address in a register. Whereas JAL encodes a fixed PC-relative target, these instructions take the target from a register at run time, which is what makes function returns, calls through function pointers, and jumps to distant addresses possible. The instruction in the base ISA is:
1. JALR (Jump and Link Register): Jumps to a target address computed as the sum of a register and an immediate offset and stores the return address in a register.
  Syntax: JALR rd, rs1, offset
  Example: JALR x10, x11, 8
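
To compare how JAL and JALR form their targets, here is a small C model of both on a 32-bit machine. It is a sketch under two assumptions: the type jump_result_t and the function names are hypothetical, and every instruction is a standard 4-byte instruction, so the return address is pc + 4.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t next_pc;  /* where execution continues */
    uint32_t link;     /* return address written to rd */
} jump_result_t;

/* JAL rd, target: the target is encoded as an offset from the current pc. */
jump_result_t exec_jal(uint32_t pc, int32_t offset) {
    assert(offset >= -1048576 && offset <= 1048574 && offset % 2 == 0);
    jump_result_t r;
    r.next_pc = pc + (uint32_t)offset;
    r.link = pc + 4;
    return r;
}

/* JALR rd, rs1, offset: the target is rs1 + offset with the lowest bit cleared. */
jump_result_t exec_jalr(uint32_t pc, uint32_t rs1_value, int32_t offset) {
    assert(offset >= -2048 && offset <= 2047);
    jump_result_t r;
    r.next_pc = (rs1_value + (uint32_t)offset) & ~(uint32_t)1;
    r.link = pc + 4;
    return r;
}
```

The key point for exploitation work is that the JALR target comes from a register at run time, so whoever controls that register controls where execution goes next.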

Control transfer instructions are essential for implementing conditional statements, loops, and function calls in RISC-V assembly language. They provide the ability to control the flow of execution and create flexible program structures.

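It’s important to consider the offsets in control transfer instructions: branch and JAL targets are encoded relative to the current program counter, so the assembler computes them from the labels you write. The encodings limit the reach to about 4 KiB in either direction for conditional branches and about 1 MiB in either direction for JAL; more distant targets need an address held in a register and JALR.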
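
A short C sketch of the offset calculation may help here. It computes a PC-relative offset and checks whether it fits the conditional branch (B-type) or JAL (J-type) encoding. The helper names are hypothetical; the ranges follow from the 13-bit and 21-bit signed offsets of those formats, whose lowest bit is always zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset from the branch instruction at pc to the target address. */
int64_t pc_relative_offset(uint64_t pc, uint64_t target) {
    return (int64_t)(target - pc);
}

/* Conditional branches: even offsets from -4096 to +4094. */
bool fits_b_type(int64_t offset) {
    return offset >= -4096 && offset <= 4094 && offset % 2 == 0;
}

/* JAL: even offsets from -1048576 to +1048574. */
bool fits_j_type(int64_t offset) {
    return offset >= -1048576 && offset <= 1048574 && offset % 2 == 0;
}

/* Offset for a conditional branch; the target must be in range. */
int64_t b_type_offset(uint64_t pc, uint64_t target) {
    int64_t offset = pc_relative_offset(pc, target);
    assert(fits_b_type(offset));
    return offset;
}
```

A branch at 0x1000 targeting 0x0FF0 therefore encodes an offset of -16. When a target is out of range for a conditional branch, a common workaround is to invert the condition and branch around a JAL.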

Understanding control transfer instructions enables programmers to create complex program structures and implement control flow logic in RISC-V assembly programs.

#System call instructions

System call instructions in RISC-V allow programs to interact with the operating system and request specific services or functionalities. These instructions provide a mechanism for user-level programs to access privileged operations, such as file I/O, process management, and input/output operations. Here’s an explanation of system call instructions in RISC-V:

ECALL (Environment Call): The ECALL instruction is used to invoke a system call and transfer control to the operating system. It provides a way for user-level programs to request services from the underlying operating system. The specific system call number and arguments are typically passed through predefined registers.

Syntax: ECALL
Example:

# Load system call number into register a7 and arguments into other registers
`LI a7, <system_call_number>`
# Perform system call
`ECALL`

#System Call Convention

The convention for passing system call arguments and receiving results is defined by the operating system's ABI. On RISC-V Linux, the system call number goes in a7, the arguments are passed in a0 through a5, and the return value comes back in a0.

The specific mapping of system call numbers to their corresponding services varies depending on the operating system and its RISC-V implementation. Developers should consult the operating system documentation or relevant system call reference for the specific system call numbers and their corresponding functionality.
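
The same mechanism can be tried from C before writing any assembly. The sketch below assumes a Linux system and uses the C library's syscall() wrapper to issue a raw write system call; raw_write is a hypothetical helper name. On RISC-V Linux this corresponds to placing the system call number in a7 (write is number 64 there), the arguments in a0 to a2, and executing ECALL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue write(fd, text, strlen(text)) through the raw system call interface.
   Register mapping on RISC-V Linux:
     a7 = SYS_write, a0 = fd, a1 = text, a2 = length, result returned in a0 */
long raw_write(int fd, const char *text) {
    assert(text != NULL);
    return syscall(SYS_write, fd, text, strlen(text));
}
```

Calling raw_write(1, "ok\n") prints to standard output and returns the number of bytes written. Using the symbolic SYS_write constant keeps the code portable, since system call numbers differ between architectures.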

Examples of System Calls:
The available system calls and their functionality depend on the operating system. Some common system call services include:

File I/O: Opening, reading from, writing to, and closing files.
Process Management: Creating processes, terminating processes, and accessing process-related information.
Input/Output: Reading from and writing to standard input/output.
Memory Management: Allocating and managing memory.
Networking: Performing network-related operations, such as socket creation and data transfer.

The exact system call instructions and their arguments can vary depending on the operating system and the specific RISC-V implementation used.

Understanding system call instructions allows programmers to leverage the services provided by the operating system and access privileged operations from user-level programs. It enables interaction with the underlying system and facilitates the development of complex and feature-rich applications on the RISC-V platform.

#Manipulating Memory in RISC-V Assembly

#Addressing Modes

Addressing modes determine how memory addresses are calculated for load and store instructions. Many architectures offer several modes, but RISC-V deliberately keeps this simple: every load and store in the base ISA forms its address from a base register plus a 12-bit sign-extended immediate offset (from -2048 to 2047). The common access patterns are all expressed with that one form, and anything more elaborate is built from extra instructions. Here’s an explanation of those patterns:

Base plus Offset Addressing: The memory address is formed by adding an immediate value to a base register. The immediate is a constant encoded directly in the instruction.

Example: LW rd, offset(rs1)
In this mode, the memory address is obtained by adding the immediate offset ‘offset’ to the base register ‘rs1’.
Register Addressing: The register holds the memory address directly. This is simply base plus offset addressing with an offset of zero.

Example: LW rd, 0(rs1)
In this mode, the memory address is taken directly from the register ‘rs1’.
Indexed Addressing: The base ISA has no load or store that adds two registers, so a form such as offset(rs1, rs2) does not exist. Instead, the address is computed first with ordinary arithmetic instructions and then used as the base of a normal load.

Example:
SLLI t0, rs2, 2    # Scale the index in rs2 by the element size (4 bytes)
ADD t0, rs1, t0    # Add the scaled index to the base address in rs1
LW rd, 0(t0)       # Load the element
In this pattern, the memory address is the sum of the base register ‘rs1’ and the scaled index register ‘rs2’, held in a temporary register.

This single base plus offset form keeps the hardware simple while still covering direct access, fixed offsets, and, with a couple of extra instructions, indexing. The choice of pattern depends on the specific memory access pattern, data structure, and optimization considerations.

It’s important to consult the official RISC-V documentation and specific implementation’s instruction set reference for the supported addressing modes and their corresponding syntax, as it may vary based on the specific RISC-V variant or extension being used.
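
Since the base ISA's loads and stores only encode a base register plus a 12-bit signed immediate, the address arithmetic can be summarized in a few lines of C. This is a sketch with hypothetical function names, assuming 64-bit addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Address used by a load or store: base register + sign-extended immediate. */
uint64_t effective_address(uint64_t base, int32_t imm12) {
    assert(imm12 >= -2048 && imm12 <= 2047);
    return base + (uint64_t)(int64_t)imm12;
}

/* Indexed access has to be synthesized with extra instructions:
     slli t0, index, shift
     add  t0, base, t0
     lw   rd, 0(t0)                                            */
uint64_t indexed_address(uint64_t base, uint64_t index, unsigned shift) {
    return effective_address(base + (index << shift), 0);
}
```

For example, effective_address(0x1000, -8) is 0xFF8, and indexed_address(0x2000, 3, 2) is 0x200C, the address of the fourth word in an array that starts at 0x2000.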

#Working with arrays and data structures in RISC-V assembly

Working with arrays and data structures in RISC-V assembly involves effectively manipulating and accessing elements stored in memory. Arrays and data structures provide a way to organize and represent collections of related data. Here’s an elaboration on working with arrays and data structures in RISC-V assembly:

Array Access: Arrays consist of a contiguous block of elements of the same data type. To access individual elements within an array, you need to calculate the memory address of each element: the base address plus the element index multiplied by the element size. In RISC-V assembly, a constant index can be folded into the immediate offset of the load or store, while a variable index is first added to the base address with arithmetic instructions.

Example: Accessing elements of an array in RISC-V assembly:

# Assume rs1 holds array_base and each element occupies 4 bytes (one word)
LW rd, offset(rs1) # Load the element at array_base + offset into a register

By increasing the offset in steps of 4, you can access successive elements in the array.
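
The same address calculation can be written out in C, which is roughly what a compiler does when it lowers array indexing to a load. The helper names word_offset and load_element are hypothetical, and the sketch assumes 4-byte elements as in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element index in an array of 4-byte words. */
size_t word_offset(size_t index) {
    return index * sizeof(int32_t);
}

/* Counterpart of LW rd, offset(rs1): read the word at array_base + offset. */
int32_t load_element(const int32_t *array_base, size_t index) {
    assert(array_base != NULL);
    const unsigned char *bytes = (const unsigned char *)array_base;
    int32_t value;
    memcpy(&value, bytes + word_offset(index), sizeof value);
    return value;
}
```

Element 3 of a word array therefore lives at byte offset 12 from the base, which is the value that would appear as the immediate in the load.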

Data Structures: Data structures, such as structs or records, allow grouping related data of different types into a single unit. In RISC-V assembly, you can access the individual fields of a data structure using byte offsets or structure offsets.

Example: Accessing fields of a data structure in RISC-V assembly:

# Assume the data structure starts at address struct_base
# and field1 occupies 4 bytes, and field2 occupies 2 bytes
# field1 offset = 0, field2 offset = 4 (field2 starts after field1)
LW rd, 0(rs1)   # Load field1 into a register
LH rd, 4(rs1)   # Load field2 into a register

By adding the appropriate byte offset to the base address, you can access specific fields within the data structure.
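
In C, the compiler computes these field offsets for us, and offsetof makes them visible. The sketch below mirrors the layout assumed above, with a 4-byte field followed by a 2-byte field; struct record and load_field2 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct record {
    int32_t field1;  /* 4 bytes at offset 0 */
    int16_t field2;  /* 2 bytes at offset 4 */
};

/* Counterpart of LH rd, 4(rs1): read field2 through base + byte offset. */
int16_t load_field2(const void *struct_base) {
    assert(struct_base != NULL);
    const unsigned char *bytes = (const unsigned char *)struct_base;
    int16_t value;
    memcpy(&value, bytes + offsetof(struct record, field2), sizeof value);
    return value;
}
```

Here offsetof(struct record, field2) evaluates to 4, matching the immediate used in the LH instruction above. For other layouts the compiler may insert padding between fields, so it is safer to check the offsets than to add up field sizes by hand.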

Iterating Over Arrays and Data Structures: Iterating over arrays and data structures typically involves using loops. You can use branch instructions to implement loops and maintain a loop counter to iterate through the elements or fields.

Example: Looping over an array in RISC-V assembly:

# Assume a0 holds the array base address and a1 holds the number of elements
LI t0, 0                  # Byte offset of the current element
SLLI t1, a1, 2            # Array size in bytes (4 bytes per word-sized element)
Loop:
BEQ t0, t1, Exit          # Exit loop when the offset reaches the end of the array
ADD t2, a0, t0            # Address of the current element = base + offset
LW t3, 0(t2)              # Load array element into a register
# ... Do operations with the element ...
ADDI t0, t0, 4            # Advance to the next word-sized element
J Loop                    # Jump back to the beginning of the loop
Exit:

By manipulating the loop counter, you can access and perform operations on each element within the array or fields within a data structure.
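
For comparison, here is one way the same kind of traversal looks in C, written so that each statement maps onto one or two assembly instructions. It is a sketch rather than the only possible structure: sum_words is a hypothetical function, it sums the elements as its "operation", and it walks a pointer instead of keeping a separate byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum count word-sized elements starting at array_base. */
int32_t sum_words(const int32_t *array_base, size_t count) {
    assert(array_base != NULL || count == 0);
    const int32_t *cursor = array_base;
    const int32_t *end = array_base + count;  /* one past the last element */
    int32_t sum = 0;
    while (cursor != end) {   /* beq  cursor, end, exit     */
        sum += *cursor;       /* lw   t1, 0(cursor) ; add   */
        cursor++;             /* addi cursor, cursor, 4     */
    }
    return sum;
}
```

Walking a pointer removes the separate address calculation inside the loop, because the running address itself serves as the loop variable.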

Working with arrays and data structures in RISC-V assembly requires careful management of memory addresses, appropriate offsets, and loop control. Understanding addressing modes, memory layout, and control flow enables efficient manipulation and traversal of structured data in RISC-V assembly programs.

#Simple Program Example

Here’s a simple program in C that calculates the sum of two numbers:


#include <stdio.h>

int main() {
    int num1 = 10;
    int num2 = 20;
    int sum = num1 + num2;

    printf("The sum is: %d\n", sum);

    return 0;
}

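Now, let’s compile this C program to RISC-V assembly. Compiling for a 64-bit RISC-V target with optimizations disabled (for example, with GCC's -S option) produces assembly along these lines, with comments added for readability: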

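	.file	"program.c"
	.text
	.globl	main
	.type	main, @function
main:
	addi	sp, sp, -32	# allocate a 32-byte stack frame
	sd	ra, 24(sp)	# save the return address
	sd	s0, 16(sp)	# save the caller's frame pointer
	addi	s0, sp, 32	# set up the frame pointer
	li	a5, 10
	sw	a5, -20(s0)	# num1 = 10
	li	a5, 20
	sw	a5, -24(s0)	# num2 = 20
	lw	a4, -20(s0)
	lw	a5, -24(s0)
	add	a5, a4, a5	# num1 + num2
	sw	a5, -28(s0)	# sum
	lw	a5, -28(s0)
	mv	a1, a5		# second argument: sum
	la	a0, .LC0	# first argument: format string
	call	printf
	li	a0, 0		# return value 0
	ld	ra, 24(sp)	# restore the return address
	ld	s0, 16(sp)	# restore the caller's frame pointer
	addi	sp, sp, 32	# release the stack frame
	ret
	.size	main, .-main
	.section	.rodata
.LC0:
	.string	"The sum is: %d\n"
	.text
	.align	2
	.ident	"GCC: (GNU) 10.2.0"
	.section	.note.GNU-stack,"",@progbits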

The generated RISC-V assembly code expresses the C program’s functionality as assembly instructions. You can observe instructions such as addi, lw, sw, add, mv, li, la, call, and others, which load and store values, perform arithmetic, move values between registers, and call library functions like printf.

Note: The specific assembly instructions and register usage may vary depending on the RISC-V toolchain and options used for compilation and disassembly.

Remember that the generated code is the low-level assembly representation of the C program, allowing you to understand the underlying instructions executed by the processor.

#Conclusion

In conclusion, this blog post has presented an introductory overview of RISC-V assembly language, highlighting its fundamental role in programming for RISC-V processors. We explored essential aspects such as the instruction set architecture, registers, instruction formats, core instructions, and memory addressing. Acquiring a solid understanding of RISC-V assembly empowers developers and researchers to leverage the full potential of RISC-V architectures, optimize system performance, and build secure and efficient systems. Moreover, by delving into the intricacies of RISC-V assembly, security researchers can deepen their understanding of the inner workings of these processors, enabling them to address the evolving landscape of RISC-V technology and the threats it may face. In the next post, we will look into how to write and compile assembly code.