Haxor VM

Haxor consists of a compiler (hcc), a linker (hld), and a virtual machine (hvm). hcc translates asm-like code into tokens, hld links them into bytecode, and hvm runs it.

Man, why did you write that?

Writing your own VM implementation teaches you a lot about computer architecture. You run into issues you never meet in day-to-day work with high-level languages. So... just to broaden horizons and for fun ;)

Usage

Compilation:

hcc program.hax

Linking:

hld -o program.hax.e program.hax.u
hld -s 4096 -o program.hax.e program.hax.u # custom stack size

Run:

hvm program.hax.e

License

Haxor is licensed under the BSD 3-clause license. You can read it here.

Architecture

General information

  • Design: RISC
  • Endianness: Little Endian
  • WORD size: 64-bit
  • Register size: 64-bit
  • Instruction: fixed size, 64-bit
  • Arithmetic: integer only, 64-bit
  • Memory model: flat, no protection
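
As a rough illustration of the word size and byte order listed above, the following Python sketch serializes one 64-bit word in little-endian order. The helper names are hypothetical, and treating the word as signed is an assumption; this is not taken from the Haxor sources.

```python
import struct

def word_to_bytes(value):
    """Serialize one signed 64-bit word in little-endian byte order."""
    return struct.pack("<q", value)

def bytes_to_word(data):
    """Read one signed 64-bit little-endian word back from 8 bytes."""
    return struct.unpack("<q", data)[0]

raw = word_to_bytes(0x0102)
print(raw.hex())  # the least significant byte (02) comes first
```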

OpCodes

Each instruction is 64 bits wide and contains:

  • 0:6 bits - instruction code (7 bits, unsigned)
  • 7:8 bits - flags (2 bits, unsigned, not used at the moment)
  • 9:14 bits - register 1 (6 bits, unsigned)
  • 15:20 bits - register 2 (6 bits, unsigned)
  • 21:26 bits - register 3 (6 bits, unsigned)
  • 27:63 bits - immediate value (37 bits, signed)
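
The layout above can be sketched as a pair of pack/unpack helpers. This is a minimal Python illustration, assuming bit 0 is the least significant bit of the word; the function names `encode` and `decode` are hypothetical and not part of Haxor.

```python
# Field layout from the list above: (name, lowest bit, width in bits).
# Assumption: bit 0 is the least significant bit of the 64-bit word.
FIELDS = [
    ("opcode", 0, 7),
    ("flags", 7, 2),
    ("reg1", 9, 6),
    ("reg2", 15, 6),
    ("reg3", 21, 6),
    ("imm", 27, 37),
]

def encode(opcode, reg1=0, reg2=0, reg3=0, imm=0, flags=0):
    """Pack the fields into one unsigned 64-bit integer."""
    values = {"opcode": opcode, "flags": flags, "reg1": reg1,
              "reg2": reg2, "reg3": reg3, "imm": imm}
    word = 0
    for name, shift, width in FIELDS:
        mask = (1 << width) - 1
        # Masking also converts a negative immediate to two's complement.
        word |= (values[name] & mask) << shift
    return word

def decode(word):
    """Unpack a 64-bit word into its fields; imm is sign-extended."""
    out = {}
    for name, shift, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
    if out["imm"] >= 1 << 36:  # top bit of the 37-bit field is set
        out["imm"] -= 1 << 37
    return out

# addi $1, $0, -5 (opcode 0x11 in the instruction table)
word = encode(0x11, reg1=1, reg2=0, imm=-5)
fields = decode(word)
print(hex(word), fields)
```

The six widths add up to exactly 64 bits, so every field round-trips as long as its value fits the stated width.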

vCPU

The vCPU has 64 registers, some of which have a special role:

  • $0 - always zero register, writes are ignored
  • $61 (alias $sp) - stack pointer
  • $62 (alias $ret) - return address for linked jumps
  • $63 (alias $sc) - syscall function id and return code
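
A register file with these rules could look roughly like the Python sketch below. The class and method names are hypothetical, and storing values as unsigned 64-bit patterns is an assumption made for the example.

```python
class Registers:
    """Sketch of a 64-entry register file with the aliases listed above."""

    ALIASES = {"$sp": 61, "$ret": 62, "$sc": 63}
    MASK = (1 << 64) - 1  # keep values within 64 bits (assumption)

    def __init__(self):
        self._values = [0] * 64

    def index(self, name):
        """Translate '$7' or an alias such as '$sp' to a register number."""
        if name in self.ALIASES:
            return self.ALIASES[name]
        number = int(name.lstrip("$"))
        if not 0 <= number < 64:
            raise ValueError(f"no such register: {name}")
        return number

    def read(self, name):
        return self._values[self.index(name)]

    def write(self, name, value):
        number = self.index(name)
        if number == 0:
            return  # writes to $0 are ignored, it always reads as zero
        self._values[number] = value & self.MASK

regs = Registers()
regs.write("$0", 42)
regs.write("$sp", 4096)
print(regs.read("$0"), regs.read("$61"))
```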

Memory map

Language

Haxor uses a primitive asm-like syntax. Each command goes on a separate line. You can add comments to the code, but they also need to be on separate lines, beginning with #. The program starts at the main label. Labels are created by putting a name and a colon at the end of a line (e.g. main:).

Most instructions take either 3 registers or 2 registers and an immediate value. Unless stated otherwise, the result goes to the first specified register.

Instructions

Native instructions

Syntax                  OpCode  Description
nop                     0x00    Does nothing.
exiti imm               0x01    Closes VM with specified exit code.
syscall                 0x02    Performs Syscall with ID stored in $sc register.
add reg1, reg2, reg3    0x10    reg1 = reg2 + reg3
addi reg1, reg2, imm    0x11    reg1 = reg2 + imm
sub reg1, reg2, reg3    0x12    reg1 = reg2 - reg3
mult reg1, reg2, reg3   0x13    reg1 = reg2 * reg3
div reg1, reg2, reg3    0x14    reg1 = reg2 / reg3
mod reg1, reg2, reg3    0x15    reg1 = reg2 % reg3
lw reg1, reg2, imm      0x20    reg1 = memory[reg2 + imm]
sw reg1, imm, reg2      0x21    memory[reg1 + imm] = reg2
lui reg1, imm           0x22    reg1 = (imm << 32)
and reg1, reg2, reg3    0x30    reg1 = reg2 & reg3
andi reg1, reg2, imm    0x31    reg1 = reg2 & imm
or reg1, reg2, reg3     0x32    reg1 = reg2 | reg3
ori reg1, reg2, imm     0x33    reg1 = reg2 | imm
xor reg1, reg2, reg3    0x34    reg1 = reg2 ^ reg3
nor reg1, reg2, reg3    0x35    reg1 = ~(reg2 | reg3)
slt reg1, reg2, reg3    0x36    reg1 = reg2 < reg3
slti reg1, reg2, imm    0x37    reg1 = reg2 < imm
slli reg1, reg2, imm    0x40    reg1 = reg2 << imm
srli reg1, reg2, imm    0x41    reg1 = reg2 >> imm
sll reg1, reg2, reg3    0x42    reg1 = reg2 << reg3
srl reg1, reg2, reg3    0x43    reg1 = reg2 >> reg3
beq reg1, reg2, imm     0x50    goto imm if reg1 == reg2
beql reg1, reg2, imm    0x51    $ret = pc, goto imm if reg1 == reg2
bne reg1, reg2, imm     0x52    goto imm if reg1 != reg2
bnel reg1, reg2, imm    0x53    $ret = pc, goto imm if reg1 != reg2
j imm                   0x54    goto imm
jr reg1                 0x55    goto reg1
jal imm                 0x56    $ret = pc, goto imm
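
To make the Description column concrete, here is a small Python sketch that executes a handful of the native instructions on a plain list of 64 registers. The `execute` helper is hypothetical, and wrapping results to signed 64-bit values on overflow is an assumption; the real hvm may behave differently.

```python
MASK = (1 << 64) - 1

def to_signed(value):
    """Wrap an arbitrary Python int to a signed 64-bit value (assumption)."""
    value &= MASK
    return value - (1 << 64) if value >> 63 else value

def execute(regs, op, r1, a=0, b=0):
    """Run one instruction. `a` and `b` are register numbers or an
    immediate, in the same order as in the Syntax column."""
    if op == "add":
        result = regs[a] + regs[b]
    elif op == "addi":
        result = regs[a] + b
    elif op == "sub":
        result = regs[a] - regs[b]
    elif op == "mult":
        result = regs[a] * regs[b]
    elif op == "slt":
        result = int(regs[a] < regs[b])
    elif op == "lui":
        result = a << 32
    else:
        raise ValueError(f"unsupported instruction: {op}")
    if r1 != 0:  # $0 always stays zero
        regs[r1] = to_signed(result)

regs = [0] * 64
program = [
    ("addi", 1, 0, 6),   # $1 = $0 + 6
    ("addi", 2, 0, 7),   # $2 = $0 + 7
    ("mult", 3, 1, 2),   # $3 = $1 * $2
    ("slt", 4, 1, 2),    # $4 = $1 < $2
    ("lui", 5, 1),       # $5 = 1 << 32
    ("addi", 0, 0, 9),   # write to $0 is ignored
]
for instruction in program:
    execute(regs, *instruction)
print(regs[3], regs[4], regs[5], regs[0])
```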

Pseudo instructions

Syntax                  Description
push reg1               Pushes register onto stack
pushi imm               Pushes const onto stack
pushm imm               Pushes word stored at specified address
pop reg1                Pops value into register
popm imm                Pops value into specified address
move reg1, reg2         reg1 = reg2
clear reg1              reg1 = 0
not reg1, reg2          reg1 = ~reg2
ret                     Jumps to address stored in $ret
b imm                   Unconditional branch
bal imm                 Unconditional linked branch
bgt reg1, reg2, imm     goto imm if reg1 > reg2
blt reg1, reg2, imm     goto imm if reg1 < reg2
bge reg1, reg2, imm     goto imm if reg1 >= reg2
ble reg1, reg2, imm     goto imm if reg1 <= reg2
blez reg1, imm          goto imm if reg1 <= 0
bgtz reg1, imm          goto imm if reg1 > 0
beqz reg1, imm          goto imm if reg1 == 0

System calls

Using the syscall command you can run system calls provided by Haxor VM. The system call number is passed via the $sc register; arguments are passed on the stack in reverse order.
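
The reverse-order convention means the last argument is pushed first, so the first argument ends up on top of the stack. The Python sketch below models this with a list as the stack; both helper names are hypothetical, and the idea that the VM pops arguments one by one is an assumption about how a handler might consume them.

```python
def push_args(stack, args):
    """Caller side: push arguments in reverse order."""
    for arg in reversed(args):
        stack.append(arg)

def pop_args(stack, count):
    """Handler side (assumed): pop `count` arguments in natural order."""
    return [stack.pop() for _ in range(count)]

stack = []
# A printf-style call: file descriptor 1, then the format string address.
push_args(stack, [1, "msg_fmt"])
pushed = list(stack)          # "msg_fmt" went in first, 1 is on top
received = pop_args(stack, 2)
print(pushed, received)
```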

printf (01h)

Prints formatted text to the file specified by the descriptor. Takes 2 or more arguments:

  • file descriptor (1 for standard output, 2 for standard error)
  • format string
  • data depending on format string...

Example:

addi $sc, $0, 01h
pushi msg_fmt
pushi 1
syscall

scanf (02h)

Converts data from the file specified by the descriptor. Remember that this function does not allocate memory automatically; you need to prepare the space before calling it. Use length limits to avoid buffer overflows (e.g. %100s to read up to 100 characters of a string). For strings, your buffer must have 1 extra element for the terminating '\0'. Takes 2 or more arguments:

  • file descriptor (0 for standard input)
  • format string
  • memory addresses to store the data in...

Example:

addi $sc, $0, 02h
pushi answer
pushi format
pushi 0
syscall

random (03h)

Generates a random integer from the specified range. Arguments:

  • minimum (inclusive)
  • maximum (inclusive)

The generated number is pushed onto the stack.

Example:

addi $sc, $0, 03h
pushi 100
pushi 1
syscall
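
The behaviour described for random can be modelled in a few lines of Python. This is only a sketch: `sys_random` is a hypothetical name, a list stands in for the VM stack, and popping the minimum first follows from the reverse push order used in the example above.

```python
import random

def sys_random(stack, rng=random):
    """Pop minimum and maximum, push a random integer in [minimum, maximum]."""
    minimum = stack.pop()  # pushed last, so it is on top
    maximum = stack.pop()
    # randint includes both endpoints, matching the inclusive range.
    stack.append(rng.randint(minimum, maximum))

stack = []
stack.append(100)  # pushi 100 (maximum)
stack.append(1)    # pushi 1 (minimum)
sys_random(stack, random.Random(0))
print(stack)
```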