Home Backend Development C++ building a computer in a programming language for kids

building a computer in a programming language for kids

Jul 18, 2024 am 11:27 AM

… or in other words, the dumbest way to get into embedded systems.

Watch it in action here!

The Goal

The goal was simple. Write some code in C or C++, and be able to execute in Scratch. Honestly, I just found the idea pretty funny: one of the fastest programming languages in one of the slowest. I had a feeling it was possible, but I wasn’t quite sure how. In the process, I learned way more about assembly languages, process memory, and executable files than I expected, and I hope you learn something new as well as I recount my journey.

Step 0: Making a Plan

My first idea was to take the code I wrote in C, break it up into pieces, and then put those pieces back together using Scratch. For example, a while loop in C might become a repeat until block in Scratch:

building a computer in a programming language for kids

In order for a C compiler to understand code, it first needs to generate an AST (abstract syntax tree), which is a tree representation of each important symbol in the source code. For example, an opening parenthesis, the name of a variable, or the return keyword might each be converted into distinct nodes. However, after looking at the AST for a simple Fibonacci number program…

building a computer in a programming language for kids

Step 0b: Making a (Better) Plan

Okay, so that was out of the question. But what if instead of trying to re-compile the source code, we went one step down the ladder: assembly? In order for a program to run, it first needs to be compiled down into assembly. On my computer, that’s x86-64asm. Since assembly doesn’t have any complicated nesting structures, classes, or even variables, trying to parse a list of assembly instructions should (in theory) be easier than trying to parse the spaghetti monster of an AST, such as the one above. Here’s the same Fibonacci program but in x86 assembly.

building a computer in a programming language for kids

Oh, brother. Okay, maybe it’s not that bad. How many total instructions are there?

building a computer in a programming language for kids

building a computer in a programming language for kids

building a computer in a programming language for kids

Step 0c: The Complete Plan

Thankfully, x86 isn’t the only assembly language out there. As part of a college class, I learned about MIPS, a type of assembly language (over-simplifying) used in some video game consoles and supercomputers from the ’90s to early 2000s, that still sees some use today. Switching from x86 to MIPS brings the instruction count down from *unknown *to around 50.

building a computer in a programming language for kids

Using a 32-bit version of MIPS, this assembly code can then be converted into machine code, where each instruction is converted into a 32-bit integer that the processor can understand, based on guidelines set by the processor’s architecture. There’s a book on the MIPS instruction set architecture available online, so if I take the machine code, and then emulate exactly what a MIPS processor would do, then I should be able to run my C code in Scratch!

building a computer in a programming language for kids

Now that that’s out of the way, we can get started.

Step 1: We Cannot Yet Get Started

Well, there’s already a problem. Usually if you have an integer and you want to pull out a series of bits from it, you calculate num & mask, where mask is an integer in which each important bit is 1, and each unimportant bit is 0.

   001000 01001 01000 1111111111111100
 & 000000 00000 00000 1111111111111111
--------------------------------------
   000000 0000 000000 1111111111111100
Copy after login

The problem? There is no & operator in Scratch.

Now, I *could *simply go through both numbers bit by bit and check each of the four possible combinations of two bits, but that would be wastefully slow; after all, this will need to be done multiple times for *each *instruction. Instead, I came up with a better plan.

First, I wrote a quick Python script to calculate x & y for every x and every y between 0 and 255.

for x in range(256):
    for y in range(256):
        print(x & y)

0      (0 & 0 == 0)
0      (0 & 1 == 0)
0      (0 & 2 == 0)
...
0      (0 & 255 == 0)
0      (1 & 0 == 0)
1      (1 & 1 == 1)
0      (1 & 2 == 0)
...
254    (255 & 254 == 254)
255    (255 & 255 == 255)
Copy after login

Now, for example, in order to calculate x & y for two 32-bit integers, we can do the following:

  1. Split x and y into four 8-bit integers (or bytes).

  2. Check what first_byte_in_x & first_byte_in_y is by looking in the table generated from the Python script.

  3. Similarly, look up what second_byte_in_x & second_byte_in_y is, and the third bytes, and the fourth bytes.

  4. Take the results of each of these calculations, and put them together to get the result of x & y .

building a computer in a programming language for kids

However, once a MIPS instruction has been cut up into four bytes, we’ll only & the bytes we need. For example, if we only need data from the first byte, we won’t even look at the bottom three. But how do we know which bytes we need? Based on the opcode (i.e. the “type”) of an instruction, MIPS will try to split up the bits of an instruction in one of three ways.

building a computer in a programming language for kids

Putting everything together, below is the Scratch code to extract opcode, $rs, $rt, $rd, shamt, funct, and immediate for any instruction.

building a computer in a programming language for kids

Step 2: A Short Word About Memory

So, how much memory should our processor actually have? And how should we store it? Well, minimum, MIPS processors have 31 general-purpose registers, and one $zero register that is meant to store the number 0 at all times. A register is a location in memory that a processor can access quickly. We can represent these 32 registers as a list with 32 items in Scratch. As for the rest of the memory, simulating a processor moving chunks of data in and out of its cache in Scratch would be pretty pointless and would actually slow things down, rather than speed them up. So instead, the physical memory will be represented as five lists containing 131,072 elements each, where each element will be a 32-bit integer, giving us about 2.6MB of memory. A contiguous block of memory like these lists is usually called a “page”, and the size of the data that the instruction set works with (in this case 32 bits) is usually called a “word”.

building a computer in a programming language for kids

Step 3: Visits from a Magical ELF

So, how do we get machine code in here? We can’t just import a file into Scratch. But we *can *import text! So, I wrote a program in C to take a binary executable file, and convert every 32 bytes of the file into an integer. C, by default, was reading each byte in little-endian, so I had to introduce a function to flip the endianness. Then, I can save the machine code of a program as a text file (a list of integers), and then import it into my proc:memory:program variable.

#include <stdio.h>

unsigned int flip_endian(unsigned int value) {
    return ((value >> 24) & 0xff) | ((value >> 8) & 0xff00) | ((value << 8) & 0xff0000) | ((value << 24) & 0xff000000);
}

int main(int argc, char* argv[]) {
    if (argc != 3 && argc != 2) {
        printf("Usage: %s <input file> <output file?>\n", argv[0]);
        return 1;
    }

    FILE* in = fopen(argv[1], "r");
    if (!in) {
        perror("fopen");
        return 1;
    }

    unsigned int value;

    FILE* out = argc == 3 ? fopen(argv[2], "w") : stdout;
    if (!out) {
        perror("fopen");
        return 1;
    }

    while (fread(&value, sizeof(value), 1, in) == 1) {
        fprintf(out, "%u\n", flip_endian(value));
    }

    fclose(in);
    if (out != stdout) {
        fclose(out);
    }
    return 0;
}
Copy after login

building a computer in a programming language for kids

Okay, so now that we can import the data into Scratch, we can just set the program counter (the integer keeping track of the current instruction) to the top of the list, and start executing instructions, right?

Wrong.

I didn’t realize this going into this project, but the first several bytes of an executable file *aren’t *instructions, but a header identifying what type of executable file it is. On Windows, it’ll usually be the PE, or Portable Executable, format, and on UNIX-based systems (the version we’ll be using) it’ll be the ELF format. So, how do we actually know where the code starts? On Linux, we can use the builtin readelf utility to actually see what’s in the ELF header, and the Linux Foundation has a page detailing the ELF header standard. So, we can use the LF page to figure out which bytes mean what, and the readelf command to “check our work”.

 $ readelf -h fibonacci
ELF Header:
  Magic:   7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           MIPS R3000
  Version:                           0x1
  Entry point address:               0x4012cc
  Start of program headers:          52 (bytes into file)
  Start of section headers:          7596 (bytes into file)
  Flags:                             0x1001, noreorder, o32, mips1
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         5
  Size of section headers:           40 (bytes)
  Number of section headers:         14
  Section header string table index: 13
Copy after login

building a computer in a programming language for kids

Now, there’s a lot of really interesting stuff here, but to save some time, the *really *important data here (besides the entry point, of course) are the section headers. Oversimplifying greatly, in order for our program to run correctly, we need to take certain chunks of the file and place them in certain parts of memory so our code can access them.

building a computer in a programming language for kids

Using the readelf utility, we can actually see all of the sections in the file:

 $ readelf -S fibonacci
There are 14 section headers, starting at offset 0x1dac:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al     
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .MIPS.abiflags    MIPS_ABIFLAGS   004000d8 0000d8 000018 18   A  0   0  8
  [ 2] .reginfo          MIPS_REGINFO    004000f0 0000f0 000018 18   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            00400108 000108 000024 00   A  0   0  4
  [ 4] .text             PROGBITS        00400130 000130 001200 00  AX  0   0 16
  [ 5] .rodata           PROGBITS        00401330 001330 000020 00   A  0   0 16
  [ 6] .bss              NOBITS          00411350 001350 000010 00  WA  0   0 16
  [ 7] .comment          PROGBITS        00000000 001350 000029 01  MS  0   0  1
  [ 8] .pdr              PROGBITS        00000000 00137c 000440 00      0   0  4
  [ 9] .gnu.attributes   GNU_ATTRIBUTES  00000000 0017bc 000010 00      0   0  1
  [10] .mdebug.abi32     PROGBITS        00000000 0017cc 000000 00      0   0  1
  [11] .symtab           SYMTAB          00000000 0017cc 000380 10     12  14  4
  [12] .strtab           STRTAB          00000000 001b4c 0001db 00      0   0  1
  [13] .shstrtab         STRTAB          00000000 001d27 000085 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)
Copy after login

Going through all the details of the ELF format could be its own multi-part write-up, but using the Linux Foundation page on section headers, I was able to decipher the section header bytes of the program, and copy all the important bytes from the proc:memory:program variable to the correct places in memory, by checking whether or not the section header had the ALLOCATE flag set.

building a computer in a programming language for kids

Step 4: Cycling Through Instructions

Fast-forwarding about a week to the point where all of the important instructions have been implemented, let’s take a look at the steps the processor (or really, any processor) needs to take in order to understand just one instruction, using 0x8D02002A (2365718570) as an example.

The first step is called **INSTRUCTION FETCH. **The current instruction is retrieved from the address stored in the proc:program_counter variable.

The next step is INSTRUCTION DECODE, where the instruction is decoded into its separate parts (see Step 1).

Finally, we reach EXECUTE, which, in my Scratch processor, is pretty much just a big if statement.

In this case, the INSTRUCTION DECODE step revealed that the opcode is 35, which means 0x8D02002A is a lw (load word) instruction. Therefore, based off the values in proc:instr:rs, proc:instr:rt, and proc:instr:immediate, the instruction 0x8D02002A actually means lw $2, 0x2a($8) , or in other words, lw $v0, 42($t0).

The Scratch routines for Instruction Fetch, Instruction Decode, and Execute.

And here is the code that handles the lw instruction:

building a computer in a programming language for kids

Step 5: Hello, World?

Okay, home stretch. Now, we just need to be able to do the bare minimum and create a “Hello, World” program in C, and run it in Scratch, and the last two weeks of my life will have been validated.

So, will this work?

#include <stdio.h>

int main() {
  printf("Hello, world!");
  return 0;
}
Copy after login

Three changes. First of all, the MIPS linker uses start to find the entry point of the program, much the same way you use main in C, or "main__" in Python. So, that’s an easy fix.

#include <stdio.h>

int __start() {
  printf("Hello, world!");
  return 0;
}
Copy after login

Next, we need some way to actually see this output in Scratch. We *could *make some intricate array of text sprites, but the simpler solution is just to use a list.

building a computer in a programming language for kids

Finally, we can’t use stdio.h.

Yeah, basically, implementing floating point registers and multiprocessor instructions would have been more trouble than it was worth, so I skipped it, but the standard library kind of expects all that to be there. So, we need to make printf ourselves.

Putting the complications of variadic arguments and text formatting aside, how can you actually print a string using MIPS? The TL;DR is you put the address of the string in a certain register, and then a special “print string” value in another register, and then execute the syscall (“system call”) instruction, and let the OS/CPU handle the rest.

The exact special values and registers to use are implementation-dependent, and can be implemented pretty much any way you see fit, but I chose to replicate MARS’ (a very popular MIPS simulator) implementation. With MARS, the address of the string goes in $a0, and the value 4 goes in $v0 to say “hey, I want to print a string!”

building a computer in a programming language for kids

And with C, we can use a feature called “inline assembly” to inject assembly code directly into our compiled output. Putting it all together we get this:

#define puts _puts

void _puts(const char *s) {
 __asm__(
  "li $v0, 4\n"
  "syscall\n"
  :
  : "a"(s)
 );
}

int __start() {
 puts("Hello, World!\n");
 return 0;
}
Copy after login

And when we run it, we get this:

building a computer in a programming language for kids

Conclusion

You can view the final product here: https://scratch.mit.edu/projects/1000840481/.

I wanted to keep this read under 15 minutes, so I had to skip over **a lot **of details. Some parts of the Scratch code had to be cut out of the screenshots for simplicity’s sake and I ran into a lot of silly and not-so-silly mistakes. If you’re curious how I was able to get Connect Four working (with minimax and alpha-beta pruning), the source code is on my Github. Here’s a quick list of some of the other problems I ran into in development:

 * The fact that my computer is little-endian, but MIPS is big-endian caused more issues than I'd like to admit
 * The `mult` instruction in MIPS is 32-bit multiplication, and multiplying two 32-bit integers can result in a 64-bit integer.
   Javascript (and as a result, Scratch) is incapable of storing a 64-bit integer without losing precision.
 * The `u` in the `addu` instruction and the `u` in the `sltu` instruction both stand for "unsigned", but mean completely different things.
 * As you may have noticed, functions in Scratch don't have return values. This was quite annoying.
 * Any branch instruction (like "jump", "jump register", "branch on equals") in MIPS will also execute the instruction immediately after it, **regardless** of if the branch was taken or not.
   So, instead of updating the program count directly, the next address needs to be put in the "branch delay slot" and the program counter should only be updated after the *next* instruction.
 * Lists in Scratch are one-indexed.
 * All of a sudden, Scratch stopped letting me save the project to the cloud. It took awhile before I realized that lists filled with over 100,000 items wasn't something Scratch's servers were particularly excited to store.
 * I had to design my own `malloc` in C, which was fun, but also very difficult to debug in Scratch.
 * When I tried making syscalls that asked the user for input, all of the letters ended up capitalized. It turns out that in Scratch a lowercase `"a"` and a capital `"A"` are considered equal.
   I thought this was an unsolvable problem for awhile, before I realized that the names of sprites' costumes in Scratch are actually case-sensitive.
   So every time I try to convert a character to its ASCII value, I tell the processor sprite to switch to, for example, the `"a"` costume or the `"A"` costume, and then retrieve the costume number.
 * I made another syscall to print emojis to the `stdout`, but some emojis are considered two characters long and other emojis are considered one character long.
 * Compiling any code that calls `malloc` with -O1 crashes the CPU. I still have no idea why this is the case.
 * Endianness is really hard to get right. I know I said this in the beginning of the list, but it's worth repeating.
Copy after login

With all that said, I’m really happy with the way this project turned out. If you found this interesting, please check out sharc, my graphics engine built completely in Typescript: https://www.sharcjs.org. Because clearly, if there’s one thing I know how to make, it’s questionable decisions.

The above is the detailed content of building a computer in a programming language for kids. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1675
14
PHP Tutorial
1278
29
C# Tutorial
1257
24
C# vs. C  : History, Evolution, and Future Prospects C# vs. C : History, Evolution, and Future Prospects Apr 19, 2025 am 12:07 AM

The history and evolution of C# and C are unique, and the future prospects are also different. 1.C was invented by BjarneStroustrup in 1983 to introduce object-oriented programming into the C language. Its evolution process includes multiple standardizations, such as C 11 introducing auto keywords and lambda expressions, C 20 introducing concepts and coroutines, and will focus on performance and system-level programming in the future. 2.C# was released by Microsoft in 2000. Combining the advantages of C and Java, its evolution focuses on simplicity and productivity. For example, C#2.0 introduced generics and C#5.0 introduced asynchronous programming, which will focus on developers' productivity and cloud computing in the future.

C# vs. C  : Learning Curves and Developer Experience C# vs. C : Learning Curves and Developer Experience Apr 18, 2025 am 12:13 AM

There are significant differences in the learning curves of C# and C and developer experience. 1) The learning curve of C# is relatively flat and is suitable for rapid development and enterprise-level applications. 2) The learning curve of C is steep and is suitable for high-performance and low-level control scenarios.

What is static analysis in C? What is static analysis in C? Apr 28, 2025 pm 09:09 PM

The application of static analysis in C mainly includes discovering memory management problems, checking code logic errors, and improving code security. 1) Static analysis can identify problems such as memory leaks, double releases, and uninitialized pointers. 2) It can detect unused variables, dead code and logical contradictions. 3) Static analysis tools such as Coverity can detect buffer overflow, integer overflow and unsafe API calls to improve code security.

C   and XML: Exploring the Relationship and Support C and XML: Exploring the Relationship and Support Apr 21, 2025 am 12:02 AM

C interacts with XML through third-party libraries (such as TinyXML, Pugixml, Xerces-C). 1) Use the library to parse XML files and convert them into C-processable data structures. 2) When generating XML, convert the C data structure to XML format. 3) In practical applications, XML is often used for configuration files and data exchange to improve development efficiency.

How to use the chrono library in C? How to use the chrono library in C? Apr 28, 2025 pm 10:18 PM

Using the chrono library in C can allow you to control time and time intervals more accurately. Let's explore the charm of this library. C's chrono library is part of the standard library, which provides a modern way to deal with time and time intervals. For programmers who have suffered from time.h and ctime, chrono is undoubtedly a boon. It not only improves the readability and maintainability of the code, but also provides higher accuracy and flexibility. Let's start with the basics. The chrono library mainly includes the following key components: std::chrono::system_clock: represents the system clock, used to obtain the current time. std::chron

The Future of C  : Adaptations and Innovations The Future of C : Adaptations and Innovations Apr 27, 2025 am 12:25 AM

The future of C will focus on parallel computing, security, modularization and AI/machine learning: 1) Parallel computing will be enhanced through features such as coroutines; 2) Security will be improved through stricter type checking and memory management mechanisms; 3) Modulation will simplify code organization and compilation; 4) AI and machine learning will prompt C to adapt to new needs, such as numerical computing and GPU programming support.

C  : Is It Dying or Simply Evolving? C : Is It Dying or Simply Evolving? Apr 24, 2025 am 12:13 AM

C isnotdying;it'sevolving.1)C remainsrelevantduetoitsversatilityandefficiencyinperformance-criticalapplications.2)Thelanguageiscontinuouslyupdated,withC 20introducingfeatureslikemodulesandcoroutinestoimproveusabilityandperformance.3)Despitechallen

C# vs. C  : Memory Management and Garbage Collection C# vs. C : Memory Management and Garbage Collection Apr 15, 2025 am 12:16 AM

C# uses automatic garbage collection mechanism, while C uses manual memory management. 1. C#'s garbage collector automatically manages memory to reduce the risk of memory leakage, but may lead to performance degradation. 2.C provides flexible memory control, suitable for applications that require fine management, but should be handled with caution to avoid memory leakage.

See all articles