Notepad: Win32 API Implementation (x86/x64)

Abstract

Notepad is a minimal text editor for the Microsoft Windows operating system, written entirely in Microsoft Macro Assembler (MASM). Unlike software built on high-level languages and runtimes (C#, C++, Python), this project calls the Win32 API directly and manages CPU registers by hand, bypassing the C Runtime (CRT) entirely.

This repository serves as a reference implementation for systems programmers, malware analysts, and computer science students studying the PE (Portable Executable) format, Windows message loops, and low-level memory management. It contrasts legacy x86 conventions (STDCALL, flat memory model) with the Microsoft x64 calling convention within a single repository.

Download

Pre-built binaries: notepad.zip

Contains both x86 and x64 executables ready to run on Windows.

Technical Specifications

Build Environment

| Component | Specification | Notes |
|---|---|---|
| Assembler | ml.exe (x86) / ml64.exe (x64) | Microsoft Macro Assembler |
| Linker | link.exe | Microsoft Incremental Linker |
| Subsystem | WINDOWS | Graphical User Interface (GUI) |
| Entry Point | start | Custom entry, no main() wrapper |
| Resource Compiler | rc.exe | Compiles menus, icons, and manifests |

Core Dependencies (System DLLs)

The application relies only on standard user-mode dynamic-link libraries found in all Windows versions since XP:

  • kernel32.dll: Memory allocation (HeapAlloc/HeapFree), File I/O (CreateFile, ReadFile, WriteFile), Process control
  • user32.dll: Window creation (CreateWindowEx), Message loop (GetMessage), Clipboard interaction
  • gdi32.dll: Font rendering and graphics device interface contexts
  • comdlg32.dll: Common Dialogs (Open File, Save File, Print, Find/Replace)
  • shell32.dll: Shell functions and file path operations
  • shlwapi.dll: Shell Lightweight API (PathFindFileName for title display)
  • comctl32.dll: Common controls (Status Bar)
  • riched20.dll: RichEdit 2.0 control for advanced text editing

Architecture & Internals

The application implements a standard Windows Event-Driven Architecture. It does not poll for input; rather, it yields CPU time until the Operating System pushes a message to the thread's message queue.

1. The Message Loop (The Heartbeat)

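The entry point fills in a WNDCLASSEX structure, registers the window class, and creates the main window. It then loops on GetMessage until WM_QUIT arrives. Because GetMessage blocks while the queue is empty, the process consumes approximately 0% CPU when idle.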

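; Simplified representation of the core loop (x64).
; Assumes the prologue already reserved 32 bytes of shadow space
; and left RSP 16-byte aligned.
MessageLoop:
    lea     rcx, msg            ; lpMsg (RIP-relative address of the MSG structure)
    xor     edx, edx            ; hWnd = NULL: any window on this thread
    xor     r8d, r8d            ; wMsgFilterMin = 0
    xor     r9d, r9d            ; wMsgFilterMax = 0
    call    GetMessageW         ; Blocking call, waits for OS event
    test    eax, eax
    jz      ExitProgram         ; 0 = WM_QUIT received
    cmp     eax, -1
    je      ExitProgram         ; -1 = error

    ; Modeless Dialog Handling (Find/Replace)
    mov     rcx, hFindReplaceDlg
    test    rcx, rcx
    jz      DispatchMsg         ; Dialog is not open
    lea     rdx, msg
    call    IsDialogMessageW    ; Checks if msg belongs to Find/Replace dialog
    test    eax, eax
    jnz     MessageLoop         ; If handled, skip Dispatch

DispatchMsg:
    lea     rcx, msg            ; RCX is volatile across calls, so reload it
    call    TranslateMessage    ; Virtual-Key -> character
    lea     rcx, msg
    call    DispatchMessageW    ; Route to WndProc
    jmp     MessageLoop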

2. Dual-Architecture Logic (x86 vs x64)

The codebase highlights critical differences in assembly programming between 32-bit and 64-bit modes.

x86 (32-bit Protected Mode)

  • Calling Convention: STDCALL. Arguments are pushed onto the stack in reverse order. The callee cleans the stack (ret n).
  • Registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP.
  • Memory Addressing: 32-bit absolute or relative.
  • MASM Syntax: Uses invoke macro for simplified API calls.

x64 (Long Mode)

  • Calling Convention: Microsoft x64 ABI (FASTCALL variant).
    • First 4 integer arguments passed in RCX, RDX, R8, R9.
    • Floating point args in XMM0 - XMM3.
    • Remaining arguments are passed on the stack, above the shadow space.
    • Shadow Space: The caller must reserve 32 bytes (0x20) on the stack for the callee to spill registers, even when the callee takes fewer than four arguments.
  • Stack Alignment: The stack pointer (RSP) must be aligned to a 16-byte boundary at every call instruction.
  • RIP-Relative Addressing: Data is accessed relative to the current instruction pointer to support position-independent code (PIC).
  • Handles: All handles and pointers are 64-bit (QWORD).
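
The shadow space and alignment rules above determine the constant used in a prologue's sub rsp instruction. The following is a minimal sketch of that arithmetic, written in C rather than MASM so it can be checked in isolation. The function name x64_frame_size and its parameters are illustrative and do not come from the repository.

```c
#include <stddef.h>

/* Bytes a prologue should subtract from RSP so that every call it makes
 * sees a 16-byte aligned stack with room for all outgoing arguments.
 *
 * max_call_args: largest argument count among the functions called
 * local_bytes:   bytes of local variables kept in the frame
 * pushed_regs:   8-byte registers pushed between entry and the sub rsp
 */
size_t x64_frame_size(size_t max_call_args, size_t local_bytes,
                      size_t pushed_regs)
{
    /* Shadow space always covers four 8-byte slots, even for 0 arguments. */
    size_t arg_slots = max_call_args < 4 ? 4 : max_call_args;
    size_t size = arg_slots * 8 + local_bytes;

    /* On entry RSP % 16 == 8 because the caller's call pushed the
     * return address; each register push moves RSP by another 8. */
    size_t misalign = (8 + pushed_regs * 8 + size) % 16;
    if (misalign != 0)
        size += 16 - misalign;

    return size;
}
```

For a function that pushes no registers and only calls APIs with up to four arguments, this yields 40 (sub rsp, 28h). Calling a 12-argument function such as CreateWindowExW raises it to 104 (68h).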

3. Memory Management Implementation

Since malloc and free (C-Runtime) are unavailable, the application interfaces directly with the Windows Heap Manager via kernel32:

  • Allocation: HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size)
  • Deallocation: HeapFree(hHeap, 0, pMemory)

These heap allocations are used for:

  • File buffers when reading/writing files
  • Text buffers for status bar updates and word counting
  • Temporary storage during word wrap toggle

4. Unicode Support

The application uses Unicode (UTF-16 LE) throughout:

  • All Windows API calls use Wide (W) variants: CreateWindowExW, SendMessageW, etc.
  • File Reading: Detects encoding via BOM (Byte Order Mark):
    • UTF-16 LE (FF FE): Direct load
    • UTF-8 (EF BB BF): Convert via MultiByteToWideChar
    • No BOM: Try UTF-8 first, fallback to ANSI (CP_ACP)
  • File Writing: Always UTF-16 LE with BOM for maximum compatibility
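
The detection order for file reading can be sketched as follows. This is an illustration in portable C, not the repository's assembly: the names Encoding, is_valid_utf8, and detect_encoding are hypothetical, and the hand-written UTF-8 validator is an assumption standing in for however the application decides that a BOM-less file is not UTF-8.

```c
#include <stddef.h>

typedef enum { ENC_UTF16LE, ENC_UTF8_BOM, ENC_UTF8, ENC_ANSI } Encoding;

/* Returns 1 if buf[0..len) is well-formed UTF-8, otherwise 0. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;            /* continuation bytes expected */
        unsigned long cp, min;   /* code point and smallest legal value */

        if (c < 0x80)                { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;           /* stray continuation or invalid lead byte */

        if (extra > len - i - 1) return 0;            /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}

/* BOM first, then "try UTF-8, fall back to ANSI" for BOM-less input. */
Encoding detect_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    return is_valid_utf8(buf, len) ? ENC_UTF8 : ENC_ANSI;
}
```

A lone byte 0xE9 (é in Windows-1252) is not valid UTF-8, so a file containing it without a BOM falls through to the ANSI path, while the two-byte sequence C3 A9 is accepted as UTF-8.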

Feature Implementation Detail

A. The RichEdit Control

Instead of using a basic EDIT control, the application uses RichEdit 2.0 (riched20.dll) which provides:

  • Advanced text selection and manipulation
  • Built-in text search via EM_FINDTEXTEX
  • Character formatting capabilities
  • Better undo/redo handling

Styles: WS_CHILD | WS_VISIBLE | WS_VSCROLL | ES_MULTILINE | ES_AUTOVSCROLL | ES_NOHIDESEL

Toggling word wrap changes the horizontal scrolling styles: with word wrap off, WS_HSCROLL | ES_AUTOHSCROLL are added; with word wrap on, they are removed.

B. File I/O Pipeline

File operations follow a fixed sequence of steps:

  1. CreateFile: Opens handle with GENERIC_READ or GENERIC_WRITE
  2. GetFileSize: Determines allocation requirements
  3. Heap Allocation: Dynamic memory request via HeapAlloc
  4. ReadFile / WriteFile: Bulk transfer between disk and memory
  5. Encoding Conversion: BOM detection and MultiByteToWideChar if needed
  6. SetWindowText / GetWindowText: Transfer between memory and the GUI RichEdit control

C. Find & Replace

The search feature uses the Common Dialog Box Library (FindText / ReplaceText) for the UI, with search logic implemented via RichEdit messages:

  • Search: EM_FINDTEXTEX with FINDTEXTEX structure
  • Selection: EM_EXSETSEL to highlight matching text
  • Replace: EM_REPLACESEL for text substitution
  • Wrap Around: Automatic search restart from beginning/end when not found
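
The wrap-around behaviour for a forward search can be sketched in portable C. This is a simplified stand-in, not the application's code: the real search is delegated to EM_FINDTEXTEX, and the names find_in_range, find_next_wrap, and NOT_FOUND are hypothetical. It assumes a case-sensitive match over an in-memory buffer.

```c
#include <stddef.h>
#include <wchar.h>

#define NOT_FOUND ((size_t)-1)

/* First match of pat that lies entirely inside text[from, to). */
static size_t find_in_range(const wchar_t *text, size_t from, size_t to,
                            const wchar_t *pat, size_t pat_len)
{
    if (pat_len == 0 || to < from || to - from < pat_len)
        return NOT_FOUND;
    for (size_t i = from; i + pat_len <= to; i++)
        if (wmemcmp(text + i, pat, pat_len) == 0)
            return i;
    return NOT_FOUND;
}

/* Search forward from the caret; if nothing is found, restart from the
 * beginning of the text and search up to (and across) the caret. */
size_t find_next_wrap(const wchar_t *text, size_t text_len,
                      const wchar_t *pat, size_t pat_len, size_t caret)
{
    if (pat_len == 0)
        return NOT_FOUND;
    if (caret > text_len)
        caret = text_len;

    size_t hit = find_in_range(text, caret, text_len, pat, pat_len);
    if (hit == NOT_FOUND) {
        /* Allow matches that start before the caret and end after it. */
        size_t limit = caret + pat_len - 1;
        if (limit > text_len)
            limit = text_len;
        hit = find_in_range(text, 0, limit, pat, pat_len);
    }
    return hit;
}
```

The returned offset would then be turned into a selection range, which is the role EM_EXSETSEL plays in the application.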

D. Status Bar

Real-time display of:

  • Current cursor position: Ln X, Col Y
  • Word count (manual counting algorithm)
  • Character count (excluding CR)
  • Line count
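
A single pass over the text is enough to produce the word, character, and line counts. The sketch below is in portable C and is not taken from the repository. The names TextStats and count_text are hypothetical, and it assumes that words are separated by space, tab, CR, or LF, that LF counts as a character while CR does not, and that an empty document has one line.

```c
#include <stddef.h>
#include <wchar.h>

typedef struct {
    size_t words;   /* runs of non-whitespace characters */
    size_t chars;   /* all characters except CR */
    size_t lines;   /* number of LF characters plus one */
} TextStats;

TextStats count_text(const wchar_t *text, size_t len)
{
    TextStats s = { 0, 0, 1 };
    int in_word = 0;

    for (size_t i = 0; i < len; i++) {
        wchar_t c = text[i];

        if (c != L'\r')
            s.chars++;              /* CR is excluded from the count */
        if (c == L'\n')
            s.lines++;              /* each LF starts a new line */

        if (c == L' ' || c == L'\t' || c == L'\r' || c == L'\n') {
            in_word = 0;            /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            s.words++;
        }
    }
    return s;
}
```

For the 14-character buffer "one two\r\nthree" this gives 3 words, 13 characters, and 2 lines.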

Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Ctrl+N | New document |
| Ctrl+O | Open file |
| Ctrl+S | Save file |
| Ctrl+Shift+S | Save As |
| Ctrl+P | Print |
| Ctrl+Z | Undo |
| Ctrl+X | Cut |
| Ctrl+C | Copy |
| Ctrl+V | Paste |
| Ctrl+A | Select All |
| Ctrl+F | Find |
| Ctrl+H | Replace |
| F3 | Find Next |
| Shift+F3 | Find Previous |
| Del | Delete selection |

Performance & Metrics

| Metric | Notepad ASM (x64) | Notepad ASM (x86) | MS Notepad (Win11) | VS Code |
|---|---|---|---|---|
| Disk Usage | ~20 KB | ~18 KB | ~200 KB + Deps | ~300 MB |
| RAM Usage (Idle) | ~1.5 MB | ~1.2 MB | ~12 MB | ~400 MB |
| Startup Time | < 10 ms | < 10 ms | ~200 ms | ~2500 ms |
| Dependencies | System DLLs only | System DLLs only | UWP / CRT | Electron / Node.js |

Note: The small memory footprint follows from the absence of a garbage collector, JIT compiler, or interpreted runtime. Beyond the mapped executable image and the system DLLs, the process holds little more than its own heap buffers.

Build Instructions

The project includes a PowerShell build script (build.ps1) that automates the assembly and linking process.

Prerequisites

  • Visual Studio Build Tools (Workload: Desktop development with C++)
  • Windows SDK (for rc.exe and libraries)
  • PATH must include paths to ml.exe, ml64.exe, rc.exe, and link.exe

Compilation Steps

  1. Clone the repository:

    git clone https://github.com/wesmar/notepad.git
    cd notepad
  2. Run the Build Script:

    .\build.ps1

    The script will:

    • Compile resources (.rc -> .res)
    • Assemble source files (.asm -> .obj)
    • Link object files with libraries into executables
    • Move binaries to bin/ folder
    • Clean up intermediate files
  3. Manual Compilation (x64 Example):

    cd x64
    rc /c65001 notepad.rc
    ml64 /c /Cp /Cx /Zd /Zf /Zi main.asm
    ml64 /c /Cp /Cx /Zd /Zf /Zi file.asm
    ml64 /c /Cp /Cx /Zd /Zf /Zi edit.asm
    link main.obj file.obj edit.obj notepad.res /subsystem:windows /entry:start /out:Notepad_x64.exe /MANIFEST:EMBED /MANIFESTINPUT:notepad.manifest kernel32.lib user32.lib gdi32.lib comdlg32.lib shell32.lib shlwapi.lib comctl32.lib

Scientific & Academic Use Cases

This project is not merely a tool, but a pedagogical instrument for:

  1. Reverse Engineering Training:

    • Analyzing the generated binary in IDA Pro or Ghidra provides a clean "control group" for recognizing standard Win32 patterns without compiler optimization noise
    • Perfect for learning to identify prologue and epilogue sequences manually
  2. Malware Analysis Research:

    • Many malware families use raw API calls to avoid detection by heuristics that look for CRT signatures
    • Understanding how to invoke APIs like CreateFile and HeapAlloc in pure assembly is crucial for analysts
  3. Operating Systems Study:

    • Demonstrates the boundary between User Mode (Ring 3) application logic and Kernel Mode (Ring 0): calls such as CreateFile pass through kernel32.dll into ntdll.dll, which performs the system call

Directory Structure

notepad/
├── bin/                      # Compiled executables
│   ├── Notepad_x86.exe       # 32-bit executable (~18 KB)
│   └── Notepad_x64.exe       # 64-bit executable (~20 KB)
├── x86/                      # 32-bit source files
│   ├── main.asm              # Entry point, WinMain, WndProc
│   ├── file.asm              # File operations (New, Open, Save, Print)
│   ├── edit.asm              # Edit functions (Find, Replace, Status Bar)
│   ├── data.inc              # Data structures, constants, variables
│   ├── proto.inc             # Function prototypes, API declarations
│   ├── notepad.rc            # Resource script (manifest reference)
│   └── notepad.manifest      # Application manifest (DPI awareness, etc.)
├── x64/                      # 64-bit source files
│   ├── main.asm              # Entry point, WinMain, WndProc (x64 ABI)
│   ├── file.asm              # File operations (x64 calling convention)
│   ├── edit.asm              # Edit functions (x64 calling convention)
│   ├── data.inc              # Data structures (64-bit handles, alignment)
│   ├── proto.inc             # Function prototypes (EXTERN declarations)
│   ├── notepad.rc            # Resource script
│   └── notepad.manifest      # Application manifest
├── build.ps1                 # Automated build pipeline
├── LICENSE.md                # MIT License
└── README.md                 # Documentation

Known Limitations

  • Large File Handling: The implementation loads the entire file into RAM. Files larger than available heap space will trigger an allocation failure.
  • Undo/Redo: Relies on the RichEdit control's built-in undo buffer. Complex multi-level undo history is not manually implemented.
  • Print: Basic single-page print implementation. Does not support pagination or print preview.

License

MIT License. Free for academic, personal, and commercial use. Attribution to the original author is appreciated but not mandatory.

Author

Marek Wesolowski


Project Repository: https://github.com/wesmar/notepad