This routine is a little longer, but it is relocatable: it can be run from almost anywhere within the 64K address space without modification. The first thing the program does is work out where it is located in memory. It then adds an offset to that address to find the hello world string, and echoes the string.
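To make the "where am I" step concrete, here is a rough Python model of what the jsr/rts/tsx sequence in the listing further down leaves behind. The `locate` helper and its `base` and `sp` parameters are names made up for this sketch, and it assumes no interrupt overwrites the stack page between the rts and the two loads.

```python
def locate(base: int, sp: int = 0xFF) -> int:
    """Model the jsr/rts/tsx sequence for a program loaded at `base`.

    `base` and `sp` are illustrative parameters, not part of the program.
    """
    stack = {}                                   # bytes written to page one by jsr
    jsr_addr = base + 4                          # the jsr sits 4 bytes into the code
    ret = jsr_addr + 2                           # jsr pushes the address of its own last byte
    stack[0x0100 + sp] = ret >> 8                # high byte is pushed first
    stack[0x0100 + ((sp - 1) & 0xFF)] = ret & 0xFF   # then the low byte

    # rts pops both bytes, so the stack pointer is back where it started,
    # but the two bytes are still sitting in memory at and just below it.
    x = sp                                       # tsx
    hi = stack[0x0100 + x]                       # lda $0100,x
    x = (x - 1) & 0xFF                           # dex
    lo = stack[0x0100 + x]                       # lda $0100,x
    return (hi << 8) | lo                        # the pointer kept in $fe/$ff


for base in (0x0280, 0x0300, 0xE000):
    print(f"loaded at ${base:04X} -> pointer ${locate(base):04X}")
```

Wherever the code is loaded, the recovered pointer is always 6 bytes past the start of the program, which is why a fixed offset in Y is enough to reach the string.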
I made the hello world string a bit longer. It's now "Hello, world!\n", the specific string that many hello world programs use. It's also now a null-terminated string (a C string), and there is an extra "cosmetic" newline character at the beginning of the string. The 8th bit (the high bit) is set on every character echoed to the video display.
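As a quick check of the string format just described, this Python sketch rebuilds the data bytes. The `apple1_bytes` name is made up here, and the upper-casing and the carriage-return value for newline are read off the hex dump below rather than stated in the text.

```python
def apple1_bytes(text: str) -> bytes:
    """Encode text the way the data line stores it: upper case, carriage
    return for newline, high bit set on every character, then a null byte."""
    out = bytearray()
    for ch in text.upper():
        code = 0x0D if ch == "\n" else ord(ch)   # newline is stored as carriage return
        out.append(code | 0x80)                  # set the 8th (high) bit
    out.append(0x00)                             # null terminator ends the string
    return bytes(out)


data = apple1_bytes("\nHello, world!\n")
print(" ".join(f"{b:X}" for b in data))
# prints: 8D C8 C5 CC CC CF AC A0 D7 CF D2 CC C4 A1 8D 0
```

The null byte is the only one without the high bit set, so the echo loop can stop on a plain zero test.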
:A2 60 86 FF 20 FF 0 BA BD 0 1 85 FF CA BD 0 1 85 FE A0 1A D0 4 20 EF FF C8 B1 FE D0 F8 60
:8D C8 C5 CC CC CF AC A0 D7 CF D2 CC C4 A1 8D 0
ldx #$60 ; $60 is the rts opcode
stx $ff ; write rts to $00ff
jsr $00ff ; call it to figure out where this program is located
tsx ; get stack pointer
lda $0100,x ; get hi byte of the return address left on the stack page
sta $ff ; store hi address
dex ; move pointer down stack
lda $0100,x ; get lo byte of the return address
sta $fe ; store lo address
ldy #$1a ; offset to string: ($fe) points at start+6, string is at start+32
bne start ; branch always (y is never zero here)
loop
jsr $ffef ; echo character
iny ; next character
start
lda ($fe),y ; get character
bne loop ; keep looping if not a null character
rts
string
"\nHello, World!\n" ; high bit set on each character, followed by a null byte
Nope, not 24 TB, not 24 GB, not 24 MB, not 24 KB, but 24 bytes! The code is 12 bytes, and the data, "HELLO WORLD\n", is 12 bytes.
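The listing that follows gets down to 12 bytes of code by storing the string backwards and counting X down to zero, so the bne doubles as the end-of-string test. Here is a rough Python model of that loop; the `memory` dict and `run` helper are illustrative only, and the newline is kept as Python's "\n" rather than a specific byte value.

```python
ORG = 0x0280                      # load address from the listing
CODE_LEN = 12                     # bytes of code before the string
stored = "\nDLROW OLLEH"          # string as stored: reversed, newline at the lowest address

# Place the string right after the code, with the high bit set on each byte.
memory = {ORG + CODE_LEN + i: ord(c) | 0x80 for i, c in enumerate(stored)}


def run() -> str:
    """Step through the loop and collect what would be echoed."""
    echoed = []
    x = 0x0C                              # LDX #$0C
    while True:
        a = memory[0x028B + x]            # LDA $028B,X
        echoed.append(chr(a & 0x7F))      # JSR $FFEF, modelled by dropping the high bit
        x -= 1                            # DEX
        if x == 0:                        # BNE loop falls through when X reaches zero
            break
    return "".join(echoed)


print(repr(run()))
print(CODE_LEN + len(memory), "bytes in total")
```

Because X runs from 12 down to 1 and never indexes with 0, the load uses $028B, one byte below the string, as its base address.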
org $0280
LDX #$0C ; 12 bytes, length of string
loop
LDA $028B,X ; get character from string
JSR $FFEF ; echo character
DEX ; next character
BNE loop ; last character?
RTS
string
"\nDLROW OLLEH" ; string reversed and high bit set