Lua UTF-8 中文字符串长度计算

2025年11月2日 lzxz1234 Leave a comment

文章目录[隐藏]

UTF-8编码简介
代码实现

UTF-8编码简介

首先 UTF-8是一种可变长度的Unicode编码方式，它使用1到4个字节来表示一个字符。这种设计使得UTF-8既能表示全球所有的字符，又能保持与ASCII编码的兼容性。基本编码规则如下：

单字节字符：0xxxxxxx（ASCII字符）
双字节字符：110xxxxx 10xxxxxx
三字节字符：1110xxxx 10xxxxxx 10xxxxxx
四字节字符：11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

代码实现

local function utf8_len(str)
    local fontSize = 20
    local lenInByte = #str
    local count = 0
    local i = 1
    while true do
        local curByte = string.byte(str, i)
        if i > lenInByte then
            break
        end
        local byteCount = 1
        if curByte > 0 and curByte < 128 then
            byteCount = 1
        elseif curByte>=128 and curByte<224 then
            byteCount = 2
        elseif curByte>=224 and curByte<240 then
            byteCount = 3
        elseif curByte>=240 and curByte<=247 then
            byteCount = 4
        else
            break
        end
        i = i + byteCount
        count = count + 1
    end
    return count
end

发表回复取消回复