XEmacs and KOI-8 letters that are not letters in 8859-1 (rus xemacs encoding example)
Keywords: rus, xemacs, encoding, example
_ RU.UNIX (2:5077/15.22) ____________________________________________ RU.UNIX _
From : Boris Tobotras 2:5020/510 19 Dec 99 13:24:08
Subj : XEmacs and KOI-8 letters that are not letters in 8859-1
_______________________________________________________________________________
>>>>> "Serge" == Serge Matveev writes:
Serge> Yes, in case anyone missed it, I mean the Russian "в". It also
Serge> seems to affect sentence deletion (M-k): it does not always do
Serge> what I want. Very annoying, though :-((
We got it, we got it. The Russian "в" is one of the few letters whose
KOI-8 code is not a letter in 8859-1 ;)
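To see the clash concretely, here is a minimal sketch. It assumes a current GNU Emacs (not the XEmacs 20.3 this post was written on) with the `koi8-r` and `iso-8859-1` coding systems; the helper name `koi8-byte-as-latin1` is made up for illustration.

```elisp
;; Hypothetical helper, not part of the original package.
(defun koi8-byte-as-latin1 (letter)
  "Return (BYTE . CHAR) for LETTER.
BYTE is LETTER's code in KOI8-R; CHAR is what that same byte
means when it is read as ISO 8859-1."
  (let ((byte (aref (encode-coding-string (string letter) 'koi8-r) 0)))
    (cons byte
          (aref (decode-coding-string (unibyte-string byte) 'iso-8859-1) 0))))

;; Small "в" is byte #xD7, which 8859-1 reads as the multiplication
;; sign, so Latin-1 syntax and case tables do not treat it as a letter.
(koi8-byte-as-latin1 ?в)   ; => (215 . 215), i.e. #xD7
;; Small "а" is byte #xC1, which 8859-1 reads as A with acute, a letter.
(koi8-byte-as-latin1 ?а)   ; => (193 . 193), i.e. #xC1
```

Capital "В" lands on #xF7, the 8859-1 division sign, for the same reason.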
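;; mapcar*, some and reduce used below come from the CL package.
(require 'cl)
(defun case-table-aset (ct x y)
  "Store Y as the entry for character X in the case table CT.
CT may also be a list of tables; then its first element is used."
  (if (listp ct) (setq ct (car ct)))
  (aset ct x y))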
(defun rus-syntax-table ()
  "Set syntax and case tables for the current buffer according to the encoding
of Russian letters in the buffer.  The encoding must be in the variable
RUS-BUFFER-ENCODING."
  (let* ((e (rus-encoding rus-buffer-encoding))
         (ct (rus-copy-case-table (current-case-table)))
         (st (copy-syntax-table (syntax-table)))
         ;; E holds the small letters first, then the capital letters.
         (lc-chars (substring e 0 (/ (length e) 2)))
         (uc-chars (substring e (/ (length e) 2))))
    ;; Every Russian letter is a word constituent.
    (mapcar (function (lambda (x) (modify-syntax-entry x "w" st))) e)
    ;; Small letters downcase to themselves, capitals to their small letters.
    (mapcar* (function (lambda (x y) (case-table-aset ct x y)))
             lc-chars lc-chars)
    (mapcar* (function (lambda (x y) (case-table-aset ct x y)))
             uc-chars lc-chars)
    (set-syntax-table st)
    (set-case-table ct)))
;;;; Various encodings of Russian letters.
;;;; Each encoding definition is a sequence of codes (numbers) of
;;;; small letters in alphabetical order, followed by the capital
;;;; letters in alphabetical order.
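The encoding definitions themselves are not included in this post. As a rough sketch of the layout the comment describes, one such sequence could be built as below. This assumes a current GNU Emacs, a 32-letter alphabet without "ё", and a string representation; the names `my-rus-alphabet` and `my-rus-make-encoding` are hypothetical and are not the package's own `rus-encoding` or `rus-encodings-alist`.

```elisp
;; Hypothetical: small letters first, then capitals, both in
;; alphabetical order (32 + 32 letters, "ё" left out by assumption).
(defconst my-rus-alphabet
  (concat "абвгдежзийклмнопрстуфхцчшщъыьэюя"
          "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"))

(defun my-rus-make-encoding (coding-system)
  "Return a unibyte string with the codes of `my-rus-alphabet' in CODING-SYSTEM."
  (encode-coding-string my-rus-alphabet coding-system))

(let ((koi8 (my-rus-make-encoding 'koi8-r)))
  (list (length koi8)      ; 64 codes in total
        (aref koi8 2)      ; small "в": 215 (#xD7)
        (aref koi8 34)))   ; capital "В": 247 (#xF7)
```

With this layout, the first half of the sequence gives the small letters and the second half the capitals, which is how `rus-syntax-table` splits it.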
(defconst recognition-level 10
  "Threshold (in %) for recognizing a text as Russian.
The number of letter pairs from FREQUENT_PAIRS found in the text, taken as
a percentage of its 8-bit characters, must exceed this value for the text
to be recognized as Russian (in the corresponding encoding).")
(defvar max-length-of-text-to-analyze 5000
  "Maximum number of characters RUS-GUESS-BUFFER-ENCODING should analyze.")
(defun rus-guess-buffer-encoding ()
  "Analyze the current buffer and, if it contains Russian text, return the
name of the text encoding."
  (let ((i 0) c (prev -1) (freqs (make-vector 128 nil)) (count 0) encoding
        (lim (if (> (- (point-max) (point-min)) max-length-of-text-to-analyze)
                 (+ (point-min) max-length-of-text-to-analyze)
               (point-max))))
    ;; Make an empty 128x128 table.
    (while (< i 128)
      (aset freqs i (make-vector 128 0))
      (setq i (1+ i)))
    ;; Scan the current buffer, calculate the frequencies of pairs of
    ;; 8-bit characters (codes 128-255) and store them in the table.
    (setq i (point-min))
    (while (< i lim)
      (setq c (- (char-after i) 128))
      (if (and (>= c 0) (<= c 127))
          (progn
            (setq count (1+ count))
            (if (and (>= prev 0) (<= prev 127))
                (aset (aref freqs prev) c (1+ (aref (aref freqs prev) c))))))
      (setq prev c)
      (setq i (1+ i)))
    ;; Detect the encoding: return the first one whose frequent pairs
    ;; make up more than RECOGNITION-LEVEL percent of the 8-bit characters.
    (some (function (lambda (ename)
                      (let* ((e (rus-encoding ename))
                             (sum (reduce (function
                                           (lambda (s p)
                                             (+ s
                                                (aref (aref freqs
                                                            (- (aref e (car p))
                                                               128))
                                                      (- (aref e (cdr p))
                                                         128)))))
                                          frequent_pairs :initial-value 0)))
                        (if (and (> sum 0)
                                 (> (/ (* sum 100) count) recognition-level))
                            ename
                          nil))))
          (mapcar 'car rus-encodings-alist))))
(provide 'rus-encodings)
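The scoring idea behind `rus-guess-buffer-encoding` can be tried on a plain string. The following is only a sketch under assumptions: it targets a current GNU Emacs with `cl-lib`, counts matching pairs directly instead of filling a frequency table first (the resulting percentage is the same when the pairs are distinct), and uses a made-up helper `my-pair-score` with made-up sample pairs, since the real `frequent_pairs` list is not shown in the post.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of the original package.
(defun my-pair-score (bytes encoding pairs)
  "Return the share (in %) of 8-bit bytes in BYTES that complete a pair from PAIRS.
BYTES is a unibyte string.  ENCODING maps an alphabet index to a byte code.
PAIRS is a list of (I . J) alphabet-index pairs."
  (let ((count 0) (hits 0) (prev -1))
    (dotimes (i (length bytes))
      (let ((c (aref bytes i)))
        (when (>= c 128)
          (setq count (1+ count))
          ;; A hit needs two adjacent 8-bit bytes matching a listed pair.
          (when (and (>= prev 128)
                     (cl-some (lambda (p)
                                (and (= prev (aref encoding (car p)))
                                     (= c (aref encoding (cdr p)))))
                              pairs))
            (setq hits (1+ hits))))
        (setq prev c)))
    (if (> count 0) (/ (* hits 100) count) 0)))

(let ((alphabet "абвгдежзийклмнопрстуфхцчшщъыьэюя")
      (pairs '((17 . 18) (18 . 14)))   ; "ст" and "то": sample pairs only
      (text (encode-coding-string "стол" 'koi8-r)))
  (list
   ;; Scored against the KOI8-R table: 2 hits out of 4 letters.
   (my-pair-score text (encode-coding-string alphabet 'koi8-r) pairs)
   ;; Scored against the Windows-1251 table: no pair matches.
   (my-pair-score text (encode-coding-string alphabet 'windows-1251) pairs)))
;; => (50 0)
```

A score above `recognition-level` for one table and not for the others is what lets the function pick an encoding name.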
--
Best regards, -- Boris.
Some people are only alive because it is illegal to kill them.
--- Gnus v5.5/XEmacs 20.3 - "London"
* Origin: Linux inside (2:5020/510@fidonet)