What is the best way to remove accents (normalize) in a Python 3 Unicode string?


2024-09-16 松本彩花

The best way to remove accents (normalize) in a Python 3 Unicode string

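In Python 3, the usual way to remove accents from a Unicode string is to normalize it with the standard-library unicodedata module. Unicode normalization converts the characters in a string to a standard form. In a decomposed form, an accented letter becomes a base letter followed by separate combining marks, which can then be filtered out. Three sample implementations follow.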
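To make the idea of a "standard form" concrete, here is a minimal sketch of what decomposition does to a single accented letter. The variable names are only illustrative.

```python
import unicodedata

composed = "\u00e9"  # 'é' stored as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)

# NFD splits 'é' into the base letter 'e' plus U+0301 (combining acute accent).
print([f"U+{ord(c):04X}" for c in composed])    # ['U+00E9']
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']

# NFC recombines the pair into the original single code point.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Once the accent is its own code point, removing it is just a matter of dropping that character.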

Sample code 1: Removing accents with NFD normalization

import unicodedata

def remove_accent(input_str):
    return ''.join(c for c in unicodedata.normalize('NFD', input_str) if unicodedata.category(c) != 'Mn')

input_str = "héllö"
output_str = remove_accent(input_str)
print(output_str)  # Output: hello

Sample code 2: Removing accents with NFKD normalization

import unicodedata

def remove_accent(input_str):
    return ''.join(c for c in unicodedata.normalize('NFKD', input_str) if not unicodedata.combining(c))

input_str = "café"
output_str = remove_accent(input_str)
print(output_str)  # Output: cafe
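NFD and NFKD give the same result for plain accented letters, but they differ on compatibility characters such as ligatures and circled digits. The sketch below illustrates that difference. `strip_marks` is a hypothetical helper written for this comparison, not part of unicodedata.

```python
import unicodedata

def strip_marks(text, form):
    """Decompose `text` with the given normalization form, then drop combining marks."""
    decomposed = unicodedata.normalize(form, text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Starts with the 'fi' ligature (U+FB01) and ends with circled digit one (U+2460).
sample = "\ufb01anc\u00e9 \u2460"

nfd_result = strip_marks(sample, "NFD")
nfkd_result = strip_marks(sample, "NFKD")

print(nfd_result)   # the ligature and circled digit are kept; only the accent is removed
print(nfkd_result)  # the ligature becomes 'fi' and the circled digit becomes '1'
```

If the text must keep such characters intact, NFD is probably the safer choice. NFKD suits cases like search keys, where folding them to plain letters and digits is desirable.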

Sample code 3: Removing accents with a regular expression

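import re
import unicodedata

def remove_accent(input_str):
    # The standard re module does not support \p{Mn}, so match the
    # Combining Diacritical Marks block (U+0300 to U+036F) explicitly.
    return re.sub(r'[\u0300-\u036f]', '', unicodedata.normalize('NFKD', input_str))

input_str = "résumé"
output_str = remove_accent(input_str)
print(output_str)  # Output: resume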

With these samples you can remove accents from Unicode strings in Python 3. Normalizing first puts the string into a form that is easier to process.

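In short, the best way to remove accents from a Python 3 Unicode string is to normalize it with the unicodedata module. The unicodedata.normalize() function can convert a string to NFD (Normalization Form D, canonical decomposition) or NFC (Normalization Form C, canonical composition), among other forms. For accent removal, NFD is the usual choice because it separates each accent from its base letter. The following sample removes accents this way.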

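import unicodedata

def remove_accent(input_str):
    # Decompose each accented letter into a base letter plus combining marks.
    normalized_str = unicodedata.normalize('NFD', input_str)
    # Keep only the characters that are not combining marks.
    accent_removed = ''.join(c for c in normalized_str if not unicodedata.combining(c))
    return accent_removed

input_str = "crème brûlée"  # any string containing accented characters
result = remove_accent(input_str)
print(result)  # Output: creme brulee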
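One limitation is worth knowing: some letters, such as "ø", "ł" and "ß", have no canonical decomposition, so the NFD approach leaves them untouched. The following is a minimal sketch of one possible workaround, assuming a small hand-written replacement table is acceptable. `FALLBACK` and `remove_accent_with_fallback` are hypothetical names, and the table is deliberately incomplete.

```python
import unicodedata

# Hypothetical fallback table for letters that NFD cannot decompose.
FALLBACK = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "ß": "ss"})

def remove_accent(text):
    """Plain NFD approach: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def remove_accent_with_fallback(text):
    """NFD approach followed by the manual replacement table."""
    return remove_accent(text).translate(FALLBACK)

sample = "Łódź smørrebrød straße"
print(remove_accent(sample))                # Ł, ø and ß survive
print(remove_accent_with_fallback(sample))  # all letters are plain ASCII
```

Whether such replacements are appropriate depends on the language and the use case, so the table should be adapted to the data at hand.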
