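Recently I saw people in a group chat talking about rhyming machines, and it suddenly brought back memories from many years ago.
On a whim, I wrote a rhyme robot. It can identify a word's rhyme, compare the rhymes of two words, and group a word list by rhyme.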
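Testing shows that polyphonic characters (characters with more than one reading) are not handled well yet: 唠嗑 and 唠叨, for example, are recognized incorrectly. Feel free to keep testing and send me any problems you find.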
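One possible cause, offered as a guess rather than a diagnosis: get_punchline converts only the last character, so the converter never sees the rest of the word. A minimal sketch of converting the whole word first and then taking the last syllable is below. The names last_syllable and fake_pinyin are hypothetical. fake_pinyin is a stub with the same call shape as pypinyin's pinyin function so the example runs without the library, and its readings are illustrative only. Whether this helps in practice depends on the converter's phrase data covering the word.

```python
def last_syllable(word, to_pinyin):
    """Return the reading of the last character, converting the whole word for context.

    to_pinyin is any callable shaped like pypinyin.pinyin: it takes a string
    and returns one list of readings per character.
    """
    syllables = to_pinyin(word)
    return syllables[-1][0]


def fake_pinyin(text):
    # Hypothetical stub: a tiny phrase table is consulted first,
    # then per-character default readings are used as a fallback.
    phrases = {"唠叨": [["láo"], ["dao"]], "唠嗑": [["lào"], ["kē"]]}
    singles = {"唠": "lào", "叨": "dāo", "嗑": "kè"}
    if text in phrases:
        return phrases[text]
    return [[singles[ch]] for ch in text]


# Reading of the last character when the whole word is converted
print(last_syllable("唠叨", fake_pinyin))
# Reading of the same character when converted in isolation
print(fake_pinyin("叨")[0][0])
```

With the real library, the same idea would be to pass the full word to the converter instead of word[-1].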
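Pinyin recognition is built on the pypinyin library; see its GitHub page for usage details.
The rhyme robot's code lives in a file named "punchliner.py". The code is as follows: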
from pypinyin import pinyin

# The sample words from the original post did not survive; add your own here.
words = []

def is_alphabet(uchar):
    # True only for plain ASCII letters (A-Z, a-z); tone-marked vowels return False
    rule1 = (uchar >= '\u0041' and uchar <= '\u005a')
    rule2 = (uchar >= '\u0061' and uchar <= '\u007a')
    return rule1 or rule2

def get_punchline(word):
    # Split the last character's pinyin into [prefix, tone-marked vowel, suffix]
    last_character = word[-1]
    last_character_pinyin = pinyin(last_character)[0][0]
    punchline = []
    for the_char in last_character_pinyin:
        if not is_alphabet(the_char):
            punchline.append(last_character_pinyin.split(the_char)[0])
            punchline.append(the_char)
            punchline.append(last_character_pinyin.split(the_char)[1])
    return punchline

def compare_punchline(word1, word2):
    punchline1 = get_punchline(word1)
    punchline2 = get_punchline(word2)
    prefix1 = punchline1[0]
    prefix2 = punchline2[0]
    # Default the prefix's last letter to a non-empty placeholder
    # (any value other than 'i' works)
    prefix1_last_char = '-'
    prefix2_last_char = '-'
    if prefix1 != '':
        prefix1_last_char = prefix1[-1]
    if prefix2 != '':
        prefix2_last_char = prefix2[-1]
    # Precondition on the prefixes: they rhyme only if both end in i or neither does
    pre_rule1 = (prefix1_last_char == 'i')
    pre_rule2 = (prefix2_last_char == 'i')
    all_i = (pre_rule1 and pre_rule2)
    all_not_i = 'i' not in [prefix1_last_char, prefix2_last_char]
    if all_i or all_not_i:
        # The tone-marked vowel and everything after it must match
        rule1 = punchline1[1] == punchline2[1]
        rule2 = punchline1[2] == punchline2[2]
        return rule1 and rule2
    return False

def classify_punchline(words_list):
    if not words_list:
        return
    # Collect every word that rhymes with the first word, print the group,
    # then repeat on the remaining words
    target = words_list[0]
    yayun_words = filter(lambda word: compare_punchline(target, word), words_list)
    yayun_words_list = list(set(yayun_words))
    left_words_list = list(set(words_list) - set(yayun_words_list))
    print(yayun_words_list)
    rule1 = left_words_list != words_list
    rule2 = len(left_words_list) > 0
    if rule1 and rule2:
        classify_punchline(left_words_list)

if __name__ == '__main__':
    classify_punchline(words)
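In this code:
1. The function classify_punchline checks the words in a word list and automatically groups the ones that rhyme;
2. The function get_punchline extracts a word's rhyme;
3. The function compare_punchline compares the rhymes of two words.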
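To make the rhyme rule concrete, here is a small standalone sketch that works directly on tone-marked pinyin strings, so it runs without pypinyin. It is not the code from punchliner.py: split_syllable and rhymes are hypothetical names, and detecting the marked vowel through Unicode decomposition is a substitute for the ASCII check in is_alphabet. The rule itself follows the description above: the tone-marked vowel and whatever follows it must match, and the letters before it must either both end in i or both not.

```python
import unicodedata


def split_syllable(syllable):
    """Split a tone-marked pinyin syllable into (prefix, marked vowel, suffix)."""
    # Use precomposed characters so each marked vowel is a single code point.
    syllable = unicodedata.normalize("NFC", syllable)
    for index, char in enumerate(syllable):
        # A vowel with a tone mark (or ü) decomposes into a base letter
        # plus a combining mark; plain ASCII letters do not.
        if len(unicodedata.normalize("NFD", char)) > 1:
            return syllable[:index], char, syllable[index + 1:]
    return None  # no marked vowel, e.g. a neutral-tone syllable


def rhymes(syllable1, syllable2):
    parts1 = split_syllable(syllable1)
    parts2 = split_syllable(syllable2)
    if parts1 is None or parts2 is None:
        return False
    # The prefixes must both end in "i" or both not end in "i".
    if parts1[0].endswith("i") != parts2[0].endswith("i"):
        return False
    # The marked vowel and the suffix must be identical.
    return parts1[1:] == parts2[1:]


print(split_syllable("xià"))
print(rhymes("jiā", "xiā"))
print(rhymes("mā", "xiā"))
```

Because the marked vowel is compared as-is, two syllables with the same final but different tones do not count as rhyming under this rule.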
I hope that one day someone can invent AlphaRapper the way AlphaGo was invented, and send it to compete on 中国有嘻哈 (The Rap of China).
Output: