MeCab does not work on Ubuntu + apt? Then compile from source!
Summary
- MeCab (a Japanese tokenizer) does not work correctly when installed with apt on Ubuntu.
- The solution is to compile it from source.
Ubuntu + apt to install mecab?
    sudo apt install mecab
    sudo apt install libmecab-dev
    sudo apt install mecab-ipadic-utf8
Next, you'd like to know where the dictionaries are.
    $ mecab-config --dicdir
    /usr/lib/x86_64-linux-gnu/mecab/dic
    $ ls /usr/lib/x86_64-linux-gnu/mecab/dic
    ls: cannot access '/usr/lib/x86_64-linux-gnu/mecab/dic': No such file or directory
It does not exist!
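The check above (ask mecab-config for the dictionary directory, then look inside it) can be sketched in Python. This is only a minimal sketch: the function names are my own, and it assumes that a compiled MeCab dictionary directory contains the files dicrc, sys.dic, unk.dic, char.bin and matrix.bin.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Files assumed to be present in a compiled MeCab dictionary directory.
REQUIRED_FILES = ("dicrc", "sys.dic", "unk.dic", "char.bin", "matrix.bin")


def reported_dicdir(mecab_config: str = "mecab-config") -> Optional[Path]:
    """Return the directory printed by `mecab-config --dicdir`, or None if
    mecab-config cannot be found on PATH."""
    exe = shutil.which(mecab_config)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--dicdir"], capture_output=True, text=True, check=True
    )
    return Path(result.stdout.strip())


def installed_dictionaries(dicdir: Path) -> List[str]:
    """List subdirectories of dicdir that look like compiled dictionaries."""
    if not dicdir.is_dir():
        return []
    return sorted(
        child.name
        for child in dicdir.iterdir()
        if child.is_dir()
        and all((child / name).is_file() for name in REQUIRED_FILES)
    )


if __name__ == "__main__":
    dicdir = reported_dicdir()
    if dicdir is None:
        print("mecab-config was not found on PATH")
    else:
        print(dicdir, installed_dictionaries(dicdir))
```

An empty list means the reported directory is missing or holds no usable dictionary, which is the situation described above.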
So where is the dictionary?
So let's find it!
    $ find / -name "mecab*" | less
    (omitted)
    /usr/share/doc/mecab/mecab_8h.html
    /usr/share/doc/mecab-ipadic-utf8
    /usr/share/doc/mecab-utils
    /usr/share/doc/mecab-ipadic
    /usr/share/doc/mecab-jumandic
    /usr/share/doc/mecab-jumandic-utf8
    /var/lib/mecab
    (omitted)
Okay… something exists under /usr/share/doc. But when you look in those directories, you find that the dictionary files are not there either!
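Searching by name mostly turns up documentation directories. Another way to search is by content: look for any directory that holds both sys.dic and dicrc, which I am assuming here are the marker files of a compiled dictionary. The function name is hypothetical, and this is a rough sketch rather than a complete tool.

```python
import os
import sys
from typing import Iterator


def find_compiled_dictionaries(root: str) -> Iterator[str]:
    """Yield every directory under root that contains both sys.dic and dicrc.

    os.walk skips directories it cannot read, so this can be run
    without root privileges (it will just see less of the tree).
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if "sys.dic" in filenames and "dicrc" in filenames:
            yield dirpath


if __name__ == "__main__":
    # Search the root given on the command line, defaulting to /usr.
    search_root = sys.argv[1] if len(sys.argv) > 1 else "/usr"
    for path in find_compiled_dictionaries(search_root):
        print(path)
```

Since /var/lib/mecab appeared in the find output above, it may be worth passing that as the root as well.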
Solution? Compile from source!
    # required libs (Ubuntu package names; the RHEL/CentOS equivalents are installed with yum)
    sudo apt install -y build-essential bzip2 libbz2-dev git make wget curl libssl-dev libreadline-dev zlib1g-dev patch file

    # create working dir
    mkdir -p ~/source/mecab
    cd ~/source/mecab

    # download source
    wget 'https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE' -O mecab-0.996.tar.gz
    tar zxvf mecab-0.996.tar.gz
    cd mecab-0.996

    # create the install target
    sudo mkdir -p /opt/mecab

    # configure, compile and install
    ./configure --prefix=/opt/mecab --with-charset=utf8 --enable-utf8-only
    make
    sudo make install

    # add mecab to PATH and register its library directory with the dynamic linker
    echo "export PATH=/opt/mecab/bin:\$PATH" >> ~/.bashrc
    source ~/.bashrc
    mecab-config --libs-only-L | sudo tee /etc/ld.so.conf.d/mecab.conf
    sudo ldconfig

    # dictionary install
    mkdir -p ~/source/mecab-ipadic
    cd ~/source/mecab-ipadic
    wget 'https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM' -O mecab-ipadic-2.7.0-20070801.tar.gz
    tar zxvf mecab-ipadic-2.7.0-20070801.tar.gz
    cd mecab-ipadic-2.7.0-20070801

    # compile and install the dictionary against the new mecab
    ./configure --with-mecab-config=/opt/mecab/bin/mecab-config --with-charset=utf8
    make
    sudo make install
Now it works!
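One way to confirm this is to pipe a sentence through the new binary and look at what comes back. The sketch below assumes the binary ended up at /opt/mecab/bin/mecab (which follows from the --prefix used above) and that mecab prints its default format: one `surface<TAB>comma-separated features` line per token, then a line containing `EOS`. The function names are my own.

```python
import subprocess
from typing import List, Tuple

Token = Tuple[str, List[str]]


def parse_mecab_output(output: str) -> List[Token]:
    """Parse mecab's default output into (surface, features) pairs."""
    tokens: List[Token] = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


def tokenize(text: str, mecab: str = "/opt/mecab/bin/mecab") -> List[Token]:
    """Run the mecab binary on one sentence and parse the result.

    Raises FileNotFoundError if the binary is missing and
    CalledProcessError if mecab exits with an error (for example,
    when it cannot load a dictionary).
    """
    result = subprocess.run(
        [mecab],
        input=text + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return parse_mecab_output(result.stdout)
```

Calling `tokenize("すもももももももものうち")` should return a non-empty list if both the binary and the ipadic dictionary were installed correctly.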
For your info
JapaneseTokenizer, a Python package that I developed, supports the following options.
If mecab-config cannot be found through your environment (PATH)
    mecab_wrapper = MecabWrapper(dictType='ipadic', path_mecab_config='/opt/mecab/bin')
If the dictionary path cannot be found through your environment
    mecab_wrapper = MecabWrapper(dictType='ipadic', path_dictionary='[path to dictionary]')
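If you are not sure which directory to pass as path_mecab_config, a small helper can look for it. This is a sketch with a hypothetical function name and a candidate list I made up; it prefers explicit candidate directories such as /opt/mecab/bin (the prefix used above) and only then falls back to whatever mecab-config is on PATH.

```python
import shutil
from pathlib import Path
from typing import Optional, Sequence

# Candidate install locations; these are assumptions, so adjust to your system.
DEFAULT_CANDIDATES = ("/opt/mecab/bin", "/usr/local/bin", "/usr/bin")


def find_mecab_config_dir(
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the first directory that contains mecab-config, or None."""
    for directory in candidates:
        if (Path(directory) / "mecab-config").is_file():
            return directory
    # Fall back to whatever the shell would find on PATH.
    on_path = shutil.which("mecab-config")
    if on_path is not None:
        return str(Path(on_path).parent)
    return None


if __name__ == "__main__":
    config_dir = find_mecab_config_dir()
    print(config_dir)
    # The result could then be passed to the wrapper shown above:
    # mecab_wrapper = MecabWrapper(dictType='ipadic', path_mecab_config=config_dir)
```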
Discussion
However, the version I compiled from source on Ubuntu fails from time to time. I also asked on Stack Overflow, but no one had a clue. I have just made some findings, and I want to ask whether anyone here knows how to identify the problem.
Hi. Thanks for your comment. If possible, can you share the error message that you encountered? Or you can share the Stack Overflow link 😉