scikit-learn Machine Learning: Naive Bayes for Korean NLP
EthanJ 2022. 11. 21. 14:44
In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
1. Loading the Data (Data Collection)
In [2]:
# NSMC (Naver Sentiment Movie Corpus): https://github.com/e9t/nsmc/
file_url = 'https://raw.githubusercontent.com/dev-EthanJ/scikit-learn_Machine_Learning/main/data/ratings.txt'
df = pd.read_csv(file_url, sep='\t', index_col=0)
df.head()
Out[2]:
| id | document | label |
|---|---|---|
| 8112052 | 어릴때보고 지금다시봐도 재밌어요ㅋㅋ | 1 |
| 8132799 | 디자인을 배우는 학생으로, 외국디자이너와 그들이 일군 전통을 통해 발전해가는 문화산... | 1 |
| 4655635 | 폴리스스토리 시리즈는 1부터 뉴까지 버릴께 하나도 없음.. 최고. | 1 |
| 9251303 | 와.. 연기가 진짜 개쩔구나.. 지루할거라고 생각했는데 몰입해서 봤다.. 그래 이런... | 1 |
| 10067386 | 안개 자욱한 밤하늘에 떠 있는 초승달 같은 영화. | 1 |
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 200000 entries, 8112052 to 8548411
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype
---  ------    --------------   -----
 0   document  199992 non-null  object
 1   label     200000 non-null  int64
dtypes: int64(1), object(1)
memory usage: 4.6+ MB
In [4]:
df[df.document.isnull()]
Out[4]:
| id | document | label |
|---|---|---|
| 6369843 | NaN | 1 |
| 511097 | NaN | 1 |
| 2172111 | NaN | 1 |
| 402110 | NaN | 1 |
| 5942978 | NaN | 0 |
| 5026896 | NaN | 0 |
| 1034280 | NaN | 0 |
| 1034283 | NaN | 0 |
- Only 8 of the 200,000 rows have a missing document value, a negligible share of the data, so those rows are simply dropped.
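The share of missing rows can be computed from the two counts that df.info() reported above. This is a minimal sketch; the helper name missing_share is made up for illustration and is not part of the notebook.

```python
def missing_share(total_rows: int, non_null_rows: int) -> float:
    """Return the fraction of rows whose value is missing."""
    return (total_rows - non_null_rows) / total_rows


# Counts taken from the df.info() output: 200000 entries, 199992 non-null documents.
share = missing_share(total_rows=200_000, non_null_rows=199_992)
print(f"{share:.4%} of the rows have no review text")
```

A share this small is unlikely to affect the class balance, which is why dropping the rows is a reasonable choice here.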
In [5]:
df = df.dropna()
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 199992 entries, 8112052 to 8548411
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype
---  ------    --------------   -----
 0   document  199992 non-null  object
 1   label     199992 non-null  int64
dtypes: int64(1), object(1)
memory usage: 4.6+ MB
In [6]:
sample = pd.concat([df.head(1000), df.tail(1000)])
sample.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 8112052 to 8548411
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   document  2000 non-null   object
 1   label     2000 non-null   int64
dtypes: int64(1), object(1)
memory usage: 46.9+ KB
In [7]:
sample.head(10)
Out[7]:
| id | document | label |
|---|---|---|
| 8112052 | 어릴때보고 지금다시봐도 재밌어요ㅋㅋ | 1 |
| 8132799 | 디자인을 배우는 학생으로, 외국디자이너와 그들이 일군 전통을 통해 발전해가는 문화산... | 1 |
| 4655635 | 폴리스스토리 시리즈는 1부터 뉴까지 버릴께 하나도 없음.. 최고. | 1 |
| 9251303 | 와.. 연기가 진짜 개쩔구나.. 지루할거라고 생각했는데 몰입해서 봤다.. 그래 이런... | 1 |
| 10067386 | 안개 자욱한 밤하늘에 떠 있는 초승달 같은 영화. | 1 |
| 2190435 | 사랑을 해본사람이라면 처음부터 끝까지 웃을수 있는영화 | 1 |
| 9279041 | 완전 감동입니다 다시봐도 감동 | 1 |
| 7865729 | 개들의 전쟁2 나오나요? 나오면 1빠로 보고 싶음 | 1 |
| 7477618 | 굿 | 1 |
| 9250537 | 바보가 아니라 병 쉰 인듯 | 1 |
2. Data Pre-processing
In [8]:
sample_text = sample.document.iloc[0]
sample_text
Out[8]:
'어릴때보고 지금다시봐도 재밌어요ㅋㅋ'
In [9]:
# https://konlpy.org/ko/latest/index.html
!pip install konlpy --quiet
# KoNLPy can tag parts of speech and morphemes in Korean text
In [10]:
from konlpy.tag import Okt
okt = Okt()
print(sample_text)
# Keep only the nouns
print(okt.nouns(sample_text))
어릴때보고 지금다시봐도 재밌어요ㅋㅋ
['때', '보고', '지금', '다시']
In [11]:
# Keep only nouns that are at least two characters long
sample['nouns'] = sample.document.apply(okt.nouns).apply(
    lambda nouns: [n for n in nouns if len(n) >= 2])
sample['nouns']
Out[11]:
id
8112052 [보고, 지금, 다시]
8132799 [디자인, 학생, 외국, 디자이너, 일군, 전통, 통해, 발전, 문화, 산업, 사실...
4655635 [폴리스스토리, 시리즈, 부터, 하나, 최고]
9251303 [연기, 진짜, 생각, 몰입, 진짜, 영화]
10067386 [안개, 밤하늘, 초승달, 영화]
...
8963373 [포켓, 몬스터]
3302770 []
5458175 [완전, 사이코, 영화, 마지막, 더욱더, 영화, 린다]
6908648 [라따뚜이, 스머프, 런가]
8548411 [저그, 영차, 영차, 영차]
Name: nouns, Length: 2000, dtype: object
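The length filter does not depend on KoNLPy, so it can be tried in isolation on the noun list that okt.nouns returned for the first review. The sketch below assumes nothing beyond plain Python; the function name keep_long_nouns is hypothetical. Python's len counts Unicode characters rather than bytes, so a one-syllable Hangul word such as '때' has length 1 and is removed.

```python
def keep_long_nouns(nouns, min_len=2):
    """Keep only the nouns that have at least min_len characters."""
    return [n for n in nouns if len(n) >= min_len]


# Nouns that okt.nouns returned for the first review in the sample.
first_review_nouns = ['때', '보고', '지금', '다시']
filtered = keep_long_nouns(first_review_nouns)
print(filtered)
```

The result matches the first row of sample['nouns'] shown above.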
In [12]:
sample.head()
Out[12]:
| id | document | label | nouns |
|---|---|---|---|
| 8112052 | 어릴때보고 지금다시봐도 재밌어요ㅋㅋ | 1 | [보고, 지금, 다시] |
| 8132799 | 디자인을 배우는 학생으로, 외국디자이너와 그들이 일군 전통을 통해 발전해가는 문화산... | 1 | [디자인, 학생, 외국, 디자이너, 일군, 전통, 통해, 발전, 문화, 산업, 사실... |
| 4655635 | 폴리스스토리 시리즈는 1부터 뉴까지 버릴께 하나도 없음.. 최고. | 1 | [폴리스스토리, 시리즈, 부터, 하나, 최고] |
| 9251303 | 와.. 연기가 진짜 개쩔구나.. 지루할거라고 생각했는데 몰입해서 봤다.. 그래 이런... | 1 | [연기, 진짜, 생각, 몰입, 진짜, 영화] |
| 10067386 | 안개 자욱한 밤하늘에 떠 있는 초승달 같은 영화. | 1 | [안개, 밤하늘, 초승달, 영화] |
In [13]:
df = sample.copy()
3. Training the Model
In [14]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
joined_nouns = df.nouns.apply(" ".join)
cv.fit(joined_nouns)
cv.vocabulary_
Out[14]:
{'보고': 1206,
'지금': 2670,
'다시': 551,
'디자인': 722,
'학생': 3120,
'외국': 2059,
'디자이너': 721,
'일군': 2309,
'전통': 2479,
'통해': 2972,
'발전': 1119,
'문화': 1011,
'산업': 1363,
'사실': 1345,
'우리나라': 2090,
'시절': 1640,
'열정': 1964,
'노라노': 494,
'사람': 1333,
'폴리스스토리': 3067,
'시리즈': 1628,
'부터': 1266,
'하나': 3098,
'최고': 2814,
'연기': 1945,
'진짜': 2711,
'생각': 1404,
'몰입': 964,
'영화': 1977,
'안개': 1770,
'밤하늘': 1124,
'초승달': 2806,
'사랑': 1336,
'라면': 747,
'처음': 2772,
'완전': 2044,
'감동': 52,
'전쟁': 2475,
'바보': 1067,
'나이': 430,
'훗날': 3280,
'사하나': 1360,
'감정': 62,
'이해': 2273,
'고질': 199,
'오페라': 2024,
'작품': 2381,
'극단': 323,
'갈림': 42,
'반전': 1105,
'평점': 3046,
'긴장감': 374,
'스릴': 1574,
'전장': 2474,
'공포': 219,
'고시': 189,
'소재': 1512,
'관련': 236,
'단연': 565,
'가면': 10,
'갈수록': 44,
'더욱': 624,
'밀회': 1061,
'화이팅': 3243,
'수작': 1549,
'일본': 2315,
'마음': 840,
'임팩트': 2342,
'일품': 2334,
'제대로': 2541,
'범죄': 1174,
'스릴러': 1575,
'마디': 827,
'징텅': 2728,
'교복': 249,
'이의': 2256,
'볼펜': 1241,
'자국': 2350,
'역시': 1939,
'미처': 1044,
'전하': 2485,
'형태': 3210,
'마지막': 847,
'강압': 78,
'용서': 2082,
'세뇌': 1476,
'대한': 615,
'비판': 1309,
'중세시대': 2655,
'명작': 933,
'영상': 1971,
'존재': 2588,
'한번': 3129,
'제니퍼': 2540,
'코넬': 2888,
'아역시절': 1745,
'로버트': 790,
'드니': 704,
'장면': 2403,
'정말': 2511,
'가슴속': 16,
'기억': 365,
'수가': 1534,
'인간': 2278,
'잠재': 2390,
'악마': 1765,
'성은': 1460,
'여러': 1915,
'시간': 1617,
'공간': 206,
'존속': 2587,
'다큐': 557,
'그것': 300,
'재현': 2441,
'최고다': 2815,
'삼일': 1374,
'동안': 683,
'틈틈이': 2995,
'잠도': 2389,
'여운': 1925,
'실화': 1685,
'충격': 2845,
'어디': 1842,
'일어나서': 2324,
'각심': 35,
'그라샴': 313,
'농아인': 513,
'이정재': 2259,
'이범수': 2221,
'친구': 2857,
'우정': 2102,
'매우': 891,
'굿굿굿': 282,
'또해': 735,
'제발': 2547,
'제이크': 2552,
'질렌할': 2717,
'대체': 610,
'입가': 2343,
'미소': 1037,
'샤방샤방했던': 1415,
'원표': 2133,
'조연': 2571,
'이양': 2246,
'마치': 849,
'바다': 1064,
'아쿠아리움': 1762,
'느낌': 533,
'자녀': 2354,
'강추': 81,
'정의': 2529,
'콜트': 2902,
'콜텍': 2901,
'노동자': 493,
'이야기': 2245,
'화보': 3241,
'브라질': 1288,
'남자': 457,
'내내': 468,
'여배우': 1920,
'도법': 652,
'멤버': 921,
'모두': 943,
'기대': 350,
'액션': 1829,
'런가': 766,
'흥미진진': 3294,
'워낙': 2119,
'격투씬': 151,
'그냥': 303,
'스마트': 1577,
'티비': 2996,
'인지도': 2302,
'암살': 1811,
'하여튼': 3110,
'인정': 2298,
'파르': 3001,
'북한': 1267,
'목숨': 961,
'대한민국': 616,
'그거': 299,
'납득': 460,
'나불': 425,
'거려': 112,
'종북': 2600,
'박평': 1086,
'그대': 307,
'시작': 1638,
'얼굴': 1873,
'주인공': 2627,
'매력': 888,
'이구': 2204,
'용이': 2083,
'요즘': 2072,
'아이돌': 1750,
'배우': 1153,
'해먹': 3162,
'정도': 2509,
'정말재밋': 2513,
'봣는대': 1246,
'탱고': 2953,
'음악': 2186,
'평생': 3042,
'후회': 3276,
'드라마': 705,
'미도': 1031,
'캐릭터': 2868,
'여러가지': 1916,
'결말': 156,
'이제': 2261,
'재미': 2425,
'순위': 1561,
'신동엽': 1655,
'순간': 1556,
'캐치': 2872,
'감탄': 64,
'이영자': 2247,
'대박': 599,
'이동욱': 2212,
'인생': 2290,
'전작': 2473,
'쓰레기': 1718,
'한국영': 3125,
'별로': 1201,
'다소': 549,
'당시': 585,
'눈물': 523,
'사발': 1342,
'신부': 1658,
'로서': 794,
'명감': 926,
'실천': 1682,
'제리': 2543,
'자신': 2366,
'부정': 1263,
'로키': 797,
'원주율': 2132,
'메이커': 915,
'추석': 2835,
'특선영화': 2991,
'가족': 24,
'끼리': 416,
'보기': 1208,
'우리': 2089,
'애기': 1819,
'선택': 1437,
'괜찬': 245,
'조합': 2582,
'바로': 1066,
'대의': 607,
'어머니': 1849,
'원작': 2127,
'드래곤볼': 708,
'에볼루션': 1892,
'만듬': 862,
'예능': 1992,
'방학': 1138,
'아침': 1761,
'채널': 2766,
'엔트랩먼트': 1903,
'권력': 287,
'의리': 2193,
'역사': 1937,
'초롱': 2801,
'팬심': 3020,
'내용': 475,
'배경음악': 1142,
'달달': 570,
'중간': 2645,
'본방': 1232,
'사수': 1344,
'스토리': 1597,
'요새': 2068,
'수백향': 1547,
'제목': 2545,
'화질': 3247,
'가을로': 21,
'가을': 20,
'때문': 730,
'이건': 2201,
'벌써': 1172,
'퀄리티': 2906,
'일단': 2310,
'까지봣다': 402,
'장국영': 2394,
'자살': 2362,
'극적': 327,
'뀰잼': 414,
'예전': 1999,
'양심': 1837,
'냉장고': 479,
'움찔': 2112,
'한가지': 3121,
'부탁': 1265,
'연기자': 1948,
'보호': 1226,
'폭력': 3060,
'걱정': 124,
'성추행': 1467,
'상가': 1377,
'모든': 945,
'목격': 958,
'증언': 2665,
'덕분': 627,
'청소년': 2787,
'계속': 172,
'경찰서': 169,
'멋있쪙': 905,
'재밋었다': 2433,
'세기': 1475,
'비디오': 1301,
'남기남': 450,
'감독': 49,
'이름': 2215,
'성룡': 1456,
'형님': 3205,
'마이': 841,
'우상': 2093,
'당신': 586,
'장르': 2401,
'이영화': 2248,
'리타': 814,
'자꾸': 2353,
'죄책감': 2609,
'도잠': 658,
'커서': 2875,
'웃음': 2116,
'정치': 2533,
'묘사': 974,
'표현': 3072,
'흥행': 3296,
'안나': 1773,
'절대': 2488,
'디테': 725,
'일만': 2314,
'봉임': 1244,
'한마디': 3128,
'달기': 569,
'수록': 1544,
'살이': 1367,
'엇냐': 1882,
'성격': 1449,
'만점': 867,
'다음': 553,
'초반': 2802,
'설정': 1447,
'점차': 2505,
'판타지': 3011,
'미래': 1033,
'현실': 3199,
'언론': 1871,
'탄압': 2941,
'은유': 2181,
'오카다': 2020,
'예상': 1994,
'의외': 2199,
'인도영화': 2285,
'무엇': 996,
'다그': 545,
'허정무': 3183,
'대신': 605,
'장외룡': 2414,
'한국': 3123,
'고고싱': 174,
'어찌': 1863,
'보조개': 1222,
'메이': 913,
'최고봉': 2816,
'초딩': 2800,
'봣음': 1250,
'비포미드나잇': 1311,
'심정': 1694,
'또한': 734,
'한편': 3145,
'모습': 953,
'절로': 2490,
'뭔가': 1019,
'스케': 1584,
'라디오': 745,
'임진강': 2340,
'한석규': 3131,
'쵝오': 2828,
'허니': 3179,
'꿀잼': 412,
'개꿀잼': 88,
'한예슬': 3134,
'원래': 2122,
'헤어': 3192,
'메이크업': 916,
'훈남': 3277,
'쉐프': 1565,
'보구': 1207,
'요리': 2067,
'시나리오': 1620,
'비고': 1296,
'잼잼꿀잼': 2444,
'잼핵잼잼잼': 2445,
'개인': 103,
'언제': 1872,
'팬텀': 3021,
'크리스틴': 2910,
'일이': 2326,
'전혀': 2486,
'처럼': 2770,
'남편': 459,
'아쉬움': 1739,
'알파': 1809,
'치노': 2850,
'명화': 937,
'뮤지컬': 1021,
'역대': 1936,
'대부': 600,
'바람': 1065,
'레벨': 768,
'명의': 932,
'마잭': 845,
'마돈나': 826,
'엘비스급': 1906,
'노래실력': 496,
'바비': 1068,
'바스코': 1069,
'이번': 2220,
'결과': 154,
'대해': 617,
'불만': 1281,
'어쩌면': 1860,
'거지': 121,
'어쩌': 1858,
'자격': 2349,
'코미디': 2891,
'사극': 1329,
'장혁': 2420,
'명품': 935,
'이서': 2234,
'마련': 828,
'사마': 1338,
'그날': 302,
'저녁': 2449,
'기분': 360,
'옛날': 2001,
'워낭소리': 2120,
'더빙': 623,
'자막': 2358,
'연말': 1952,
'추천': 2839,
'사회': 1361,
'치부': 2854,
'예언': 1997,
'엣날': 1910,
'제일': 2553,
'슈퍼': 1570,
'울트라': 2110,
'노막': 500,
'중동': 2652,
'참여': 2758,
'얼마나': 1875,
'아버지': 1735,
'이자': 2257,
'무사': 991,
'일생': 2323,
'의미': 2195,
'죽음': 2636,
'일지': 2330,
'신념': 1653,
'진일': 2709,
'물결': 1014,
'쵝오임': 2829,
'다만': 547,
'엔딩': 1899,
'배심원': 1151,
'상대로': 1381,
'일장': 2327,
'연설': 1955,
'작위': 2378,
'신파': 1669,
'무죄': 1001,
'애정': 1826,
'멜로': 920,
'최영장군': 2824,
'이민호': 2218,
'안치환': 1797,
'방향': 1140,
'노래': 495,
'걸작': 130,
'안보': 1784,
'고도': 177,
'오스트레일리아': 2011,
'조금': 2559,
'프랑스': 3080,
'심장': 1693,
'쫄깃하': 2739,
'메이드': 914,
'범죄물': 1175,
'류승범': 801,
'황정민': 3259,
'기쁨': 361,
'등장인물': 717,
'절제': 2496,
'소설가': 1506,
'무척': 1003,
'엄마': 1877,
'운전': 2108,
'맥거핀': 893,
'아이맥스': 1753,
'개봉': 97,
'거임': 119,
'완죤': 2046,
'스타': 1589,
'정부': 2517,
'국민': 273,
'의지': 2200,
'퀘벡': 2907,
'호킹': 3218,
'능욕': 540,
'보통': 1224,
'전쟁영화': 2476,
'특유': 2992,
'연출': 1961,
'이정현': 2260,
'전형': 2487,
'사이코': 1353,
'우릴': 2091,
'욕망': 2076,
'명대사': 928,
'똥칠': 738,
'된거': 692,
'점도': 2499,
'블레이드러너': 1293,
'버금': 1166,
'결코': 160,
'정일우': 2530,
'상투': 1394,
'존경': 2583,
'반지': 1107,
'제왕': 2550,
'피터': 3091,
'잭슨': 2442,
'감독판': 50,
'추억': 2837,
'녹화': 507,
'리메이크': 807,
'무협': 1006,
'보너스': 1209,
'시내': 1622,
'보신': 1216,
'추강': 2831,
'아이': 1749,
'만화영화': 875,
'액션영화': 1831,
'서도': 1417,
'귀수': 292,
'아군': 1720,
'첫사랑': 2784,
'설레임': 1440,
'그대로': 308,
'관람': 235,
'향수': 3177,
'김민선': 387,
'김규리': 382,
'여신': 1922,
'미모': 1035,
'힐링': 3305,
'무비': 990,
'자리': 2356,
'연속': 1957,
'질리': 2718,
'외계인': 2058,
'신분': 1659,
'라이언': 750,
'애니': 1820,
'코드': 2889,
'반영': 1100,
'짜임새': 2730,
'러브': 761,
'모드': 944,
'여주': 1931,
'어요': 1851,
'하나요': 3102,
'시간여행': 1618,
'트릴': 2984,
'방송': 1132,
'털털': 2956,
'실제': 1680,
'시기': 1619,
'나중': 433,
'입문': 2345,
'가연': 17,
'물감': 1012,
'애가': 1817,
'주제': 2631,
'배급사': 1144,
'싸이코패스': 1707,
'안인숙': 1792,
'이편': 2271,
'조각': 2558,
'재밋': 2426,
'용도': 2081,
'케미': 2880,
'이예': 2249,
'여유': 1926,
'일요일': 2325,
'오후': 2028,
'동화': 690,
'세계': 1473,
'스타워즈': 1591,
'쌍탑': 1710,
'일견': 2308,
'알콜중독': 1808,
'의문': 2194,
'겉보기': 135,
'안정': 1794,
'그녀': 304,
'외로움': 2061,
'아픔': 1763,
'로써': 795,
'이유': 2253,
'내생': 473,
'가장': 22,
'여명': 1919,
'눈동자': 522,
'허준': 3184,
'모래시계': 946,
'황금의제국': 3257,
'규모': 296,
'사업가': 1346,
'집안': 2724,
'갈등': 41,
'서로': 1418,
'원수': 2124,
'필름': 3094,
'무당': 980,
'방이': 1136,
'들개': 711,
'해외': 3166,
'수니': 1539,
'웁니': 2113,
'가원': 19,
'웃기': 2114,
'해지': 3167,
'소녀시대': 1492,
'윤아': 2175,
'잘못': 2387,
'박한별': 1087,
'송지효': 1532,
'조안': 2569,
'싸이코': 1706,
'역할': 1941,
'나름': 422,
'몸매': 966,
'부분': 1259,
'주연': 2624,
'목소리': 960,
'간직': 40,
'사람과': 1334,
'리가': 802,
'폭풍': 3064,
'집중': 2726,
'도리': 647,
'글쎄': 335,
'알라': 1801,
'미움': 1042,
'가르침': 9,
'모슬렘': 952,
'신고': 1651,
'애국': 1818,
'미국': 1025,
'시민': 1629,
'말로': 877,
'교회': 251,
'반성': 1099,
'기독교인': 353,
'래야': 758,
'나마': 423,
'행운': 3174,
'엔딩크레딧': 1901,
'발견': 1110,
'목록': 959,
'배경': 1141,
'휴식': 3286,
'난리': 441,
'람보': 755,
'터미네이터': 2954,
'맥클레인': 894,
'얘기': 1840,
'재밋다': 2431,
'혼자': 3225,
'올해': 2036,
'짱짱': 2734,
'눈빛': 524,
'생동감': 1405,
'몸짓': 968,
'발짓': 1120,
'거의': 118,
'강아지': 77,
'토토': 2963,
'흐름': 3288,
'희망': 3297,
'슬픔': 1614,
'비극': 1298,
'동전': 688,
'양면': 1836,
'어릴떄': 1848,
'아빠': 1736,
'매일': 892,
'지오': 2689,
'자연': 2367,
'압도': 1812,
'댄스': 619,
'롤라': 798,
'새해': 1399,
'아따맘마': 1726,
'리지': 813,
'차태현': 2751,
'박중훈': 1085,
'아즈': 1759,
'환상': 3251,
'헤메': 3191,
'막시무스': 857,
'빈민촌': 1315,
'실상': 1677,
'리얼': 810,
'영화로': 1981,
'미화': 1050,
'생명': 1408,
'그름': 318,
'나라': 419,
'스스로': 1579,
'정해': 2536,
'그게': 301,
'앞뒤': 1816,
'가까이': 1,
'오만': 2006,
'편견': 3031,
'위트': 2145,
'영화매니아': 1982,
'지침': 2693,
'다른': 546,
'미드': 1032,
'재방송': 2435,
'캐스팅': 2871,
'한효주': 3146,
'문채원': 1010,
'방금': 1127,
'중학교': 2658,
'담임': 576,
'선생님': 1435,
'이후': 2276,
'가끔': 2,
'백영규': 1158,
'재난': 2422,
'극치': 329,
'돋넼': 668,
'대부분': 601,
'완성': 2042,
'코믹': 2892,
'요소': 2069,
'아치': 1760,
'애환': 1828,
'조음': 2573,
'동물': 675,
'재밋는': 2429,
'박유천': 1081,
'한지민': 3141,
'미가': 1024,
'폭포': 3063,
'탈출': 2942,
'이전': 2258,
'사상': 1343,
'무렵': 983,
'최진희': 2825,
'로랑': 783,
'건가': 125,
'누가': 517,
'중국영화': 2648,
'로맨스': 786,
'공존': 214,
'번은': 1171,
'감히': 66,
'프레이져': 3083,
'주걸륜': 2612,
'머이쪙': 900,
'박력': 1073,
'스턴': 1596,
'평작': 3045,
'하정우': 3115,
'개그': 86,
'단발': 563,
'의사': 2196,
'적당': 2456,
'밋게봣': 1062,
'봣던': 1248,
'사관': 1328,
'키스신': 2920,
'예수님': 1995,
'계기': 171,
'모던': 941,
'시네마': 1623,
'주의': 2625,
'로맨틱': 787,
'조우': 2572,
'키아로스타미': 2921,
'이중': 2265,
'운동': 2106,
'믿음': 1058,
'가슴': 15,
'정태우': 2534,
'감사용': 55,
'패전': 3017,
'처리': 2771,
'투수': 2975,
'전문': 2466,
'패가': 3014,
'이기': 2207,
'무승부': 993,
'등판': 718,
'본인': 1234,
'상대': 1380,
'방어율': 1135,
'전체': 2478,
'옴니버스': 2037,
'진행': 2715,
'색감': 1400,
'상상력': 1384,
'킬림': 2923,
'임용': 2339,
'낫다': 461,
'연애': 1958,
'여자': 1928,
'관점': 239,
'표정': 3070,
'대사': 602,
'콘서트': 2899,
'현장': 3202,
'전달': 2463,
'명동역': 929,
'젊은이': 2497,
'헌신': 3186,
'전범': 2468,
'체코': 2792,
'운명': 2107,
'해리슨': 3161,
'포드': 3050,
'홍콩영화': 3235,
'화가': 3236,
'몽환': 972,
'갑자기': 67,
'다해': 558,
'경의': 166,
'우울함': 2099,
'만큼': 870,
'실감': 1671,
'연민': 1953,
'벤자민': 1189,
'검색': 132,
'종로': 2597,
'장거리': 2393,
'세트': 1485,
'장만': 2402,
'요크셔': 2074,
'테리어': 2958,
'소형견': 1515,
'상태': 1393,
'요키': 2075,
'견주': 153,
'전적': 2477,
'잔인': 2384,
'나안': 428,
'도시': 653,
'세상': 1480,
'세심': 1481,
'완벽주의자': 2041,
'피에르': 3090,
'주네': 2616,
'이윤기': 2254,
'한장': 3136,
'면도': 922,
'빵빵': 1320,
'지고': 2668,
'재밋는데': 2430,
'적꿈': 2455,
'일리': 2313,
'므흣': 1023,
'메각하킬': 908,
'스타스크림': 1590,
'활약': 3253,
'이참': 2266,
'트포': 2987,
'오토봇': 2023,
'로봇': 791,
'대장': 609,
'제외': 2551,
'기술': 364,
'여친': 1933,
'주말': 2618,
'가요': 18,
'흐흐': 3290,
'게이고': 141,
'플롯': 3087,
'전개': 2458,
'화면': 3240,
'구성': 259,
'예술': 1996,
'판도': 3008,
'뉴문': 528,
'신도': 1654,
'벨라': 1190,
'트와일라잇': 2986,
'이클립스': 2268,
'커플': 2877,
'털보': 2955,
'뒷모습': 700,
'레이드': 774,
'맨몸': 895,
'결정': 159,
'체다': 2789,
'편도': 3032,
'소름': 1498,
'올드보이': 2034,
'합작': 3155,
'진품': 2714,
'예나': 1991,
'노릇': 498,
'아들': 1725,
'뮤직비디오': 1022,
'동양인': 685,
'하자': 3114,
'자체': 2372,
'디스': 719,
'통한': 2971,
'유지': 2164,
'남녀': 451,
'유태인': 2167,
'레즈비언': 778,
'오타쿠': 2021,
'등등': 715,
'목적': 962,
'소설': 1505,
'강간범': 69,
'새끼': 1397,
'강간': 68,
'정신': 2523,
'난도질': 440,
'드래곤길들이기': 707,
'감명': 53,
'봣어': 1249,
'싱하형': 1702,
'변호인': 1198,
'법정': 1176,
'분통': 1274,
'지경': 2667,
'있냠': 2348,
'삼류': 1371,
'타이틀': 2932,
'덱스터': 636,
'데스몬드': 633,
'리즈시절': 812,
'인형': 2306,
'색히': 1403,
'영화음악': 1984,
'라인': 751,
'난캉이': 446,
'필리핀': 3095,
'서클': 1427,
'배웅': 1155,
'그때': 311,
'풍경': 3075,
'과거': 222,
'회상': 3261,
'요코': 2073,
'고양이': 190,
'재난영화': 2423,
'아주': 1757,
'흥미': 3293,
'똥개': 736,
'잼잇엇어': 2443,
'에피소드': 1895,
'사이코패스': 1354,
'초점': 2808,
'탄도': 2939,
'크리스마스': 2908,
'상영': 1388,
'겨우': 144,
'연기력': 1946,
'최상': 2820,
'몰입도': 965,
'웅앙아': 2117,
'신은경': 1663,
'피아니스트': 3089,
'보시': 1215,
'니요': 541,
'시즌': 1641,
'다가': 544,
'어쩌다가': 1859,
'특징': 2994,
'거짓말': 122,
'태고': 2946,
'살때': 1365,
'십대영화': 1698,
'울면': 2109,
'사육사': 1350,
...}
In [15]:
x = cv.transform(joined_nouns)
print(x)
  (0, 551)    1
  (0, 1206)    1
  (0, 2670)    1
  (1, 494)    1
  (1, 721)    1
  (1, 722)    1
  (1, 1011)    1
  (1, 1119)    1
  (1, 1333)    1
  (1, 1345)    1
  (1, 1363)    1
  (1, 1640)    1
  (1, 1964)    1
  (1, 2059)    1
  (1, 2090)    1
  (1, 2309)    1
  (1, 2479)    2
  (1, 2972)    1
  (1, 3120)    1
  (2, 1266)    1
  (2, 1628)    1
  (2, 2814)    1
  (2, 3067)    1
  (2, 3098)    1
  (3, 964)    1
  :    :
  (1990, 2499)    1
  (1990, 2892)    1
  (1990, 3046)    1
  (1991, 67)    1
  (1991, 435)    1
  (1991, 1957)    1
  (1991, 1977)    1
  (1991, 2347)    1
  (1992, 2947)    1
  (1993, 1990)    1
  (1994, 102)    1
  (1994, 1202)    1
  (1995, 963)    1
  (1995, 3057)    1
  (1997, 625)    1
  (1997, 819)    1
  (1997, 847)    1
  (1997, 1353)    1
  (1997, 1977)    2
  (1997, 2044)    1
  (1998, 746)    1
  (1998, 766)    1
  (1998, 1578)    1
  (1999, 1976)    3
  (1999, 2447)    1
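Each (row, column) pair printed above is a non-zero cell of a document-term matrix: the row is the review, the column is the word's index in cv.vocabulary_, and the value is how often the word occurs. The standard-library sketch below imitates that construction. It tokenizes with the regular expression CountVectorizer uses by default, (?u)\b\w\w+\b, assigns column indices in sorted order, and lists the non-zero counts. It is an illustration and not scikit-learn's implementation; the function names are made up, and the three toy documents are noun lists taken from sample['nouns'].

```python
import re
from collections import Counter

# Default token_pattern of CountVectorizer: tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def build_vocabulary(docs):
    """Map every token to a column index, assigned in sorted order."""
    tokens = {tok for doc in docs for tok in TOKEN_PATTERN.findall(doc.lower())}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def count_triples(docs, vocabulary):
    """Return (row, column, count) for every non-zero cell."""
    triples = []
    for row, doc in enumerate(docs):
        counts = Counter(
            tok for tok in TOKEN_PATTERN.findall(doc.lower()) if tok in vocabulary
        )
        for tok in sorted(counts, key=vocabulary.get):
            triples.append((row, vocabulary[tok], counts[tok]))
    return triples


# Space-joined noun lists, as produced by df.nouns.apply(" ".join).
docs = ["보고 지금 다시", "연기 진짜 생각 몰입 진짜 영화", "안개 밤하늘 초승달 영화"]
vocabulary = build_vocabulary(docs)
for row, col, n in count_triples(docs, vocabulary):
    print(f"({row}, {col})\t{n}")
```

Because the default pattern requires at least two word characters, a one-character token would be dropped at this stage even without the earlier length filter.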
In [16]:
from sklearn.model_selection import train_test_split
y = df.label
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=814)
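With 2,000 sampled reviews, test_size=0.2 holds out 400 rows for testing and leaves 1,600 for training, while a fixed random_state makes the shuffle repeatable. The idea can be sketched with the standard library alone. The function shuffle_split is hypothetical, and it does not reproduce the exact row assignment that train_test_split makes; it only shows the shuffle-then-cut principle.

```python
import random


def shuffle_split(n_rows, test_size=0.2, seed=814):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffle_split(2000)
print(len(train_idx), len(test_idx))
```

Calling shuffle_split again with the same seed returns the same partition, which is the property random_state provides in the notebook.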
In [17]:
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(x_train, y_train)
pred = model.predict(x_test)
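Multinomial Naive Bayes estimates a prior for each class and a smoothed likelihood for each word within each class, then predicts the class whose summed log-probabilities are highest. The sketch below works through that computation with the standard library, assuming Laplace smoothing with alpha=1.0 (the default of MultinomialNB). The function names and the four toy documents are invented for illustration and are not part of the notebook.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: one class label per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    docs_per_class = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {"vocab": set(vocab), "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in docs_per_class.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # Smoothed denominator: all tokens of the class plus alpha per vocabulary word.
        total = sum(token_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            tok: math.log((token_counts[label][tok] + alpha) / total) for tok in vocab
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen tokens are ignored."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        scores[label] = log_prior + sum(
            likelihood[tok] for tok in doc if tok in model["vocab"]
        )
    return max(scores, key=scores.get)


# Toy training data (hypothetical): 1 = positive review, 0 = negative review.
toy_docs = [["최고", "감동"], ["감동", "영화"], ["쓰레기", "영화"], ["별로", "쓰레기"]]
toy_labels = [1, 1, 0, 0]
toy_model = train_nb(toy_docs, toy_labels)
print(predict_nb(toy_model, ["감동", "최고"]), predict_nb(toy_model, ["쓰레기"]))
```

Smoothing keeps a word that never appeared in one class from driving that class's probability to zero, which matters when the vocabulary is large and each review is short.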
4. Evaluating the Model
In [18]:
from sklearn.metrics import accuracy_score, confusion_matrix
accuracy_score(y_test, pred)
Out[18]:
0.69
In [19]:
cf_mtx = confusion_matrix(y_test, pred)
cf_mtx
Out[19]:
array([[123, 78],
[ 46, 153]])
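The confusion matrix holds more than the accuracy figure. In scikit-learn's layout the rows are the actual labels and the columns the predicted labels, so with labels 0 and 1 the four cells are true negatives, false positives, false negatives and true positives. The short sketch below derives the usual metrics from the printed values, treating label 1 as the positive class; the variable names are chosen here for illustration.

```python
# Confusion matrix from Out[19]: rows are actual labels, columns are predictions.
cf = [[123, 78],
      [46, 153]]
(tn, fp), (fn, tp) = cf

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)    # share of predicted positives that are correct
recall = tp / (tp + fn)       # share of actual positives that are found
specificity = tn / (tn + fp)  # share of actual negatives that are found
majority_baseline = max(tn + fp, fn + tp) / total  # always predict the larger class

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"specificity={specificity:.3f} majority baseline={majority_baseline:.4f}")
```

The model finds positive reviews more reliably than negative ones: 78 of the 201 negative test reviews are misclassified as positive, against 46 of the 199 positive ones.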
In [20]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(cf_mtx, cmap='coolwarm', annot=True, fmt='.0f')
plt.title("CONFUSION MATRIX")
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
Mecab
In [21]:
!curl -s https://raw.githubusercontent.com/teddylee777/machine-learning/master/99-Misc/01-Colab/mecab-colab.sh | bash
--2022-11-16 05:30:26-- https://www.dropbox.com/s/9xls0tgtf3edgns/mecab-0.996-ko-0.9.2.tar.gz?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.71.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.71.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/9xls0tgtf3edgns/mecab-0.996-ko-0.9.2.tar.gz [following]
--2022-11-16 05:30:26-- https://www.dropbox.com/s/dl/9xls0tgtf3edgns/mecab-0.996-ko-0.9.2.tar.gz
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc63ee86bb2fecd692b2666d2769.dl.dropboxusercontent.com/cd/0/get/Bw29JOIWcbnO6vIrxoQTy5re9QtCtzgv1UhjxQzauedYRrScZ7R_gQ_GkkyAyIwDA2tO8jB5uNB3PwFD2cV53UliOK9o2N6ndkU8rX4K6aWmTSFuTjbcKXMN2op9ODK2lwwbTOddv6IHFVvAx2l_9p1U3qJFPRltodNP_erATEZioqJe0c1KSBk_l7zJoN0BaOs/file?dl=1# [following]
--2022-11-16 05:30:27-- https://uc63ee86bb2fecd692b2666d2769.dl.dropboxusercontent.com/cd/0/get/Bw29JOIWcbnO6vIrxoQTy5re9QtCtzgv1UhjxQzauedYRrScZ7R_gQ_GkkyAyIwDA2tO8jB5uNB3PwFD2cV53UliOK9o2N6ndkU8rX4K6aWmTSFuTjbcKXMN2op9ODK2lwwbTOddv6IHFVvAx2l_9p1U3qJFPRltodNP_erATEZioqJe0c1KSBk_l7zJoN0BaOs/file?dl=1
Resolving uc63ee86bb2fecd692b2666d2769.dl.dropboxusercontent.com (uc63ee86bb2fecd692b2666d2769.dl.dropboxusercontent.com)... 162.125.65.15, 2620:100:6021:15::a27d:410f
Connecting to uc63ee86bb2fecd692b2666d2769.dl.dropboxusercontent.com (uc63ee86bb2fecd692b2666d2769.dl.dropboxusercontent.com)|162.125.65.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1414979 (1.3M) [application/binary]
Saving to: ‘mecab-0.996-ko-0.9.2.tar.gz?dl=1.2’
mecab-0.996-ko-0.9. 100%[===================>] 1.35M --.-KB/s in 0.1s
2022-11-16 05:30:27 (11.6 MB/s) - ‘mecab-0.996-ko-0.9.2.tar.gz?dl=1.2’ saved [1414979/1414979]
mecab-0.996-ko-0.9.2/
...
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... none
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... none
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking whether gcc needs -traditional... no
checking whether make sets $(MAKE)... (cached) yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... dlltool
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
./configure: line 7378: /usr/bin/file: No such file or directory
checking for mt... no
checking if : is a manifest tool... no
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for library containing strerror... none required
checking whether byte ordering is bigendian... no
checking for ld used by GCC... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for shared library run path origin... done
checking for iconv... yes
checking for working iconv... yes
checking for iconv declaration...
extern size_t iconv (iconv_t cd, char * *inbuf, size_t *inbytesleft, char * *outbuf, size_t *outbytesleft);
checking for ANSI C header files... (cached) yes
checking for an ANSI C-conforming const... yes
checking whether byte ordering is bigendian... (cached) no
checking for string.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking for stdint.h... (cached) yes
checking for sys/stat.h... (cached) yes
checking sys/mman.h usability... yes
checking sys/mman.h presence... yes
checking for sys/mman.h... yes
checking sys/times.h usability... yes
checking sys/times.h presence... yes
checking for sys/times.h... yes
checking for sys/types.h... (cached) yes
checking dirent.h usability... yes
checking dirent.h presence... yes
checking for dirent.h... yes
checking ctype.h usability... yes
checking ctype.h presence... yes
checking for ctype.h... yes
checking for sys/types.h... (cached) yes
checking io.h usability... no
checking io.h presence... no
checking for io.h... no
checking windows.h usability... no
checking windows.h presence... no
checking for windows.h... no
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking for off_t... yes
checking for size_t... yes
checking size of char... 1
checking size of short... 2
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking size of size_t... 8
checking for size_t... (cached) yes
checking for unsigned long long int... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... yes
checking for getpagesize... yes
checking for working mmap... yes
checking for main in -lstdc++... yes
checking for pthread_create in -lpthread... yes
checking for pthread_join in -lpthread... yes
checking for getenv... yes
checking for opendir... yes
checking whether make is GNU Make... yes
checking if g++ supports stl <vector> (required)... yes
checking if g++ supports stl <list> (required)... yes
checking if g++ supports stl <map> (required)... yes
checking if g++ supports stl <set> (required)... yes
checking if g++ supports stl <queue> (required)... yes
checking if g++ supports stl <functional> (required)... yes
checking if g++ supports stl <algorithm> (required)... yes
checking if g++ supports stl <string> (required)... yes
checking if g++ supports stl <iostream> (required)... yes
checking if g++ supports stl <sstream> (required)... yes
checking if g++ supports stl <fstream> (required)... yes
checking if g++ supports template <class T> (required)... yes
checking if g++ supports const_cast<> (required)... yes
checking if g++ supports static_cast<> (required)... yes
checking if g++ supports reinterpret_cast<> (required)... yes
checking if g++ supports namespaces (required) ... yes
checking if g++ supports __thread (optional)... yes
checking if g++ supports template <class T> (required)... yes
checking if g++ supports GCC native atomic operations (optional)... yes
checking if g++ supports OSX native atomic operations (optional)... no
checking if g++ environment provides all required features... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating src/Makefile.msvc
config.status: creating man/Makefile
config.status: creating doc/Makefile
config.status: creating tests/Makefile
config.status: creating swig/version.h
config.status: creating mecab.iss
config.status: creating mecab-config
config.status: creating mecabrc
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing default commands
make all-recursive
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2'
Making all in src
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/src'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/src'
Making all in man
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/man'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/man'
Making all in doc
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/doc'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/doc'
Making all in tests
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2'
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2'
Making check in src
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/src'
make[1]: Nothing to be done for 'check'.
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/src'
Making check in man
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/man'
make[1]: Nothing to be done for 'check'.
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/man'
Making check in doc
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/doc'
make[1]: Nothing to be done for 'check'.
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/doc'
Making check in tests
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make check-TESTS
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/tests'
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 2
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 177
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 178x178
emitting matrix : 100% |###########################################|
done!
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 2
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 83
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 84x84
emitting matrix : 100% |###########################################|
done!
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 2
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 450
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 1x1
done!
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 2
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 162
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 3x3
emitting matrix : 100% |###########################################|
done!
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 2
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 4
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 1x1
done!
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 11
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 1
reading ./matrix.def ... 1x1
done!
./pos-id.def is not found. minimum setting is used
reading ./unk.def ... 2
emitting double-array: 100% |###########################################|
./model.def is not found. skipped.
./pos-id.def is not found. minimum setting is used
reading ./dic.csv ... 1
reading ./matrix.def ... 1x1
done!
PASS: run-dics.sh
PASS: run-eval.sh
seed/pos-id.def is not found. minimum setting is used
reading seed/unk.def ... 40
emitting double-array: 100% |###########################################|
seed/model.def is not found. skipped.
seed/pos-id.def is not found. minimum setting is used
reading seed/dic.csv ... 4335
emitting double-array: 100% |###########################################|
reading seed/matrix.def ... 1x1
done!
reading corpus ...
Number of sentences: 34
Number of features: 64108
eta: 0.00005
freq: 1
eval-size: 6
unk-eval-size: 4
threads: 1
charset: EUC-JP
C(sigma^2): 1.00000
iter=0 err=1.00000 F=0.35771 target=2406.28355 diff=1.00000
iter=1 err=0.97059 F=0.65652 target=1484.25231 diff=0.38318
iter=2 err=0.91176 F=0.79331 target=863.32765 diff=0.41834
iter=3 err=0.85294 F=0.89213 target=596.72480 diff=0.30881
iter=4 err=0.61765 F=0.95467 target=336.30744 diff=0.43641
iter=5 err=0.50000 F=0.96702 target=246.53039 diff=0.26695
iter=6 err=0.35294 F=0.95472 target=188.93963 diff=0.23361
iter=7 err=0.20588 F=0.99106 target=168.62665 diff=0.10751
iter=8 err=0.05882 F=0.99777 target=158.64865 diff=0.05917
iter=9 err=0.08824 F=0.99665 target=154.14530 diff=0.02839
iter=10 err=0.08824 F=0.99665 target=151.94257 diff=0.01429
iter=11 err=0.02941 F=0.99888 target=147.20825 diff=0.03116
iter=12 err=0.00000 F=1.00000 target=147.34956 diff=0.00096
iter=13 err=0.02941 F=0.99888 target=146.32592 diff=0.00695
iter=14 err=0.00000 F=1.00000 target=145.77299 diff=0.00378
iter=15 err=0.02941 F=0.99888 target=145.24641 diff=0.00361
iter=16 err=0.00000 F=1.00000 target=144.96490 diff=0.00194
iter=17 err=0.02941 F=0.99888 target=144.90246 diff=0.00043
iter=18 err=0.00000 F=1.00000 target=144.75959 diff=0.00099
iter=19 err=0.00000 F=1.00000 target=144.71727 diff=0.00029
iter=20 err=0.00000 F=1.00000 target=144.66337 diff=0.00037
iter=21 err=0.00000 F=1.00000 target=144.61349 diff=0.00034
iter=22 err=0.00000 F=1.00000 target=144.62987 diff=0.00011
iter=23 err=0.00000 F=1.00000 target=144.60060 diff=0.00020
iter=24 err=0.00000 F=1.00000 target=144.59125 diff=0.00006
iter=25 err=0.00000 F=1.00000 target=144.58619 diff=0.00004
iter=26 err=0.00000 F=1.00000 target=144.58219 diff=0.00003
iter=27 err=0.00000 F=1.00000 target=144.58059 diff=0.00001
Done! writing model file ...
model-ipadic.c1.0.f1.model is not a binary model. reopen it as text mode...
reading seed/unk.def ... 40
reading seed/dic.csv ... 4335
emitting model-ipadic.c1.0.f1.dic/left-id.def/ model-ipadic.c1.0.f1.dic/right-id.def
emitting model-ipadic.c1.0.f1.dic/unk.def ... 40
emitting model-ipadic.c1.0.f1.dic/dic.csv ... 4335
emitting matrix : 100% |###########################################|
copying seed/char.def to model-ipadic.c1.0.f1.dic/char.def
copying seed/rewrite.def to model-ipadic.c1.0.f1.dic/rewrite.def
copying seed/dicrc to model-ipadic.c1.0.f1.dic/dicrc
copying seed/feature.def to model-ipadic.c1.0.f1.dic/feature.def
copying model-ipadic.c1.0.f1.model to model-ipadic.c1.0.f1.dic/model.def
done!
model-ipadic.c1.0.f1.dic/pos-id.def is not found. minimum setting is used
reading model-ipadic.c1.0.f1.dic/unk.def ... 40
emitting double-array: 100% |###########################################|
model-ipadic.c1.0.f1.dic/pos-id.def is not found. minimum setting is used
reading model-ipadic.c1.0.f1.dic/dic.csv ... 4335
emitting double-array: 100% |###########################################|
reading model-ipadic.c1.0.f1.dic/matrix.def ... 346x346
emitting matrix : 100% |###########################################|
done!
precision recall F
LEVEL 0: 12.8959(57/442) 11.8998(57/479) 12.3779
LEVEL 1: 12.2172(54/442) 11.2735(54/479) 11.7264
LEVEL 2: 11.7647(52/442) 10.8559(52/479) 11.2921
LEVEL 4: 11.7647(52/442) 10.8559(52/479) 11.2921
PASS: run-cost-train.sh
==================
All 3 tests passed
==================
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2'
Making install in src
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/src'
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/src'
test -z "/usr/local/lib" || /bin/mkdir -p "/usr/local/lib"
/bin/bash ../libtool --mode=install /usr/bin/install -c libmecab.la '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/libmecab.so.2.0.0 /usr/local/lib/libmecab.so.2.0.0
libtool: install: (cd /usr/local/lib && { ln -s -f libmecab.so.2.0.0 libmecab.so.2 || { rm -f libmecab.so.2 && ln -s libmecab.so.2.0.0 libmecab.so.2; }; })
libtool: install: (cd /usr/local/lib && { ln -s -f libmecab.so.2.0.0 libmecab.so || { rm -f libmecab.so && ln -s libmecab.so.2.0.0 libmecab.so; }; })
libtool: install: /usr/bin/install -c .libs/libmecab.lai /usr/local/lib/libmecab.la
libtool: install: /usr/bin/install -c .libs/libmecab.a /usr/local/lib/libmecab.a
libtool: install: chmod 644 /usr/local/lib/libmecab.a
libtool: install: ranlib /usr/local/lib/libmecab.a
libtool: finish: PATH="/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/sbin" ldconfig -n /usr/local/lib
----------------------------------------------------------------------
Libraries have been installed in:
/usr/local/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH' environment variable
during linking
- use the `-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf'
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
/bin/bash ../libtool --mode=install /usr/bin/install -c mecab '/usr/local/bin'
libtool: install: /usr/bin/install -c .libs/mecab /usr/local/bin/mecab
test -z "/usr/local/libexec/mecab" || /bin/mkdir -p "/usr/local/libexec/mecab"
/bin/bash ../libtool --mode=install /usr/bin/install -c mecab-dict-index mecab-dict-gen mecab-cost-train mecab-system-eval mecab-test-gen '/usr/local/libexec/mecab'
libtool: install: /usr/bin/install -c .libs/mecab-dict-index /usr/local/libexec/mecab/mecab-dict-index
libtool: install: /usr/bin/install -c .libs/mecab-dict-gen /usr/local/libexec/mecab/mecab-dict-gen
libtool: install: /usr/bin/install -c .libs/mecab-cost-train /usr/local/libexec/mecab/mecab-cost-train
libtool: install: /usr/bin/install -c .libs/mecab-system-eval /usr/local/libexec/mecab/mecab-system-eval
libtool: install: /usr/bin/install -c .libs/mecab-test-gen /usr/local/libexec/mecab/mecab-test-gen
test -z "/usr/local/include" || /bin/mkdir -p "/usr/local/include"
/usr/bin/install -c -m 644 mecab.h '/usr/local/include'
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/src'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/src'
Making install in man
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/man'
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/man'
make[2]: Nothing to be done for 'install-exec-am'.
test -z "/usr/local/share/man/man1" || /bin/mkdir -p "/usr/local/share/man/man1"
/usr/bin/install -c -m 644 mecab.1 '/usr/local/share/man/man1'
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/man'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/man'
Making install in doc
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/doc'
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/doc'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/doc'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/doc'
Making install in tests
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2/tests'
make[1]: Entering directory '/tmp/mecab-0.996-ko-0.9.2'
make[2]: Entering directory '/tmp/mecab-0.996-ko-0.9.2'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
/usr/bin/install -c mecab-config '/usr/local/bin'
test -z "/usr/local/etc" || /bin/mkdir -p "/usr/local/etc"
/usr/bin/install -c -m 644 mecabrc '/usr/local/etc'
make[2]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2'
make[1]: Leaving directory '/tmp/mecab-0.996-ko-0.9.2'
--2022-11-16 05:31:06-- https://www.dropbox.com/s/i8girnk5p80076c/mecab-ko-dic-2.1.1-20180720.tar.gz?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.7.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.7.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/i8girnk5p80076c/mecab-ko-dic-2.1.1-20180720.tar.gz [following]
--2022-11-16 05:31:06-- https://www.dropbox.com/s/dl/i8girnk5p80076c/mecab-ko-dic-2.1.1-20180720.tar.gz
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc803b9dfe98ccd0d9dd3f02a575.dl.dropboxusercontent.com/cd/0/get/Bw1kH15CQEnLgG3rS5oZto4SRxUZU_tB_ogeVj7XCfDVuhHIeBUtisWuOvXrN4CNRs3UaXBz26qSR6QmsryRMXskR49C12CS9Kw-xrElUXAVq1RuPXRlHm35fTd3VA4GpQt6XOZeui0bOli6wjD3B76tRG6-OwvXyZ8WgZYNWElCP7OXMw8mRoFBleyJRfIr8dM/file?dl=1# [following]
--2022-11-16 05:31:07-- https://uc803b9dfe98ccd0d9dd3f02a575.dl.dropboxusercontent.com/cd/0/get/Bw1kH15CQEnLgG3rS5oZto4SRxUZU_tB_ogeVj7XCfDVuhHIeBUtisWuOvXrN4CNRs3UaXBz26qSR6QmsryRMXskR49C12CS9Kw-xrElUXAVq1RuPXRlHm35fTd3VA4GpQt6XOZeui0bOli6wjD3B76tRG6-OwvXyZ8WgZYNWElCP7OXMw8mRoFBleyJRfIr8dM/file?dl=1
Resolving uc803b9dfe98ccd0d9dd3f02a575.dl.dropboxusercontent.com (uc803b9dfe98ccd0d9dd3f02a575.dl.dropboxusercontent.com)... 162.125.65.15, 2620:100:6021:15::a27d:410f
Connecting to uc803b9dfe98ccd0d9dd3f02a575.dl.dropboxusercontent.com (uc803b9dfe98ccd0d9dd3f02a575.dl.dropboxusercontent.com)|162.125.65.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49775061 (47M) [application/binary]
Saving to: ‘mecab-ko-dic-2.1.1-20180720.tar.gz?dl=1.2’
mecab-ko-dic-2.1.1- 100%[===================>] 47.47M 21.2MB/s in 2.2s
2022-11-16 05:31:10 (21.2 MB/s) - ‘mecab-ko-dic-2.1.1-20180720.tar.gz?dl=1.2’ saved [49775061/49775061]
Reading package lists... Done
Building dependency tree
Reading state information... Done
autoconf is already the newest version (2.69-11).
The following package was automatically installed and is no longer required:
libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
mecab-ko-dic-2.1.1-20180720/
mecab-ko-dic-2.1.1-20180720/configure
mecab-ko-dic-2.1.1-20180720/COPYING
mecab-ko-dic-2.1.1-20180720/autogen.sh
mecab-ko-dic-2.1.1-20180720/Place-station.csv
mecab-ko-dic-2.1.1-20180720/NNG.csv
mecab-ko-dic-2.1.1-20180720/README
mecab-ko-dic-2.1.1-20180720/EF.csv
mecab-ko-dic-2.1.1-20180720/MAG.csv
mecab-ko-dic-2.1.1-20180720/Preanalysis.csv
mecab-ko-dic-2.1.1-20180720/NNB.csv
mecab-ko-dic-2.1.1-20180720/Person-actor.csv
mecab-ko-dic-2.1.1-20180720/VV.csv
mecab-ko-dic-2.1.1-20180720/Makefile.in
mecab-ko-dic-2.1.1-20180720/matrix.def
mecab-ko-dic-2.1.1-20180720/EC.csv
mecab-ko-dic-2.1.1-20180720/NNBC.csv
mecab-ko-dic-2.1.1-20180720/clean
mecab-ko-dic-2.1.1-20180720/ChangeLog
mecab-ko-dic-2.1.1-20180720/J.csv
mecab-ko-dic-2.1.1-20180720/.keep
mecab-ko-dic-2.1.1-20180720/feature.def
mecab-ko-dic-2.1.1-20180720/Foreign.csv
mecab-ko-dic-2.1.1-20180720/XPN.csv
mecab-ko-dic-2.1.1-20180720/EP.csv
mecab-ko-dic-2.1.1-20180720/NR.csv
mecab-ko-dic-2.1.1-20180720/left-id.def
mecab-ko-dic-2.1.1-20180720/Place.csv
mecab-ko-dic-2.1.1-20180720/Symbol.csv
mecab-ko-dic-2.1.1-20180720/dicrc
mecab-ko-dic-2.1.1-20180720/NP.csv
mecab-ko-dic-2.1.1-20180720/ETM.csv
mecab-ko-dic-2.1.1-20180720/IC.csv
mecab-ko-dic-2.1.1-20180720/Place-address.csv
mecab-ko-dic-2.1.1-20180720/Group.csv
mecab-ko-dic-2.1.1-20180720/model.def
mecab-ko-dic-2.1.1-20180720/XSN.csv
mecab-ko-dic-2.1.1-20180720/INSTALL
mecab-ko-dic-2.1.1-20180720/rewrite.def
mecab-ko-dic-2.1.1-20180720/Inflect.csv
mecab-ko-dic-2.1.1-20180720/configure.ac
mecab-ko-dic-2.1.1-20180720/NNP.csv
mecab-ko-dic-2.1.1-20180720/CoinedWord.csv
mecab-ko-dic-2.1.1-20180720/XSV.csv
mecab-ko-dic-2.1.1-20180720/pos-id.def
mecab-ko-dic-2.1.1-20180720/Makefile.am
mecab-ko-dic-2.1.1-20180720/unk.def
mecab-ko-dic-2.1.1-20180720/missing
mecab-ko-dic-2.1.1-20180720/VCP.csv
mecab-ko-dic-2.1.1-20180720/install-sh
mecab-ko-dic-2.1.1-20180720/Hanja.csv
mecab-ko-dic-2.1.1-20180720/MAJ.csv
mecab-ko-dic-2.1.1-20180720/XSA.csv
mecab-ko-dic-2.1.1-20180720/Wikipedia.csv
mecab-ko-dic-2.1.1-20180720/tools/
mecab-ko-dic-2.1.1-20180720/tools/add-userdic.sh
mecab-ko-dic-2.1.1-20180720/tools/mecab-bestn.sh
mecab-ko-dic-2.1.1-20180720/tools/convert_for_using_store.sh
mecab-ko-dic-2.1.1-20180720/user-dic/
mecab-ko-dic-2.1.1-20180720/user-dic/nnp.csv
mecab-ko-dic-2.1.1-20180720/user-dic/place.csv
mecab-ko-dic-2.1.1-20180720/user-dic/person.csv
mecab-ko-dic-2.1.1-20180720/user-dic/README.md
mecab-ko-dic-2.1.1-20180720/NorthKorea.csv
mecab-ko-dic-2.1.1-20180720/VX.csv
mecab-ko-dic-2.1.1-20180720/right-id.def
mecab-ko-dic-2.1.1-20180720/VA.csv
mecab-ko-dic-2.1.1-20180720/char.def
mecab-ko-dic-2.1.1-20180720/NEWS
mecab-ko-dic-2.1.1-20180720/MM.csv
mecab-ko-dic-2.1.1-20180720/ETN.csv
mecab-ko-dic-2.1.1-20180720/AUTHORS
mecab-ko-dic-2.1.1-20180720/Person.csv
mecab-ko-dic-2.1.1-20180720/XR.csv
mecab-ko-dic-2.1.1-20180720/VCN.csv
Looking in current directory for macros.
configure.ac:2: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated. For more info, see:
configure.ac:2: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/tmp/mecab-ko-dic-2.1.1-20180720/missing: Unknown `--is-lightweight' option
Try `/tmp/mecab-ko-dic-2.1.1-20180720/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for mecab-config... /usr/local/bin/mecab-config
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
make: Nothing to be done for 'all'.
make[1]: Entering directory '/tmp/mecab-ko-dic-2.1.1-20180720'
make[1]: Nothing to be done for 'install-exec-am'.
/bin/mkdir -p '/usr/local/lib/mecab/dic/mecab-ko-dic'
/usr/bin/install -c -m 644 model.bin matrix.bin char.bin sys.dic unk.dic left-id.def right-id.def rewrite.def pos-id.def dicrc '/usr/local/lib/mecab/dic/mecab-ko-dic'
make[1]: Leaving directory '/tmp/mecab-ko-dic-2.1.1-20180720'
fatal: destination path 'mecab-python-0.996' already exists and is not an empty directory.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: konlpy in /usr/local/lib/python3.7/dist-packages (0.6.0)
Requirement already satisfied: JPype1>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (1.4.1)
Requirement already satisfied: lxml>=4.1.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (4.9.1)
Requirement already satisfied: numpy>=1.6 in /usr/local/lib/python3.7/dist-packages (from konlpy) (1.21.6)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from JPype1>=0.7.0->konlpy) (4.1.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from JPype1>=0.7.0->konlpy) (21.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->JPype1>=0.7.0->konlpy) (3.0.9)
In [22]:
from konlpy.tag import Mecab
mecab = Mecab()
df = pd.read_csv(file_url, sep='\t', index_col=0)
df.head()
Out[22]:
| id | document | label |
|---|---|---|
| 8112052 | 어릴때보고 지금다시봐도 재밌어요ㅋㅋ | 1 |
| 8132799 | 디자인을 배우는 학생으로, 외국디자이너와 그들이 일군 전통을 통해 발전해가는 문화산... | 1 |
| 4655635 | 폴리스스토리 시리즈는 1부터 뉴까지 버릴께 하나도 없음.. 최고. | 1 |
| 9251303 | 와.. 연기가 진짜 개쩔구나.. 지루할거라고 생각했는데 몰입해서 봤다.. 그래 이런... | 1 |
| 10067386 | 안개 자욱한 밤하늘에 떠 있는 초승달 같은 영화. | 1 |
In [23]:
df = df.dropna()
df = pd.concat([df.head(1000), df.tail(1000)])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 8112052 to 8548411
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   document  2000 non-null   object
 1   label     2000 non-null   int64
dtypes: int64(1), object(1)
memory usage: 46.9+ KB
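The sampling step above (drop missing reviews, then keep the first and last 1000 rows) can be sanity-checked on a small frame before trusting it on the full data. The sketch below uses a hypothetical toy table, not rows from ratings.txt, and simply counts the labels that end up in the sample. Whether `head`/`tail` gives balanced classes on the real file depends on how that file is ordered, which this sketch does not assume.

```python
import pandas as pd

# Toy stand-in for the review table (hypothetical data, not from ratings.txt)
toy = pd.DataFrame(
    {
        "document": ["good", "great", None, "fine", "bad", "awful", "poor"],
        "label": [1, 1, 1, 1, 0, 0, 0],
    }
)

toy = toy.dropna()                              # drop rows with a missing review
sample = pd.concat([toy.head(2), toy.tail(2)])  # first 2 rows + last 2 rows

# Count how many rows of each class ended up in the sample
label_counts = sample["label"].value_counts().to_dict()
print(label_counts)
```

On the real DataFrame, `df.label.value_counts()` after the `pd.concat` call gives the same kind of check.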
In [24]:
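# Imports repeated here so this cell runs on its own
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

def handle_naive_bayes(df: pd.DataFrame, tagger):
    # Tokenize each review with the given tagger and rejoin the tokens with spaces
    nouns = df.document.apply(tagger).apply(" ".join)
    # Bag-of-words counts as features, sentiment label as target
    cv = CountVectorizer()
    x = cv.fit_transform(nouns)
    y = df.label
    x_train, x_test, y_train, y_test = train_test_split(x, y)
    model = MultinomialNB()
    model.fit(x_train, y_train)
    pred = model.predict(x_test)
    print(accuracy_score(y_test, pred))
    # Plot the confusion matrix as a heatmap
    sns.heatmap(confusion_matrix(y_test, pred),
                cmap='coolwarm', annot=True, fmt='.0f')
    plt.title("CONFUSION MATRIX")
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.show()

handle_naive_bayes(df, mecab.nouns)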
0.62
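Because `train_test_split` is called without `random_state`, the split is reshuffled on every run, so the accuracy above will vary from run to run. The following is a minimal sketch of the same CountVectorizer + MultinomialNB pipeline with a fixed seed. It assumes a hypothetical `whitespace_tagger` and a tiny made-up English corpus in place of `mecab.nouns` and the review data, so that it runs without MeCab installed; it illustrates the flow and is not the notebook's actual setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 1 = positive review, 0 = negative review
docs = [
    "great movie loved it", "wonderful acting great story",
    "really fun and touching", "best film this year",
    "boring movie hated it", "terrible acting weak story",
    "really dull and slow", "worst film this year",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def whitespace_tagger(text):
    # Stand-in for a morphological analyzer such as mecab.nouns (assumption)
    return text.split()

# Tokenize, then rejoin with spaces so CountVectorizer can split on whitespace
tokens = [" ".join(whitespace_tagger(d)) for d in docs]
x = CountVectorizer().fit_transform(tokens)

# Fixed random_state makes the split, and therefore the score, reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x, labels, test_size=0.25, random_state=0, stratify=labels
)

model = MultinomialNB().fit(x_train, y_train)
pred = model.predict(x_test)
acc = accuracy_score(y_test, pred)
print(acc)
```

Passing `random_state` to `train_test_split` inside `handle_naive_bayes` would make scores comparable when swapping taggers.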