Today it is natural to collect and analyze the large volumes of data generated everywhere across a business and its systems, and doing so no longer requires the heavy investment it once did. It goes without saying that valuable discoveries are hidden in the gigabytes or terabytes of data that come from user clickstreams, responses to email campaigns, and similar sources. Until now, however, getting at them meant hiring skilled data scientists familiar with machine learning, giving them scalable and reliable tools, and building the infrastructure to support it all.
Machine learning (ML) brings mathematical rigor to data analysis. It finds correlations in your data and turns them into high-quality predictions. Used properly, machine learning can help with fraud detection, demand forecasting, ad targeting, and more.
Introducing Amazon Machine Learning
Today we are introducing Amazon Machine Learning. This new AWS service helps you put your data to work to improve the quality of your business decisions. With Amazon Machine Learning you can build sophisticated predictive models from large amounts of data and run predictions at scale. You do not need deep knowledge of statistics or a degree in the field, and you do not have to build or operate scalable infrastructure for the processing yourself.
I will walk through the service shortly, but first, to make the rest easier to follow, let's review some machine learning terms and concepts.
What Is Machine Learning?
To use machine learning, you first need training data. Think of the rows of a database table or a spreadsheet. Each row represents one data record (a single purchase, a single shipment, or a single catalog item, for example), and each column represents an attribute of that record, such as a postal code or a purchase price.
This data must include examples of the actual outcome you want to predict. For example, given a data set of commercial transactions that mixes legitimate and fraudulent ones, every row needs a result that says whether it was "legitimate" or "fraudulent". This is called the target variable. The data set is used to build a machine learning model, and when new data comes in, the model makes predictions such as fraud judgments based on what it learned. The predictions fall into three broad types. Let's look at each in turn.
Binary classification assigns each input record to one of two choices. It is used for questions such as "Is this a legitimate transaction?", "Will this customer buy this product?", or "Is this address a single-family house or an apartment building?"
Multiclass classification assigns each input record to one of three or more choices. It is used for questions such as "Is this product a book, a DVD, or clothing?", "Is this movie a comedy, a documentary, or a horror film?", or "Which category is this user most interested in?"
Regression is used to predict a numeric value. It is used for questions such as "How many 27-inch monitors should we keep in stock?", "How much should this product cost?", or "What fraction of this product's sales will be sold as gifts?"
A properly trained model answers one of the questions above. In some cases, though, the same training data is used for more than one model.
When you start out with machine learning, the first thing to work on is the quantity and quality of your data, since that is what drives the accuracy of the learning process. Here is a simple example. Suppose you begin by collecting location information based on postal codes. After some period of analysis, you will probably notice that you can improve the accuracy of your predictions by using other attributes alongside the postal code. Machine learning is an iterative process: you build a model from training data and make predictions, evaluate the results, refine the training data further, and repeat.
Several metrics are available for evaluating a model. For example, AUC (Area Under the Curve) measures the quality of a binary classification. It is a floating-point number between 0.0 and 1.0 that reflects how reliably the model scores true positive cases above true negative ones; 0.5 is no better than guessing. A value that rises from 0.5 toward 1.0 indicates better predictions. However, a value of 1.0 or very close to it can point to a problem with the training data. A common problem here is described as overfitting: the training data ends up containing the target itself, so the model finds the pattern without really having to work for it. A value close to 0.0, on the other hand, means the model gets most cases wrong, which can be caused by mislabeled training data.
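To make the AUC metric concrete, here is a minimal sketch that computes it directly from its pairwise definition: the share of (positive, negative) pairs in which the positive record gets the higher score. The labels and scores are made up for illustration, and this is not necessarily how Amazon Machine Learning computes the figure internally.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]                       # made-up target values
print(auc(labels, [0.9, 0.8, 0.2, 0.1]))    # every positive outranks every negative
print(auc(labels, [0.9, 0.3, 0.5, 0.1]))    # one pair is ranked the wrong way round
print(auc(labels, [0.1, 0.2, 0.8, 0.9]))    # ranking fully inverted
```

The pairwise loop is quadratic in the number of records, which is fine for a handful of rows but not something you would use on a large evaluation set.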
As you build a binary classification model, you need to review the prediction results and adjust the cutoff (threshold). The cutoff is the score at or above which a record is predicted as true; the score itself expresses how likely the prediction is to be true. You can tune the cutoff according to how serious each kind of error is for your problem: a false positive (something that should have been judged false is judged true) or a false negative (something that should have been judged true is judged false). Consider spam filtering. A false negative lets a spam message into the inbox. A false positive sends an important message to the spam filter. The latter is clearly the more serious problem, so you adjust the cutoff to bias the decision away from it.
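The trade-off can be seen in a few lines of code. This sketch counts the four outcomes at several cutoffs for a made-up spam example (1 = spam, 0 = legitimate). Treating a score equal to the cutoff as positive is an assumption of the sketch.

```python
def confusion_counts(labels, scores, cutoff):
    """Count true/false positives and negatives at a given cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, score in zip(labels, scores):
        predicted = 1 if score >= cutoff else 0  # assumption: ties count as positive
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # legitimate mail sent to the spam folder
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # spam that reaches the inbox
    return counts


labels = [1, 1, 1, 0, 0, 0]                      # made-up ground truth
scores = [0.95, 0.70, 0.40, 0.60, 0.20, 0.05]    # made-up model scores
for cutoff in (0.3, 0.5, 0.8):
    print(cutoff, confusion_counts(labels, scores, cutoff))
```

Raising the cutoff from 0.3 to 0.8 removes the one false positive in this data at the price of two false negatives, which is the direction you would move in for the spam case.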
Amazon Machine Learning in Action
Now let's follow the Getting Started Guide and go from creating a model to running predictions. You will need to sign up for Amazon Machine Learning first. The Getting Started Guide uses a slightly modified version of the bank marketing dataset from the UC Irvine Machine Learning Repository. The model we are about to build answers the question "Will this customer subscribe to the new service?"
First, download banking.csv, upload it to Amazon Simple Storage Service (S3), and set up an IAM policy so that Amazon Machine Learning can access it.
Next, create an Amazon Machine Learning Datasource object and point it at the file you just uploaded. This object holds the location of the data, the variable names and types, the name of the target variable, descriptive statistics for each variable, and so on. Most operations in Amazon Machine Learning refer to a Datasource. The setup looks like the screenshot below.
Amazon Machine Learning can also use Amazon Redshift or MySQL (Amazon RDS for MySQL included, of course) as a Datasource. If you select Redshift on the screen shown in the screenshot above, you are asked for the cluster name, database name, credentials, and the SQL query used to fetch the data.
Amazon Machine Learning scans the file, infers a type for each column, and proposes a schema like the one below.
In this case every inferred type was correct. If that is not the case, you can fix a type manually by clicking Change Type.
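To give a feel for what column type inference involves, here is a minimal sketch that guesses a type for each column of a small CSV sample. The type names match the four data types Amazon Machine Learning uses (BINARY, CATEGORICAL, NUMERIC, TEXT), but the heuristics are assumptions made for illustration and are not the service's actual rules. The sample rows, including the `note` column, are made up.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its string values (illustrative heuristics only)."""
    if set(values) <= {"0", "1"}:
        return "BINARY"
    try:
        for value in values:
            float(value)
        return "NUMERIC"
    except ValueError:
        pass
    # Assumption: free text contains spaces, category labels do not.
    if any(" " in value for value in values):
        return "TEXT"
    return "CATEGORICAL"


def infer_schema(csv_text):
    """Return {column name: guessed type} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


sample = """age,job,note,y
44,blue-collar,called twice last month,0
53,technician,no answer,1
28,management,asked for brochure,0
"""
print(infer_schema(sample))
```

A guess like this can easily be wrong, for example for a numeric code that is really a category, which is why being able to override the proposed type matters.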
To use this Datasource for building and evaluating a machine learning model, you need to specify the target variable. The target variable in this data set (y) is of binary type, so the predictive model we build will use binary classification.
A few more clicks and the Datasource is ready to be created.
Creating the Datasource takes a few minutes.
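As mentioned earlier, you can improve the accuracy of a predictive model by getting to know your data better. Amazon Machine Learning provides several tools to help with that. For example, there is a tool that visualizes the distribution of values for a variable in the Datasource, as shown below.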
Next comes creating the model.
This time we will use the default settings. By default, Amazon Machine Learning uses 70% of the data set as training data and 30% as evaluation data.
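As a small illustration of a 70/30 split, the sketch below takes the first 70% of the records for training and the rest for evaluation. Splitting in file order is an assumption of the sketch; the text above does not say how the service chooses which records go where.

```python
def split_records(records, train_percent=70):
    """Split records in order into (training, evaluation) lists."""
    cut = len(records) * train_percent // 100  # integer arithmetic avoids float rounding
    return records[:cut], records[cut:]


records = list(range(10))  # stand-in for ten data rows
training, evaluation = split_records(records)
print(len(training), len(evaluation))
```

If the file is sorted in some meaningful way, an in-order split like this can give training and evaluation sets that look quite different, so shuffling first is often worth considering.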
If you choose the custom option, you can customize the recipe that Amazon Machine Learning uses to transform the data in the Datasource.
After a few more clicks, Amazon Machine Learning starts creating the model. This takes a little while.
Once the model has been created, its quality metrics are available right away, as shown below.
To pick out the customers most likely to subscribe to the new service, I click Adjust Score Threshold and tune the cutoff so that only 5% of the records are predicted to have a y value of 1.
With this setting the false-positive rate is held to just 1.3%. About 22% of records come out as false negatives, and the remaining 77% are classified correctly. I chose this setting because false positives are undesirable in this case and I wanted to avoid them as far as possible. In a real business, this keeps you from running a costly promotion aimed at the wrong customers.
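Picking a cutoff so that a fixed share of records is predicted as 1 amounts to taking a quantile of the scores. Here is a minimal sketch under that reading; the scores are made up, and the function name is hypothetical rather than anything the console exposes.

```python
def cutoff_for_top_percent(scores, percent):
    """Return the cutoff at which the top `percent` of scores are predicted as 1."""
    ranked = sorted(scores, reverse=True)
    keep = len(ranked) * percent // 100
    if keep == 0:
        raise ValueError("too few records for the requested percentage")
    # Records scoring at or above this value are predicted as 1.
    # With tied scores, slightly more than `percent` may qualify.
    return ranked[keep - 1]


scores = [i / 100 for i in range(1, 101)]  # made-up scores 0.01 .. 1.00
cutoff = cutoff_for_top_percent(scores, 5)
predicted = [1 if s >= cutoff else 0 for s in scores]
print(cutoff, sum(predicted))
```

With 100 evenly spread scores, the top 5% starts at 0.96, so five records are predicted as 1 and the other 95 as 0.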
Now let's use the model to run a batch prediction. (Amazon Machine Learning supports both batch and real-time predictions.) A batch prediction lets you make predictions for a large set of records in one go.
Again following the Getting Started Guide, I create a Datasource containing the data set to predict on. Unlike the earlier file, this one has no y variable.
I specify the file I placed in S3 and start the batch prediction.
The prediction takes a little while. When it finishes, the results are written to the S3 location you specified, so let's download the file, decompress it, and look inside.
Each row corresponds to a row in the original file. The first column is the predicted value of y, and the second is the actual score.
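Reading a two-column result file like this back into a program could look like the sketch below. It assumes an already decompressed CSV with a header row; the header names `bestAnswer` and `score` and all of the values are assumptions made for illustration.

```python
import csv
import io


def read_predictions(csv_text):
    """Parse rows of (predicted label, score) from CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (assumed to be present)
    return [(int(label), float(score)) for label, score in reader]


output = """bestAnswer,score
0,0.0246
1,0.8721
0,0.1310
"""
rows = read_predictions(output)
# Row positions line up with the rows of the file that was submitted.
likely_subscribers = [i for i, (label, _) in enumerate(rows) if label == 1]
print(likely_subscribers)
```

Because the output carries no customer identifier in this sketch, the row position is the only link back to the input file, so the two files have to be kept in the same order.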
Now let's take a brief look at real-time predictions. In this case a prediction has to be produced for each individual record, between receiving the input and returning the output.
The setup goes like this.
Once real-time predictions are enabled, you write code that calls Amazon Machine Learning's Predict function. You pass the data for the record you want scored as the argument, and the prediction comes back in the response.
The code above produces output along the following lines.
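As a rough illustration of what a Predict call can look like, the sketch below assumes the boto3 SDK, whose `machinelearning` client has a `predict` method taking `MLModelId`, `Record`, and `PredictEndpoint`. The model ID, endpoint URL, record fields, and the canned response are all hypothetical, and a stub client stands in for the real one so the example runs without AWS credentials.

```python
def predict_label(client, model_id, endpoint, record):
    """Call Predict and return (predicted label, score reported for that label)."""
    response = client.predict(
        MLModelId=model_id,
        Record={key: str(value) for key, value in record.items()},  # values go as strings
        PredictEndpoint=endpoint,
    )
    prediction = response["Prediction"]
    label = prediction["predictedLabel"]
    return label, prediction["predictedScores"][label]


class StubClient:
    """Stand-in for boto3.client("machinelearning"); returns a canned response."""

    def predict(self, MLModelId, Record, PredictEndpoint):
        self.last_record = Record
        return {"Prediction": {"predictedLabel": "0", "predictedScores": {"0": 0.03}}}


client = StubClient()  # with real credentials: boto3.client("machinelearning")
label, score = predict_label(
    client,
    "ml-EXAMPLE-ID",                      # hypothetical model ID
    "https://example.invalid/realtime",   # hypothetical real-time endpoint
    {"age": 44, "job": "technician"},     # made-up record fields
)
print(label, score)
```

Passing the client in as an argument keeps the call logic separate from AWS setup, which also makes it easy to test with a stub like the one shown.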
Things to Know
Amazon Machine Learning is available today in the US East (N. Virginia) region. As usual, pricing is pay-as-you-go and is calculated as follows.
- Data analysis, model training, and model evaluation: $0.42 per hour
- Batch predictions: $100 per 1,000,000 predictions, billed in increments of 1,000 predictions
- Real-time predictions: $100 per 1,000,000 predictions, billed in increments of 1,000 predictions, plus an hourly charge based on the memory capacity the model uses
- Charges for S3, RDS, and Redshift apply separately.
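The rates above can be turned into a quick estimate. This sketch covers compute hours and batch predictions only. It assumes whole compute hours and that a partial block of 1,000 predictions is rounded up; the rounding rule is an assumption. The memory-based hourly charge for real-time predictions is left out because its rate is not given here. Amounts are kept in cents to avoid floating-point rounding.

```python
COMPUTE_CENTS_PER_HOUR = 42        # $0.42 per hour
CENTS_PER_1000_PREDICTIONS = 10    # $100 per 1,000,000 predictions


def batch_cost_cents(compute_hours, predictions):
    """Estimate cost in cents for whole compute hours plus batch predictions."""
    blocks = -(-predictions // 1000)  # ceiling division; rounding up is an assumption
    return compute_hours * COMPUTE_CENTS_PER_HOUR + blocks * CENTS_PER_1000_PREDICTIONS


cents = batch_cost_cents(compute_hours=2, predictions=250_500)
print(f"${cents / 100:.2f}")
```

For 2 compute hours and 250,500 predictions this comes to 84 cents of compute plus 251 blocks at 10 cents each, or $25.94 before any S3, RDS, or Redshift charges.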
-- Jeff (translation by Imai)