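A Psychological Interpretation of Deep Learning (Special Lectures in Psychology IIIA)¶
Session 09: Machine translation, text summarization, transfer learning, multimodal learning, multi-task learning
Multi-task learning and transfer learning¶
- The ability to apply what one has learned is arguably a measure of intelligence.
For example, in the film The Karate Kid (1984), Mr. Miyagi taught Daniel-san to wax cars and clean floors :-) The punchline is that waxing and floor scrubbing turned out to be skills he needed in order to learn karate techniques.
Exercise files¶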
Hard parameter sharing¶
Left: multi-task learning. Right: transfer learning. Both figures are from Sebastian Ruder's blog.
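As a rough illustration of hard parameter sharing, the sketch below uses one hidden layer whose weights are shared by every task, followed by a separate output layer per task. The layer sizes, the two tasks, and the names `W_shared`, `W_task_a`, `W_task_b`, and `forward` are assumptions made for this example and do not come from the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared hidden layer: these weights are used by every task.
W_shared = rng.normal(size=(4, 8))   # input dim 4 -> hidden dim 8 (illustrative sizes)

# Task-specific output layers; both tasks are hypothetical.
W_task_a = rng.normal(size=(8, 3))   # task A: 3 outputs
W_task_b = rng.normal(size=(8, 1))   # task B: 1 output


def forward(x):
    """Return the outputs of both tasks for a batch of inputs x."""
    h = np.tanh(x @ W_shared)        # representation shared by all tasks
    return h @ W_task_a, h @ W_task_b


x = rng.normal(size=(5, 4))          # batch of 5 examples
out_a, out_b = forward(x)
print(out_a.shape, out_b.shape)
```

During training, gradients from both task losses would flow into `W_shared`, while each output layer is updated only by its own task.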
Soft parameter sharing¶
In soft parameter sharing, on the other hand, each task has its own model with its own parameters. The distance between the parameters of the models is then regularized in order to encourage the parameters to be similar. [8], for instance, use the ℓ2 norm for regularization, while [9] use the trace norm.
- [8]: Duong, L., Cohn, T., Bird, S., & Cook, P. (2015). Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), 845–850.
- [9]: Yang, Y., & Hospedales, T. M. (2017). Trace Norm Regularised Deep Multi-Task Learning. In Workshop track - ICLR 2017. Retrieved from http://arxiv.org/abs/1606.04038
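The regularization described above can be sketched as a penalty on the squared ℓ2 distance between corresponding parameters of two task models. This is a minimal sketch, not the exact formulation of [8]: the function name `l2_sharing_penalty`, the `strength` hyperparameter, and the toy parameter values are all assumptions introduced for illustration.

```python
import numpy as np


def l2_sharing_penalty(params_a, params_b, strength=0.1):
    """Scaled squared L2 distance between corresponding parameter arrays."""
    return strength * sum(
        float(np.sum((pa - pb) ** 2)) for pa, pb in zip(params_a, params_b)
    )


# Toy parameters for two hypothetical task models with identical shapes.
params_a = [np.ones((2, 2)), np.zeros(3)]
params_b = [np.zeros((2, 2)), np.zeros(3)]

penalty = l2_sharing_penalty(params_a, params_b)

# The penalty is added to the sum of the task losses (placeholder values here).
loss_a, loss_b = 1.0, 2.0
total_loss = loss_a + loss_b + penalty
print(penalty, total_loss)
```

A larger `strength` pulls the two models closer together; setting it to zero leaves them fully independent.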
Recent work on MTL for Deep Learning¶
Deep Relationship Networks¶
A Deep Relationship Network with shared convolutional and task-specific fully connected layers with matrix priors (Long and Wang, 2015).
- Long, M., & Wang, J. (2015). Learning Multiple Tasks with Deep Relationship Networks. arXiv Preprint arXiv:1506.02117. Retrieved from http://arxiv.org/abs/1506.02117
Fully-Adaptive Feature Sharing¶
The widening procedure for fully-adaptive feature sharing (Lu et al., 2016).
Lu, Y., Kumar, A., Zhai, S., Cheng, Y., Javidi, T., & Feris, R. (2016). Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification. Retrieved from http://arxiv.org/abs/1611.05377
Cross-stitch Networks¶
Cross-stitch networks for two tasks (Misra et al., 2016).
Misra, I., Shrivastava, A., Gupta, A., & Hebert, M. (2016). Cross-stitch Networks for Multi-task Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2016.433
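As a rough sketch of the idea behind a cross-stitch unit for two tasks, the activations of the two task networks at a given layer are recombined linearly through a small matrix of mixing weights. In the paper these weights are learned; here they are fixed, and the values of `alpha`, the function name `cross_stitch`, and the toy activations are assumptions made for illustration.

```python
import numpy as np

# 2x2 mixing weights: row i gives how task i combines the two inputs.
# Learned in practice; fixed here for illustration.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])


def cross_stitch(x_a, x_b, alpha):
    """Linearly recombine the activations of task A and task B."""
    mixed_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    mixed_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return mixed_a, mixed_b


x_a = np.ones((2, 3))    # toy activations from the task A network
x_b = np.zeros((2, 3))   # toy activations from the task B network
mixed_a, mixed_b = cross_stitch(x_a, x_b, alpha)
print(mixed_a[0, 0], mixed_b[0, 0])
```

With `alpha` equal to the identity matrix the two networks do not share anything; off-diagonal weights let each task borrow from the other's features.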
A Joint Many-Task Model¶
A Joint Many-Task Model (Hashimoto et al., 2016).
Weighting losses with uncertainty¶
Uncertainty-based loss function weighting for multi-task learning (Kendall et al., 2017).
Kendall, A., Gal, Y., & Cipolla, R. (2017). Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. Retrieved from http://arxiv.org/abs/1705.07115
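The following is a minimal sketch of uncertainty-based loss weighting in the regression-style form usually attributed to Kendall et al. (2017): each task loss L_i is scaled by 1 / (2 σ_i²) and a log σ_i term is added so that the uncertainties cannot grow without bound. The parameterization through `log_vars` (s_i = log σ_i²), the function name, and the loss values are assumptions for this example; the paper's classification losses use a slightly different scaling.

```python
import numpy as np


def uncertainty_weighted_loss(task_losses, log_vars):
    """Sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i, where s_i = log(sigma_i^2)."""
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(0.5 * np.exp(-log_vars) * task_losses + 0.5 * log_vars))


losses = [2.0, 4.0]                       # placeholder per-task losses

# Equal uncertainty (sigma^2 = 1 for both tasks): plain half-sum of the losses.
equal = uncertainty_weighted_loss(losses, [0.0, 0.0])

# Higher uncertainty on the second task (sigma^2 = 4) down-weights its loss.
unequal = uncertainty_weighted_loss(losses, [0.0, np.log(4.0)])
print(equal, unequal)
```

In a real model `log_vars` would be trainable parameters optimized jointly with the network weights.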
Sluice Networks¶
A sluice network for two tasks (Ruder et al., 2017).
Ruder, S., Bingel, J., Augenstein, I., & Søgaard, A. (2017). Sluice networks: Learning what to share between loosely related tasks. Retrieved from http://arxiv.org/abs/1705.08142