

A Survey on Transfer Learning

  • Sinno Jialin Pan

  • Department of Computer Science and Engineering

  • The Hong Kong University of Science and Technology

  • Joint work with Prof. Qiang Yang




Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

  • Negative Transfer

  • Conclusion







Notation

  • Domain:

  • It consists of two components: a feature space $\mathcal{X}$ and a marginal distribution $P(X)$.

  • In general, if two domains are different, then they may have different feature spaces or different marginal distributions.

  • Task:

  • Given a specific domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ and a label space $\mathcal{Y}$, for each $x$ in the domain, predict its corresponding label $y \in \mathcal{Y}$.

  • In general, if two tasks are different, then they may have different label spaces or different conditional distributions $P(Y \mid X)$.



Notation

  • For simplicity, we only consider at most two domains and two tasks.

  • Source domain: $\mathcal{D}_S$

  • Task in the source domain: $\mathcal{T}_S$

  • Target domain: $\mathcal{D}_T$

  • Task in the target domain: $\mathcal{T}_T$
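
Spelled out in the survey's standard notation (a reconstruction, since the slide's symbols are not preserved in this export):

```latex
% Source domain data and task
\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}, \qquad
\mathcal{T}_S = \{\mathcal{Y}_S, f_S(\cdot)\} \;\text{ with }\; f_S(x) \approx P(y_S \mid x_S)

% Target domain data and task
\mathcal{D}_T = \{(x_{T_1}, y_{T_1}), \ldots, (x_{T_{n_T}}, y_{T_{n_T}})\}, \qquad
\mathcal{T}_T = \{\mathcal{Y}_T, f_T(\cdot)\} \;\text{ with }\; f_T(x) \approx P(y_T \mid x_T)
```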



Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

  • Negative Transfer

  • Conclusion



Why Transfer Learning?

  • In some domains, labeled data are in short supply.

  • In some domains, the calibration effort is very expensive.

  • In some domains, the learning process is time consuming.



Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

  • Negative Transfer

  • Conclusion



Settings of Transfer Learning





Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

  • Negative Transfer

  • Conclusion



Approaches to Transfer Learning






Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

    • Inductive Transfer Learning
    • Transductive Transfer Learning
    • Unsupervised Transfer Learning


Inductive Transfer Learning --- Instance-transfer Approaches

  • Assumption: the source domain and target domain data use exactly the same features and labels.

  • Motivation: Although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.

  • Main Idea: Discriminatively adjust the weights of the source domain data for use in the target domain (a minimal sketch follows).
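
As a minimal, hypothetical sketch of the re-weighting idea (not one of the specific algorithms cited on the following slides; the weights below are placeholders for whatever a real instance-transfer method would estimate), most off-the-shelf learners accept per-instance weights directly:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical source-domain data and per-instance weights. In a real
# instance-transfer method the weights would be estimated, e.g. by
# boosting-style updates (TrAdaBoost, next slides) or a density-ratio estimate.
X_src = np.random.randn(200, 10)
y_src = (X_src[:, 0] > 0).astype(int)
w_src = np.random.uniform(0.1, 1.0, size=len(X_src))   # placeholder weights

# A small amount of labeled target data can simply keep weight 1.0.
X_tgt = np.random.randn(20, 10)
y_tgt = (X_tgt[:, 0] > 0).astype(int)

X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])
w = np.concatenate([w_src, np.ones(len(X_tgt))])

# Down-weighted source instances contribute less to the decision boundary.
clf = SVC(kernel="linear").fit(X, y, sample_weight=w)
```

The substance of any instance-transfer method lies in how the weights are chosen; the two papers on the following slides give concrete recipes.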



Inductive Transfer Learning --- Instance-transfer Approaches --- Non-standard SVMs [Wu and Dietterich ICML-04]



Inductive Transfer Learning --- Instance-transfer Approaches --- TrAdaBoost [Dai et al. ICML-07]
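
The slide's algorithm figure is not preserved in this export. As a rough sketch of the TrAdaBoost idea based on my reading of Dai et al. (misclassified source instances are progressively down-weighted, while target instances are re-weighted as in AdaBoost; the constants and the final voting rule are indicative, not verbatim from the paper):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=20):
    """Hedged sketch of TrAdaBoost (Dai et al., ICML-07) for labels in {0, 1}."""
    n, m = len(X_src), len(X_tgt)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(n + m) / (n + m)                       # initial instance weights
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    learners, betas = [], []

    for _ in range(n_rounds):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        err = np.abs(h.predict(X) - y)                 # 0/1 loss per instance
        # The error that drives the boosting step is measured on the target part only.
        eps = np.sum(p[n:] * err[n:]) / p[n:].sum()
        eps = min(max(eps, 1e-10), 0.49)
        beta_t = eps / (1.0 - eps)
        # Down-weight misclassified source instances, up-weight misclassified target ones.
        w[:n] *= beta_src ** err[:n]
        w[n:] *= beta_t ** (-err[n:])
        learners.append(h)
        betas.append(beta_t)

    def predict(X_new):
        # Final hypothesis: weighted vote of the learners from the later rounds.
        half = n_rounds // 2
        votes = sum(-np.log(betas[t]) * learners[t].predict(X_new)
                    for t in range(half, n_rounds))
        thresh = 0.5 * sum(-np.log(betas[t]) for t in range(half, n_rounds))
        return (votes >= thresh).astype(int)

    return predict
```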



Inductive Transfer Learning --- Feature-representation-transfer Approaches --- Supervised Feature Construction [Argyriou et al. NIPS-06, NIPS-07]

  • Assumption: If t tasks are related to each other, then they may share some common features which can benefit all tasks.

  • Input: t tasks, each with its own training data.

  • Output: Common features learnt across the t tasks, and t models, one for each task.



Supervised Feature Construction [Argyriou et al. NIPS-06, NIPS-07]

  • Learn the shared features and the per-task models jointly by solving

$$\min_{A,\,U} \; \sum_{t=1}^{T} \sum_{i=1}^{n_t} L\big(y_{ti}, \langle a_t, U^{\top} x_{ti} \rangle\big) \;+\; \gamma \,\|A\|_{2,1}^{2}$$

  • where $U$ is an orthogonal feature-transformation matrix, $A = [a_1, \ldots, a_T]$ stacks the task-specific weight vectors, and the $(2,1)$-norm $\|A\|_{2,1} = \sum_{j} \|a^{j}\|_{2}$ (the sum of the $\ell_2$ norms of the rows of $A$) encourages the tasks to select a small set of common features.



Inductive Transfer Learning --- Feature-representation-transfer Approaches --- Unsupervised Feature Construction [Raina et al. ICML-07]

  • Three steps:

  • Apply the sparse coding algorithm [Lee et al. NIPS-07] to learn a higher-level representation (a set of bases) from unlabeled data in the source domain.

  • Transform the target domain data into this new representation using the bases learnt in the first step.

  • Apply traditional discriminative models to the new representations of the labeled target domain data.



Unsupervised Feature Construction [Raina et al. ICML-07]

  • Step 1:

  • Input: Source domain data $x_S^{(i)}$ and a sparsity coefficient $\beta$.

  • Output: New (sparse) representations of the source domain data $a_S^{(i)}$ and new bases $b = \{b_1, \ldots, b_s\}$, obtained by solving

$$\min_{a,\,b} \; \sum_{i} \Big\| x_S^{(i)} - \sum_{j} a_j^{(i)} b_j \Big\|_2^2 + \beta \,\big\| a^{(i)} \big\|_1 \qquad \text{s.t. } \|b_j\|_2 \le 1$$

  • Step 2:

  • Input: Target domain data $x_T^{(i)}$, the coefficient $\beta$, and the bases $b$ learnt in Step 1.

  • Output: New representations of the target domain data $a_T^{(i)} = \arg\min_{a} \big\| x_T^{(i)} - \sum_{j} a_j b_j \big\|_2^2 + \beta \,\|a\|_1$.
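
A rough sketch of the three steps using off-the-shelf tools; this substitutes scikit-learn's dictionary learning for the original sparse-coding implementation, and all data and hyper-parameters are placeholders rather than the authors' settings:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

# Hypothetical data: plenty of unlabeled source data, a little labeled target data.
X_src_unlabeled = np.random.randn(300, 20)
X_tgt = np.random.randn(40, 20)
y_tgt = np.random.randint(0, 2, size=40)

# Step 1: learn higher-level bases (the dictionary) from unlabeled source data.
coder = DictionaryLearning(n_components=10, alpha=1.0,        # alpha plays the role of the
                           transform_algorithm="lasso_lars")  # sparsity coefficient beta
coder.fit(X_src_unlabeled)

# Step 2: re-express the target data as sparse activations over the learnt bases.
A_tgt = coder.transform(X_tgt)

# Step 3: train an ordinary discriminative model on the new representation.
clf = LogisticRegression(max_iter=1000).fit(A_tgt, y_tgt)
```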



Inductive Transfer Learning --- Model-transfer Approaches --- Regularization-based Method [Evgeniou and Pontil, KDD-04]

  • Assumption: If t tasks are related to each other, then they may share some parameters among their individual models.

  • Assume $w_t$ is the parameter vector (hyper-plane) of the model for task $t$, where $w_t = w_0 + v_t$: $w_0$ is a common parameter shared by all tasks and $v_t$ is a task-specific parameter.

  • Encode this into SVMs:
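
The slide's equation is not preserved; the following is a reconstruction of the regularized multi-task SVM as I recall the Evgeniou and Pontil formulation (constants and indices are indicative):

```latex
\min_{w_0,\, v_t,\, \xi_{ti}} \;
    \sum_{t=1}^{T} \sum_{i=1}^{n_t} \xi_{ti}
    \;+\; \frac{\lambda_1}{T} \sum_{t=1}^{T} \|v_t\|^2
    \;+\; \lambda_2 \|w_0\|^2
\qquad \text{s.t.} \quad
    y_{ti}\,(w_0 + v_t) \cdot x_{ti} \ge 1 - \xi_{ti}, \qquad \xi_{ti} \ge 0 .
```

The regularizers trade off how far each task-specific $v_t$ may move away from the shared $w_0$, which is what makes the tasks share knowledge.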



Inductive Transfer Learning --- Relational-knowledge-transfer Approaches --- TAMAR [Mihalkova et al. AAAI-07]

  • Assumption: If the target domain and source domain are related, then some of the relationships among objects in the source domain have similar counterparts in the target domain, and this correspondence can be used for transfer learning.

  • Input:

  • Relational data in the source domain and a statistical relational model, Markov Logic Network (MLN), which has been learnt in the source domain.

  • Relational data in the target domain.

  • Output: A new statistical relational model, MLN, in the target domain.

  • Goal: To learn an MLN in the target domain more efficiently and effectively.



TAMAR [Mihalkova et al. AAAI-07]

  • Two Stages:

  • Predicate Mapping

    • Establish the mapping between predicates in the source and target domain. Once a mapping is established, clauses from the source domain can be translated into the target domain.
  • Revising the Mapped Structure

    • The clauses mapped directly from the source domain may not be completely accurate; they may need to be revised, augmented, and re-weighted in order to properly model the target data (a toy sketch of the mapping stage follows).
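
A toy illustration of the predicate-mapping stage (the domains, predicates, and clauses below are hypothetical examples, not taken from the paper; real MLN clauses also carry weights that would be re-learnt on the target data):

```python
# Hypothetical mapping from source-domain predicates (an academic domain)
# to target-domain predicates (a movie domain).
predicate_map = {
    "Professor": "Director",
    "Student": "Actor",
    "AdvisedBy": "WorkedFor",
    "Publication": "MovieCredit",
}

# Source-domain clauses, written as (antecedents, consequent) over predicate names.
source_clauses = [
    (["AdvisedBy(s, p)", "Publication(p, t)"], "Publication(s, t)"),
]

def translate(literal: str, mapping: dict) -> str:
    """Rename the predicate of a literal, keeping its arguments unchanged."""
    name, args = literal.split("(", 1)
    return mapping.get(name, name) + "(" + args

# Stage 1: translate clauses into the target domain's vocabulary.
target_clauses = [
    ([translate(a, predicate_map) for a in ante], translate(cons, predicate_map))
    for ante, cons in source_clauses
]
print(target_clauses)
# Stage 2 (not shown): revise, augment, and re-weight these clauses on target data.
```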


TAMAR [Mihalkova et al. AAAI-07]



Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

    • Inductive Transfer Learning
    • Transductive Transfer Learning
    • Unsupervised Transfer Learning


Transductive Transfer Learning --- Instance-transfer Approaches --- Sample Selection Bias / Covariate Shift [Zadrozny ICML-04, Shimodaira JSPI-00]

  • Input: A lot of labeled data in the source domain and no labeled data in the target domain.

  • Output: A model for use on the target domain data.

  • Assumption: The source task and target task are the same. In addition, $P_S(Y \mid X)$ and $P_T(Y \mid X)$ are the same, while $P_S(X)$ and $P_T(X)$ may differ because the training and test data come from different sampling processes.

  • Main Idea: Re-weight the source domain data (importance sampling).



Sample Selection Bias / Covariate Shift

  • To correct sample selection bias, weight each source instance $x$ by $\dfrac{P_T(x)}{P_S(x)}$ when training.

  • How to estimate $\dfrac{P_T(x)}{P_S(x)}$?

  • One straightforward solution is to estimate $P_T(x)$ and $P_S(x)$ separately. However, estimating density functions is a hard problem.
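
One common workaround, not mentioned on the slide, is to estimate the ratio without modeling either density: train a probabilistic classifier to distinguish target from source samples and convert its output into a ratio estimate. A hedged sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_weights(X_src, X_tgt):
    """Estimate w(x) = P_T(x) / P_S(x) for each source instance via a domain classifier."""
    X = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])  # 0 = source, 1 = target
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_src)[:, 1]          # P(domain = target | x)
    # Bayes' rule: P_T(x)/P_S(x) = (n_src / n_tgt) * p / (1 - p)
    return (len(X_src) / len(X_tgt)) * p / np.clip(1.0 - p, 1e-6, None)

# Hypothetical usage: plug the weights into any learner that accepts sample_weight.
X_src, X_tgt = np.random.randn(300, 5), np.random.randn(100, 5) + 0.5
weights = density_ratio_weights(X_src, X_tgt)
```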



Sample Selection Bias / Covariate Shift --- Kernel Mean Matching (KMM) [Huang et al. NIPS-06]

  • Main Idea: KMM tries to estimate the weights $\beta_i \approx \dfrac{P_T(x_i)}{P_S(x_i)}$ directly, instead of estimating the density functions.

  • It can be proved that the weights can be estimated by solving the following quadratic programming (QP) optimization problem:

$$\min_{\beta} \; \tfrac{1}{2}\,\beta^{\top} K \beta - \kappa^{\top} \beta
\qquad \text{s.t. } \beta_i \in [0, B], \;\; \Big| \sum_{i=1}^{n_S} \beta_i - n_S \Big| \le n_S \epsilon$$

  • where $K_{ij} = k(x_i^S, x_j^S)$ and $\kappa_i = \dfrac{n_S}{n_T} \sum_{j=1}^{n_T} k(x_i^S, x_j^T)$ for a kernel $k$.

  • Theoretical Support: Maximum Mean Discrepancy (MMD) [Borgwardt et al. BIOINFORMATICS-06]. The distance between two distributions can be measured by the Euclidean distance between their mean vectors in an RKHS.
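
A sketch of the KMM optimization above using the cvxopt QP solver; the RBF kernel and the bounds B and ε are illustrative defaults rather than the settings of Huang et al.:

```python
import numpy as np
from cvxopt import matrix, solvers          # external QP solver (pip install cvxopt)
from sklearn.metrics.pairwise import rbf_kernel

def kmm_weights(X_src, X_tgt, gamma=1.0, B=1000.0, eps=None):
    """Weights beta that minimize the MMD between the re-weighted source sample
    and the target sample in an RBF-kernel RKHS."""
    n_s, n_t = len(X_src), len(X_tgt)
    eps = eps if eps is not None else B / np.sqrt(n_s)
    K = rbf_kernel(X_src, X_src, gamma=gamma) + 1e-8 * np.eye(n_s)   # quadratic term
    kappa = (n_s / n_t) * rbf_kernel(X_src, X_tgt, gamma=gamma).sum(axis=1)

    # QP in cvxopt form: minimize 1/2 b'Kb - kappa'b  subject to  G b <= h
    G = np.vstack([-np.eye(n_s),                # beta >= 0
                   np.eye(n_s),                 # beta <= B
                   np.ones((1, n_s)),           # sum(beta) <= n_s * (1 + eps)
                   -np.ones((1, n_s))])         # sum(beta) >= n_s * (1 - eps)
    h = np.concatenate([np.zeros(n_s),
                        B * np.ones(n_s),
                        [n_s * (1 + eps)],
                        [-n_s * (1 - eps)]])

    sol = solvers.qp(matrix(K), matrix(-kappa), matrix(G), matrix(h))
    return np.array(sol["x"]).ravel()
```

The returned β can be passed as sample_weight to any weighted learner, mirroring the instance re-weighting recipe shown earlier.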



Transductive Transfer Learning --- Feature-representation-transfer Approaches --- Domain Adaptation [Blitzer et al. EMNLP-06, Ben-David et al. NIPS-07, Daume III ACL-07]

  • Assumption: There is a single task across domains, which means $P_S(Y \mid X)$ and $P_T(Y \mid X)$ are the same, while $P_S(X)$ and $P_T(X)$ may differ because of the different feature representations across domains.

  • Main Idea: Find a “good” feature representation that reduces the “distance” between the domains.

  • Input: A lot of labeled data in the source domain and only unlabeled data in the target domain.

  • Output: A common representation for the source and target domain data, and a model trained on this representation for use in the target domain.



Domain Adaptation --- Structural Correspondence Learning (SCL) [Blitzer et al. EMNLP-06, Blitzer et al. ACL-07, Ando and Zhang JMLR-05]

  • Motivation: If two domains are related to each other, then there may exist some “pivot” features common to both domains. Pivot features are features that behave in the same way for discriminative learning in both domains.

  • Main Idea: Identify correspondences among features from different domains by modeling their correlations with the pivot features. Non-pivot features from different domains that are correlated with many of the same pivot features are assumed to correspond, and they are treated similarly by a discriminative learner.
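
A compressed sketch of the SCL recipe; the pivot selection, the choice of linear predictor, and the number of singular vectors are illustrative assumptions rather than the exact settings of Blitzer et al.:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_projection(X_src, X_tgt, pivot_idx, k=25):
    """Learn a projection onto shared structure by predicting pivot features
    from non-pivot features, then taking an SVD of the stacked predictor weights."""
    X_all = np.vstack([X_src, X_tgt])                    # unlabeled data from both domains
    non_pivot = [j for j in range(X_all.shape[1]) if j not in set(pivot_idx)]
    W = []
    for p in pivot_idx:
        target = (X_all[:, p] > 0).astype(int)           # does the pivot feature occur?
        clf = SGDClassifier(loss="modified_huber", max_iter=1000)
        clf.fit(X_all[:, non_pivot], target)
        W.append(clf.coef_.ravel())
    W = np.array(W).T                                    # (n_non_pivot, n_pivots)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :k]                                     # top-k shared directions

    def transform(X):
        # Augment the original features with the induced low-dimensional ones.
        return np.hstack([X, X[:, non_pivot] @ theta])
    return transform
```

The returned transform is applied both to the labeled source data (for training) and to the target data (at test time), so the classifier can rely on the shared directions.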



SCL [Blitzer et al. EMNLP-06, Blitzer et al. ACL-07, Ando and Zhang JMLR-05]



Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

    • Inductive Transfer Learning
    • Transductive Transfer Learning
    • Unsupervised Transfer Learning


Unsupervised Transfer Learning --- Feature-representation-transfer Approaches --- Self-taught Clustering (STC) [Dai et al. ICML-08]

  • Input: A lot of unlabeled data in a source domain and a small amount of unlabeled data in a target domain.

  • Goal: Cluster the target domain data.

  • Assumption: The source domain and target domain data share some common features, which can help clustering in the target domain.

  • Main Idea: Extend the information-theoretic co-clustering algorithm [Dhillon et al. KDD-03] to the transfer learning setting.



Self-taught Clustering (STC) [Dai et al. ICML-08]

  • Objective function to be minimized:

$$J(\tilde{X}_T, \tilde{X}_S, \tilde{Z}) \;=\; I(X_T, Z) - I(\tilde{X}_T, \tilde{Z}) \;+\; \lambda \big[\, I(X_S, Z) - I(\tilde{X}_S, \tilde{Z}) \,\big]$$

  • where $X_S$ and $X_T$ are the source and target domain data, $Z$ is their shared feature space, $\tilde{X}_S$, $\tilde{X}_T$, and $\tilde{Z}$ denote the corresponding clusterings, $I(\cdot\,,\cdot)$ is mutual information, and $\lambda$ is a trade-off parameter controlling the influence of the source domain.



Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

  • Negative Transfer

  • Conclusion



Negative Transfer

  • Most approaches to transfer learning assume that transferring knowledge across domains is always helpful.

  • However, in some cases, when two tasks are too dissimilar, brute-force transfer may even hurt performance on the target task; this is called negative transfer [Rosenstein et al. NIPS-05 Workshop].

  • Some researchers have studied how to measure relatedness among tasks [Ben-David and Schuller NIPS-03, Bakker and Heskes JMLR-03].

  • How to design mechanisms that avoid negative transfer still needs to be studied theoretically.



Outline

  • Traditional Machine Learning vs. Transfer Learning

  • Why Transfer Learning?

  • Settings of Transfer Learning

  • Approaches to Transfer Learning

  • Negative Transfer

  • Conclusion



Conclusion


