Extraction of Highly Utilized Itemsets from Large Transactional Databases
K. Madhavi
Department of Computer Science and Technology
VR Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
CH. Nanda Krishna
Department of Computer Science and Technology
VR Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
Sai Harshita Gutta
Department of Computer Science and Technology
VR Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
Abstract: The main aim of this project is to generate high utility (profitable) itemsets in a parallel environment, taking the quantity of each item into account. Mining high utility itemsets in parallel takes much less time than mining a large number of transactions on a single system. A memory-resident data structure, the Utility Pattern Tree (UP-Tree), stores the information about the transactions. With the UP-Growth algorithm, the candidate itemsets are generated by scanning the database only twice. The UP-Tree is duplicated at every node of the parallel system; at each node, the conditional pattern base tree is constructed for the assigned high utility items, and the high utility itemsets are mined from this tree according to the given minimum utility threshold. The performance of the UP-Growth algorithm is observed by applying it to a dataset.
Keywords: Utility Mining, Quantitative Database
I. INTRODUCTION
Association rule mining (ARM) [ ] is one of the most widely used techniques in data mining and knowledge discovery, with applications in business, medicine, and other domains; for example, it supports decisions about marketing activities in supermarkets.
Data mining is the extraction of potentially useful information from available data. Its goal is the discovery of knowledge that can be used to predict future behavior and to support structural decisions. Mining frequent patterns is a well-established research topic in data mining: frequent pattern mining finds interesting patterns hidden in a database. However, frequent pattern mining does not consider the profit of each item. To overcome this problem, weighted frequent pattern mining [ ] was proposed, but weighted frequent pattern mining does not consider the quantity of each item. Utility mining addresses this by considering both the profit and the quantity of each item, and therefore supports quantitative databases.
Utility mining finds the most valuable itemsets. The utility of an item reflects the user's interest in that item. The utility of an item in a transaction is the product of its external and internal utilities: the external utility is the profit of the item, and the internal utility is the quantity of the item in the transaction; the utility of an itemset is the sum of the utilities of its items. For a given utility threshold, if the computed utility of an itemset is greater than the threshold, the itemset is called promising; otherwise it is called unpromising. Mining high utility itemsets therefore requires two inputs: a transactional database and the profit of each item.
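To make these definitions concrete, the following minimal Java sketch (Java being the language of the node structure given later) computes item and itemset utilities as quantity times profit. The profit table PROFIT, the five-transaction database DB, and the class and method names are hypothetical toy values chosen for illustration, not data from this paper. The threshold test uses "no less than", the convention common in the utility-mining literature.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: itemset utility as quantity (internal) x profit (external). */
class ItemsetUtility {
    // External utility: profit per unit of each item (toy values, assumed)
    static final Map<String, Integer> PROFIT = Map.of("A", 5, "B", 2, "C", 1, "D", 2, "E", 3);

    // Toy quantitative database: each transaction maps item -> purchased quantity
    static final List<Map<String, Integer>> DB = List.of(
        Map.of("A", 1, "C", 1, "D", 1),
        Map.of("A", 2, "C", 6, "E", 2),
        Map.of("A", 1, "B", 2, "C", 1, "D", 6, "E", 1),
        Map.of("B", 4, "C", 3, "D", 3, "E", 1),
        Map.of("B", 2, "C", 2, "E", 1));

    /** u(X, T): sum of q(i, T) * p(i) over i in X; 0 when T does not contain all of X. */
    static int utilityIn(Set<String> x, Map<String, Integer> t) {
        if (!t.keySet().containsAll(x)) return 0;
        int u = 0;
        for (String item : x) u += t.get(item) * PROFIT.get(item);
        return u;
    }

    /** u(X): sum of u(X, T) over all transactions of the database. */
    static int utility(Set<String> x) {
        int total = 0;
        for (Map<String, Integer> t : DB) total += utilityIn(x, t);
        return total;
    }

    /** An itemset is reported as high utility when u(X) reaches the threshold. */
    static boolean isHighUtility(Set<String> x, int minUtil) {
        return utility(x) >= minUtil;
    }

    public static void main(String[] args) {
        System.out.println(utility(Set.of("A", "C")));           // 28 = 6 + 16 + 6
        System.out.println(isHighUtility(Set.of("A", "C"), 25)); // true
    }
}
```

Note that {A, C} earns most of its utility from a single transaction (the second one, worth 16), which frequency alone would not reveal.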
II. BACKGROUND
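Definition. The utility of an itemset X in a database D is

$$u(X) = \sum_{X \subseteq T_d \in D} u(X, T_d), \qquad u(X, T_d) = \sum_{i \in X} q(i, T_d) \times p(i)$$

where q(i, T_d) is the quantity (internal utility) of item i in transaction T_d and p(i) is its profit (external utility).

Definition. For the database D, the transaction-weighted utility of an itemset X is

$$TWU(X) = \sum_{X \subseteq T_d \in D} TU(T_d), \qquad TU(T_d) = \sum_{i \in T_d} q(i, T_d) \times p(i)$$

Because TU(T_d) ≥ u(X, T_d) for every transaction containing X, TWU(X) ≥ u(X); moreover, a superset of X never has a larger TWU than X. Hence an item whose TWU is below the threshold cannot belong to any high utility itemset.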
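The two definitions can be checked numerically. The sketch below assumes a hypothetical toy database and profit table (all names are illustrative, not from the paper) and computes TU and TWU. On this data, TWU({A, C}) = 55 while u({A, C}) = 28, so TWU is an upper bound rather than the true utility.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of TU(T_d) and TWU(X) on a toy database. */
class TransactionWeightedUtility {
    static final Map<String, Integer> PROFIT = Map.of("A", 5, "B", 2, "C", 1, "D", 2, "E", 3);
    static final List<Map<String, Integer>> DB = List.of(
        Map.of("A", 1, "C", 1, "D", 1),
        Map.of("A", 2, "C", 6, "E", 2),
        Map.of("A", 1, "B", 2, "C", 1, "D", 6, "E", 1),
        Map.of("B", 4, "C", 3, "D", 3, "E", 1),
        Map.of("B", 2, "C", 2, "E", 1));

    /** TU(T_d): sum of q(i, T_d) * p(i) over every item of the transaction. */
    static int transactionUtility(Map<String, Integer> t) {
        int tu = 0;
        for (Map.Entry<String, Integer> e : t.entrySet()) tu += e.getValue() * PROFIT.get(e.getKey());
        return tu;
    }

    /** TWU(X): sum of TU(T_d) over the transactions that contain X. */
    static int twu(Set<String> x) {
        int sum = 0;
        for (Map<String, Integer> t : DB)
            if (t.keySet().containsAll(x)) sum += transactionUtility(t);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(twu(Set.of("A")));      // 55 = 8 + 22 + 25
        System.out.println(twu(Set.of("A", "C"))); // 55
        System.out.println(twu(Set.of("D")));      // 53
    }
}
```

With a threshold of 54, item D (TWU 53) is unpromising, so no itemset containing D can reach the threshold.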
Mining high utility itemsets from a very large number of transactions takes a long time; this mining time can be reduced by performing the mining in a parallel environment.
III. RELATED WORK
The Apriori algorithm [ ] is the basic method for mining frequent itemsets. Its disadvantage is that it requires many scans of the entire database and must generate and test every candidate itemset to decide whether it is promising or unpromising. To overcome this, the frequent pattern growth (FP-Growth) algorithm [ ] was proposed. High utility itemsets are mined by the UP-Growth algorithm, a tree-based approach similar to FP-Growth [ ]. UP-Growth requires only two scans of the database [ ] to create the UP-Tree, from which the high utility itemsets are mined. To mine the interesting itemsets, the UP-Tree is created first; then the conditional pattern base (CPB) trees are derived for each high utility item and assigned to the corresponding parallel nodes. At each parallel node, the corresponding high utility itemsets are mined using the UP-Growth algorithm.
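The node structure used to construct the UP-Tree is:
class N {
    String name;      // item name
    int count;        // support count of the node
    int utility;      // (overestimated) node utility
    N parent;         // parent node, used to walk a path back toward the root
    java.util.Map<String, N> children = new java.util.HashMap<>();  // child nodes keyed by item name
    N hlink;          // node link to another node with the same item name
}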
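The node link hlink connects nodes that carry the same item name. Each entry of the header table points to the last occurrence of its item in the UP-Tree, so all nodes of an item can be visited by starting at its header-table entry and following the hlink pointers.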
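The following sketch shows how node links and the header table work together. It assumes that the header table maps each item name to the last node created for it, and that each new node's hlink points to the node that was previously last. The names NodeLinkDemo, newChild and nodesOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: header table + node links (hlink) of a UP-Tree. */
class NodeLinkDemo {
    static final class Node {
        final String name;
        final Node parent;
        final Map<String, Node> children = new HashMap<>();
        Node hlink; // the item's previous last occurrence (or null)
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    final Node root = new Node(null, null);
    final Map<String, Node> header = new HashMap<>(); // item name -> last node created for it

    /** Creates a child of parent for item and splices it into that item's node-link chain. */
    Node newChild(Node parent, String item) {
        Node n = new Node(item, parent);
        parent.children.put(item, n);
        n.hlink = header.get(item); // link to the previous last occurrence (may be null)
        header.put(item, n);        // the new node becomes the last occurrence
        return n;
    }

    /** Visits every node of an item: start at its header entry, then follow hlink. */
    List<Node> nodesOf(String item) {
        List<Node> nodes = new ArrayList<>();
        for (Node n = header.get(item); n != null; n = n.hlink) nodes.add(n);
        return nodes;
    }

    public static void main(String[] args) {
        NodeLinkDemo tree = new NodeLinkDemo();
        Node c = tree.newChild(tree.root, "C");
        tree.newChild(c, "A");                         // path C -> A
        Node e = tree.newChild(tree.root, "E");
        tree.newChild(e, "A");                         // path E -> A
        System.out.println(tree.nodesOf("A").size());  // 2
    }
}
```

Prepending each new node to its chain keeps insertion constant-time. The chain is therefore visited from the most recent occurrence backwards.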
Steps for constructing the UP-Tree
The database is scanned only twice to construct the UP-Tree.
First scan:
- The transaction utility (TU) of each transaction is computed.
- At the same time, the transaction-weighted utility (TWU) of each item is computed.
- Items whose TWU is less than the given threshold are called unpromising items and are discarded.
- For the promising items, the reorganized transaction utilities are computed, i.e., the transaction utilities with the utilities of the unpromising items removed.
- The promising items are sorted in decreasing order of their TWU.
Second scan:
- Each reorganized transaction (its promising items in that order) is inserted into the UP-Tree as a branch.
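The two scans can be sketched in Java as follows. This is a simplified, single-threaded illustration under stated assumptions: the toy database and profits are hypothetical, ties in TWU are broken by item name, and the node-utility update already applies the DGU and DGN strategies described next. Under DGN, a node accumulates only the utilities of its own item and its ancestors in the reorganized transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: two-scan UP-Tree construction with DGU and DGN. */
class UpTreeBuilder {
    static final Map<String, Integer> PROFIT = Map.of("A", 5, "B", 2, "C", 1, "D", 2, "E", 3);
    static final List<Map<String, Integer>> DB = List.of(
        Map.of("A", 1, "C", 1, "D", 1),
        Map.of("A", 2, "C", 6, "E", 2),
        Map.of("A", 1, "B", 2, "C", 1, "D", 6, "E", 1),
        Map.of("B", 4, "C", 3, "D", 3, "E", 1),
        Map.of("B", 2, "C", 2, "E", 1));

    static final class Node {
        final String name;
        final Node parent;
        int count, utility;
        Node hlink;                                    // previous node with the same item name
        final Map<String, Node> children = new HashMap<>();
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    final Node root = new Node(null, null);
    final Map<String, Node> header = new HashMap<>(); // item -> last node of that item
    final Map<String, Integer> twu = new HashMap<>();
    final List<String> order = new ArrayList<>();     // promising items, TWU descending

    UpTreeBuilder(int minUtil) {
        // First scan: TU of every transaction, added to the TWU of each of its items
        for (Map<String, Integer> t : DB) {
            int tu = 0;
            for (Map.Entry<String, Integer> e : t.entrySet()) tu += e.getValue() * PROFIT.get(e.getKey());
            for (String item : t.keySet()) twu.merge(item, tu, Integer::sum);
        }
        // DGU: keep promising items only, sorted by decreasing TWU (ties broken by name)
        for (Map.Entry<String, Integer> e : twu.entrySet())
            if (e.getValue() >= minUtil) order.add(e.getKey());
        order.sort((a, b) -> {
            int byTwu = Integer.compare(twu.get(b), twu.get(a));
            return byTwu != 0 ? byTwu : a.compareTo(b);
        });
        // Second scan: insert each reorganized transaction as a branch
        for (Map<String, Integer> t : DB) {
            List<String> path = new ArrayList<>();
            for (String item : order) if (t.containsKey(item)) path.add(item);
            insert(path, t);
        }
    }

    /** DGN: a node accumulates only the utilities of its own item and its ancestors' items. */
    private void insert(List<String> path, Map<String, Integer> t) {
        Node cur = root;
        int prefixUtility = 0;
        for (String item : path) {
            prefixUtility += t.get(item) * PROFIT.get(item);
            Node child = cur.children.get(item);
            if (child == null) {
                child = new Node(item, cur);
                cur.children.put(item, child);
                child.hlink = header.get(item);
                header.put(item, child);
            }
            child.count++;
            child.utility += prefixUtility;
            cur = child;
        }
    }

    /** Sum of an item's node utilities, following the node links from the header table. */
    int nodeUtilitySum(String item) {
        int sum = 0;
        for (Node n = header.get(item); n != null; n = n.hlink) sum += n.utility;
        return sum;
    }

    public static void main(String[] args) {
        UpTreeBuilder tree = new UpTreeBuilder(54);
        System.out.println(tree.order);               // [C, E, A, B]  (D is unpromising)
        System.out.println(tree.nodeUtilitySum("A")); // 37
    }
}
```

The summed node utility of A (37) is still an overestimate of u({A}) = 20, but it is much tighter than TWU(A) = 55.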
The following strategies are used to reduce the execution time and the search space. The first two are applied while the global UP-Tree is built, and the last two while local UP-Trees are built during mining.
Strategy 1: Discarding global unpromising items (DGU)
Since unpromising items cannot be part of any high utility itemset, they are not considered when creating the global UP-Tree.
Strategy 2: Discarding global node utilities (DGN)
While the global UP-Tree is constructed, the utilities of a node's descendants are excluded from that node's utility.
Strategy 3: Discarding local unpromising items (DLU)
While a local UP-Tree is constructed, the minimum item utilities of the local unpromising items are subtracted from the path utilities of the paths that contain them.
Strategy 4: Decreasing local node utilities (DLN)
While a local UP-Tree is constructed, the minimum item utilities of a node's descendants are subtracted from that node's utility.
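The local strategies are simple arithmetic on the paths of a conditional pattern base. The sketch below assumes a hypothetical CPB for item B with two paths, a hypothetical local threshold of 30, and illustrative minimum item utilities MIU. It shows DLU dropping a locally unpromising item and DLN lowering node utilities by the minimum item utilities of descendants. The names are not from the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the local strategies DLU and DLN on a conditional pattern base. */
class LocalStrategies {
    /** One CPB path: prefix items in tree order, its path utility, and its support count. */
    static final class Path {
        final List<String> items;
        final int utility;
        final int count;
        Path(List<String> items, int utility, int count) {
            this.items = items; this.utility = utility; this.count = count;
        }
    }

    // miu(i): minimum utility of item i in any transaction (toy values, assumed)
    static final Map<String, Integer> MIU = Map.of("A", 5, "B", 4, "C", 1, "D", 2, "E", 3);

    /** DLU: drop locally unpromising items, subtracting miu(i) * count for each dropped item. */
    static List<Path> applyDlu(List<Path> cpb, int minUtil) {
        Map<String, Integer> local = new HashMap<>();   // summed path utility per local item
        for (Path p : cpb) for (String i : p.items) local.merge(i, p.utility, Integer::sum);
        List<Path> out = new ArrayList<>();
        for (Path p : cpb) {
            List<String> kept = new ArrayList<>();
            int pu = p.utility;
            for (String i : p.items) {
                if (local.get(i) >= minUtil) kept.add(i);
                else pu -= MIU.get(i) * p.count;
            }
            if (!kept.isEmpty()) out.add(new Path(kept, pu, p.count));
        }
        return out;
    }

    /** DLN: node utility of the k-th item = path utility minus miu(j) * count per descendant j. */
    static int[] dlnNodeUtilities(Path p) {
        int[] nu = new int[p.items.size()];
        int descendants = 0;
        for (int k = nu.length - 1; k >= 0; k--) {
            nu[k] = p.utility - descendants;
            descendants += MIU.get(p.items.get(k)) * p.count;
        }
        return nu;
    }

    public static void main(String[] args) {
        // Hypothetical CPB of item B: two prefix paths with their path utilities and counts
        List<Path> cpb = List.of(new Path(List.of("C", "E", "A"), 13, 1), new Path(List.of("C", "E"), 23, 2));
        List<Path> reduced = applyDlu(cpb, 30);           // A is locally unpromising: 13 < 30
        System.out.println(reduced.get(0).utility);       // 8 = 13 - miu(A) * 1
        System.out.println(Arrays.toString(dlnNodeUtilities(reduced.get(0)))); // [5, 8]
    }
}
```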
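The pseudocode for UP-Growth is as follows:

Method: UP-Grow(Ax, Bx, Z)
Input: Ax, a UP-Tree; Bx, the header table for Ax; Z, an itemset
Output: the PHUIs (potential high utility itemsets) in Ax

Procedure UP-Grow(Ax, Bx, Z)
  for every item ci in Bx do
    create the PHUI P = Z ∪ {ci}
    set the estimated utility of P to the utility of ci in Bx
    construct P's conditional pattern base (CPB)
    put the local items of P's CPB whose path-utility sum is at least min_util into By
    apply DLU to reduce the path utilities of the paths
    apply DLN and insert the paths as branches of Ay
    if Ay ≠ ∅ then call UP-Grow(Ay, By, P)
  end for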
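Below is a runnable approximation of UP-Grow. To stay short, it represents each (conditional) UP-Tree as its list of root-to-leaf paths instead of a linked tree. The header table Bx is therefore the per-item sum of node utilities, which is what following the node links would produce. The toy global tree, min_util = 30, and the MIU values are hypothetical. The sketch also skips header items whose summed node utility is already below min_util. Because that sum still overestimates the utility of P, this check loses no high utility itemset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical, simplified UP-Grow: each (conditional) UP-Tree is kept as a list of paths. */
class UpGrowSketch {
    /** One root-to-leaf path: items in tree order, node utility of each item, shared count. */
    static final class Path {
        final List<String> items;
        final int[] nu;   // nu[k] = node utility of items.get(k) on this path
        final int count;  // number of transactions sharing this path
        Path(List<String> items, int[] nu, int count) {
            this.items = items; this.nu = nu; this.count = count;
        }
    }

    // miu(i): minimum item utility of each item in the toy database (assumed values)
    static final Map<String, Integer> MIU = Map.of("A", 5, "B", 4, "C", 1, "D", 2, "E", 3);

    /** DLN: node k keeps the path utility minus miu(j) * count for every descendant j. */
    static Path withDln(List<String> items, int pu, int count) {
        int[] nu = new int[items.size()];
        int descendants = 0;
        for (int k = items.size() - 1; k >= 0; k--) {
            nu[k] = pu - descendants;
            descendants += MIU.get(items.get(k)) * count;
        }
        return new Path(items, nu, count);
    }

    /** UP-Grow(Ax, Bx, Z); found PHUIs are stored with their estimated utilities in phuis. */
    static void upGrow(List<Path> ax, Set<String> z, int minUtil, Map<Set<String>, Integer> phuis) {
        // Bx: summed node utility per item (what following the item's node links would give)
        Map<String, Integer> bx = new TreeMap<>();
        for (Path p : ax)
            for (int k = 0; k < p.items.size(); k++) bx.merge(p.items.get(k), p.nu[k], Integer::sum);

        for (Map.Entry<String, Integer> ci : bx.entrySet()) {
            if (ci.getValue() < minUtil) continue;  // even the overestimate is too small
            Set<String> pSet = new TreeSet<>(z);
            pSet.add(ci.getKey());                  // P = Z plus ci
            phuis.put(pSet, ci.getValue());         // estimated utility of P

            // P's CPB = prefix of every path containing ci, with path utility = ci's node utility.
            // First pass: summed path utility of each local item (decides the items of By).
            Map<String, Integer> local = new HashMap<>();
            for (Path p : ax) {
                int k = p.items.indexOf(ci.getKey());
                for (int j = 0; j < k; j++) local.merge(p.items.get(j), p.nu[k], Integer::sum);
            }
            // Second pass: DLU drops local unpromising items, DLN sets the node utilities of Ay
            List<Path> ay = new ArrayList<>();
            for (Path p : ax) {
                int k = p.items.indexOf(ci.getKey());
                if (k <= 0) continue;               // ci absent, or nothing above it
                List<String> kept = new ArrayList<>();
                int pu = p.nu[k];
                for (String i : p.items.subList(0, k)) {
                    if (local.get(i) >= minUtil) kept.add(i);
                    else pu -= MIU.get(i) * p.count; // DLU
                }
                if (!kept.isEmpty()) ay.add(withDln(kept, pu, p.count));
            }
            if (!ay.isEmpty()) upGrow(ay, pSet, minUtil, phuis); // recurse when Ay is not empty
        }
    }

    /** Runs the sketch on a hypothetical global UP-Tree (DGN node utilities, min_util = 30). */
    static Map<Set<String>, Integer> demo() {
        List<Path> global = List.of(
            new Path(List.of("C", "A", "D"), new int[] {1, 6, 8}, 1),
            new Path(List.of("C", "E", "A"), new int[] {6, 12, 22}, 1),
            new Path(List.of("C", "E", "A", "B", "D"), new int[] {1, 4, 9, 13, 25}, 1),
            new Path(List.of("C", "E", "B", "D"), new int[] {3, 6, 14, 20}, 1),
            new Path(List.of("C", "E", "B"), new int[] {2, 5, 9}, 1));
        Map<Set<String>, Integer> phuis = new HashMap<>();
        upGrow(global, new TreeSet<>(), 30, phuis);
        return phuis;
    }

    public static void main(String[] args) {
        Map<Set<String>, Integer> phuis = demo();
        System.out.println(phuis.size());                          // 14
        System.out.println(phuis.get(Set.of("B", "C", "D", "E"))); // 40
    }
}
```

On this toy data the sketch reports 14 PHUIs. Their exact utilities still have to be checked against the database, because the estimates are only upper bounds.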
The UP-Tree maintains the information about the transactions, and the PHUIs are generated from it. In the proposed system, the UP-Tree is replicated at each parallel node, and the conditional pattern base (CPB) trees are generated at the nodes for all the items in the header table. The high utility itemsets are then mined from each item's CPB.
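The parallel step can be pictured with threads standing in for parallel nodes. In this hypothetical sketch, every worker reads the same replicated, read-only list of UP-Tree paths. The header items are dealt round-robin to the workers, and each worker builds the CPBs of its own items. CPB entries are rendered as strings such as [C, A]:8 (prefix path : path utility) only to keep the example short. The toy tree and the thread pool are assumptions for illustration; a real deployment would run on separate machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: worker threads stand in for parallel nodes sharing one UP-Tree. */
class ParallelCpb {
    /** A root-to-leaf path of the replicated UP-Tree and the node utility of each item on it. */
    static final class Path {
        final List<String> items;
        final int[] nu;
        Path(List<String> items, int[] nu) { this.items = items; this.nu = nu; }
    }

    /** CPB of one item: every prefix above it, tagged with the item's node utility there. */
    static List<String> cpbOf(List<Path> tree, String item) {
        List<String> cpb = new ArrayList<>();
        for (Path p : tree) {
            int k = p.items.indexOf(item);
            if (k > 0) cpb.add(p.items.subList(0, k) + ":" + p.nu[k]);
        }
        return cpb;
    }

    /** Deals header items round-robin to the workers, then merges their CPBs. */
    static Map<String, List<String>> buildAll(List<Path> tree, List<String> header, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, List<String>>>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int id = w;
                parts.add(pool.submit(() -> {
                    Map<String, List<String>> mine = new TreeMap<>();
                    for (int j = id; j < header.size(); j += workers)
                        mine.put(header.get(j), cpbOf(tree, header.get(j)));
                    return mine;
                }));
            }
            Map<String, List<String>> all = new TreeMap<>();
            for (Future<Map<String, List<String>>> f : parts) all.putAll(f.get());
            return all;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical global UP-Tree (min_util = 30), header items in decreasing TWU order. */
    static Map<String, List<String>> demo() {
        List<Path> tree = List.of(
            new Path(List.of("C", "A", "D"), new int[] {1, 6, 8}),
            new Path(List.of("C", "E", "A"), new int[] {6, 12, 22}),
            new Path(List.of("C", "E", "A", "B", "D"), new int[] {1, 4, 9, 13, 25}),
            new Path(List.of("C", "E", "B", "D"), new int[] {3, 6, 14, 20}),
            new Path(List.of("C", "E", "B"), new int[] {2, 5, 9}));
        return buildAll(tree, List.of("C", "E", "A", "B", "D"), 2);
    }

    public static void main(String[] args) {
        System.out.println(demo().get("D")); // [[C, A]:8, [C, E, A, B]:25, [C, E, B]:20]
    }
}
```

Because the shared tree is never modified, the workers need no locking. Only the final merge of their per-item results is sequential.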
The UP-Tree is constructed as follows.
IV. PROPOSED SYSTEM
In the proposed system, the mining of highly utilized itemsets is done in parallel: the UP-Tree [ ] is constructed at the server, and the highly utilized itemsets are mined at each parallel node from its conditional pattern base trees. The proposed method consists of three phases: (1) scan the database twice to construct a global UP-Tree using the first two strategies; (2) recursively generate PHUIs from the global UP-Tree [ ] and the local UP-Trees, either by UP-Growth with the third and fourth strategies or by UP-Growth+ with the last two strategies; and (3) identify the actual high utility itemsets from the set of PHUIs.
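The third phase needs only one more scan of the database. The minimal sketch below assumes a small hypothetical toy database and a hand-picked list of PHUIs. The class and method names are illustrative. It computes the exact utility of every PHUI in a single pass and keeps those that reach the threshold.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the last phase: verify PHUIs against their exact utilities. */
class PhuiVerification {
    static final Map<String, Integer> PROFIT = Map.of("A", 5, "B", 2, "C", 1, "D", 2, "E", 3);
    static final List<Map<String, Integer>> DB = List.of(
        Map.of("A", 1, "C", 1, "D", 1),
        Map.of("A", 2, "C", 6, "E", 2),
        Map.of("A", 1, "B", 2, "C", 1, "D", 6, "E", 1),
        Map.of("B", 4, "C", 3, "D", 3, "E", 1),
        Map.of("B", 2, "C", 2, "E", 1));

    /** One database scan computes every PHUI's exact utility; those below minUtil are dropped. */
    static Map<Set<String>, Integer> identify(Collection<Set<String>> phuis, int minUtil) {
        List<Set<String>> cands = new ArrayList<>(new LinkedHashSet<>(phuis)); // distinct PHUIs
        int[] exact = new int[cands.size()];
        for (Map<String, Integer> t : DB)                  // single pass over the database
            for (int c = 0; c < cands.size(); c++)
                if (t.keySet().containsAll(cands.get(c)))
                    for (String i : cands.get(c)) exact[c] += t.get(i) * PROFIT.get(i);
        Map<Set<String>, Integer> huis = new LinkedHashMap<>();
        for (int c = 0; c < cands.size(); c++)
            if (exact[c] >= minUtil) huis.put(cands.get(c), exact[c]);
        return huis;
    }

    public static void main(String[] args) {
        // Hand-picked PHUIs; their estimated utilities are not needed in this phase
        List<Set<String>> phuis = List.of(Set.of("B", "D"), Set.of("D"), Set.of("B", "C", "D", "E"),
                Set.of("A", "C"), Set.of("A", "C", "E"));
        System.out.println(identify(phuis, 30).size()); // 3
    }
}
```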
4.1 The Proposed Data Structure: UP-Tree
To improve mining performance and avoid scanning the original database repeatedly, a compact tree structure named UP-Tree is used to maintain the information about transactions and high utility itemsets. Two strategies are applied to minimize the overestimated utilities stored in the nodes of the global UP-Tree.
4.1.1 The Proposed Mining Method: UP-Growth
After a global UP-Tree is constructed, a straightforward way to generate potential high utility itemsets (PHUIs) is to mine the UP-Tree with FP-Growth [ ]; however, this generates too many candidate itemsets. Therefore, the UP-Growth algorithm is used instead, which adds two more strategies to the FP-Growth framework. These strategies decrease the overestimated utilities of itemsets and thus further reduce the number of PHUIs.
Figure: Architecture of UP-Growth
4.2 An Improved Mining Method: UP-Growth+
UP-Growth takes less execution time than FP-Growth because it uses DLU and DLN to decrease the overestimated utilities of itemsets. However, the overestimated utilities can be brought closer to the actual utilities by eliminating estimated utilities that are closer to the actual utilities of unpromising items and descendant nodes. In this paper, we propose an improved method named UP-Growth+ that reduces overestimated utilities more effectively.
UP-Growth uses the minimum item utility table to decrease the overestimated utilities. In UP-Growth+, the minimal node utilities in each path are used instead, so that the estimated pruning values are closer to the actual utilities of the pruned items in the database.
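A few numbers show the effect of replacing the minimum item utility with a minimal node utility. The sketch assumes that a path records, for its unpromising item A, the smallest utility A has among the transactions sharing that path. All values and names are hypothetical. Both reductions keep the path utility an overestimate, but the per-path minimum gives the tighter one.

```java
import java.util.Arrays;

/** Hypothetical comparison of DLU (miu) with a minimal-node-utility reduction on one CPB path. */
class MinimalNodeUtility {
    /** Path utility after removing an item, given a per-transaction lower bound on its utility. */
    static int reduce(int pathUtility, int lowerBound, int count) {
        return pathUtility - lowerBound * count;
    }

    /** Minimal node utility: the smallest utility of the item among the path's transactions. */
    static int mnu(int[] itemUtilities) {
        return Arrays.stream(itemUtilities).min().getAsInt();
    }

    public static void main(String[] args) {
        int pathUtility = 60;        // hypothetical path utility shared by 2 transactions
        int[] aUtilities = {10, 15}; // utilities of the unpromising item A in those transactions
        int miuA = 5;                // database-wide minimum item utility of A (assumed)
        int dlu = reduce(pathUtility, miuA, aUtilities.length);            // 60 - 5 * 2 = 50
        int dnu = reduce(pathUtility, mnu(aUtilities), aUtilities.length); // 60 - 10 * 2 = 40
        int withTrueValues = pathUtility - Arrays.stream(aUtilities).sum(); // 60 - 25 = 35
        System.out.println(dlu + " >= " + dnu + " >= " + withTrueValues);  // 50 >= 40 >= 35
    }
}
```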
V. EXPERIMENTAL EVALUATIONS
The performance of mining high utility itemsets improves considerably in the parallel environment because memory usage and the generation of unwanted candidate itemsets are minimized. The new methods give better performance on large databases and at low utility thresholds. The experimental results show that the improved UP-Growth (UP-Growth+) reduces the execution time. Here, the performance of UP-Growth and UP-Growth+ is compared on the datasets from [ ]. Table … shows the execution times for various min_sup values from … to ….
VI. CONCLUSION
This paper proposes a parallel system that greatly reduces the turnaround time for mining high utility itemsets. Two scans are sufficient to convert the transactions into a UP-Tree. The UP-Tree is replicated at each parallel node, where the conditional pattern base trees of the assigned high utility items are built, and the high utility itemsets are collected from all the parallel nodes. With this approach, large synthetic datasets can also be created.
REFERENCES
[ ] C. H. Cai, A. W. C. Fu, C. H. Cheng, and W. W. Kwong, "Mining Association Rules with Weighted Items."
[ ] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, "Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases."
[ ] Agrawal and Srikant, "Fast Algorithms for Mining Association Rules."
[ ] Ying Liu, Weikeng Liao, and Alok Choudhary, "A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets."
[ ] Vincent S. Tseng, C. W. Wu, B. E. Shie, and P. S. Yu, "UP-Growth: An Efficient Algorithm for High Utility Itemset Mining."
[ ] Frequent itemset mining implementations repository, http://fimi.cs.helsinki.fi
[ ] Y. C. Li, J. S. Yeh, and C. C. Chang, "Isolated items discarding strategy for discovering high utility itemsets," Data & Knowledge Engineering, vol. 64, issue 1, pp. 198-217, Jan. 2008.
[ ] R. Chan, Q. Yang, and Y. Shen, "Mining high utility itemsets," in Proc. of the Third IEEE Int'l Conf. on Data Mining, pp. 19-26, 2003.
[ ] Jiawei Han, Jian Pei, and Y. Yin, "Mining frequent patterns without candidate generation," in Proc. of the ACM-SIGMOD Int'l Conf. on Management of Data, pp. 1-12, 2000.
[ ] H. Yao, H. J. Hamilton, and L. Geng, "A unified framework for utility based measures for mining itemsets," in Proc. of ACM SIGKDD 2nd Workshop on Utility-Based Data Mining, pp. 28-37, USA.