Number of PowerPoint slides: 33. This is a specialized PowerPoint presentation in software engineering for the master's level. It was presented in Dr. Keyvanpour's class and covers data streams and the algorithms built on them. These algorithms fall into two groups, Clustering and Classification, and four algorithms are described: VFKM, K-Means, Ensemble, and …. The presentation is in its original language, English.

yeganeh

Slide 1:
Data Stream Mining

Slide 2:
What is a data stream?
* A transient, continuously increasing sequence of data

Slide 3:
Data Stream Examples
* Google searches
* Credit card transactions
* Sensor networks

Slide 4:
Data Stream Characteristics
* Infinite volume
* Chronological order
* Dynamic changes

Slide 5:
Data Stream Mining (diagram)
* Data stream generators: sensor networks, satellites, Internet traffic, call records
* Mining steps: single pass, selecting some parts of the data stream, preprocessing of data streams, incremental learning, knowledge extraction
* Output: knowledge

Slide 6:
Mining | Traditional | Data Stream
Number of passes | Multiple | Single
Time | Unlimited | Real-time
Memory | Unlimited | Bounded
Concepts | One | Multiple
Result | Accurate | Approximate

Slide 7:
Data Stream Mining Algorithms (mining task, approach, algorithm)
* Classification
  * Decision tree: VFDT & CVFDT
  * Based on class weights: LWClass
  * Rule based: SCALLOP
  * Combination of different classifiers: Ensemble-based
* Clustering
  * K-Means: VFKM
  * Micro-clustering approach: CluStream
  * Density-based clustering: D-Stream
* Prediction
  * Prediction algorithm: AWSOM

Slide 8:
Classification

Slide 9:
Concept Drift
* Changes in the discovered patterns over time
[Figure: old data vs. new data]

Slide 10:

Slide 11:
* Incremental learning vs. batch mode
* Very Fast Decision Tree (VFDT)
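VFDT learns incrementally. It decides that a leaf has seen enough examples to split by using the Hoeffding bound. The following is a minimal Swift sketch of that decision. It assumes a split criterion with range R (for example, two-class information gain, where R = 1). `hoeffdingBound` and `shouldSplit` are hypothetical names, and the tie-breaking threshold of the full algorithm is omitted.

```swift
import Foundation

// Hoeffding bound: after n independent observations of a variable with
// range R, the true mean lies within epsilon of the observed mean with
// probability 1 - delta.
func hoeffdingBound(range: Double, delta: Double, n: Int) -> Double {
    return sqrt(range * range * log(1.0 / delta) / (2.0 * Double(n)))
}

// Simplified VFDT-style split test (no tie-breaking): split on the best
// attribute once its gain beats the runner-up by more than epsilon.
func shouldSplit(bestGain: Double, secondBestGain: Double,
                 range: Double, delta: Double, n: Int) -> Bool {
    let epsilon = hoeffdingBound(range: range, delta: delta, n: n)
    return bestGain - secondBestGain > epsilon
}

// Two-class information gain lies in [0, 1], so R = 1 here.
let bound = hoeffdingBound(range: 1.0, delta: 1e-7, n: 1000)
print(String(format: "epsilon after 1000 examples: %.4f", bound))
print(shouldSplit(bestGain: 0.30, secondBestGain: 0.10, range: 1.0, delta: 1e-7, n: 1000))
```

The bound shrinks as 1/sqrt(n). A leaf therefore only has to buffer statistics until the gap between the two best attributes becomes statistically clear, and it never has to revisit old examples.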

Slide 12:
Challenges
* How to "forget" old samples?

Slide 13:
[Figure: New Data]

Slide 14:
Data Expiration Problem
[Figure: consecutive data chunks (a) S0, arrived during [t0, t1); (b) S1, arrived during [t1, t2); (c) S2, arrived during [t2, t3). Legend: optimum boundary (solid line), overfitting boundary (dashed line), positive and negative examples.]
Overfitting!

Slide 15:
Data Expiration Problem
[Figure: optimum boundary (solid line)]
Conflicting Concepts!

Slide 16:
* The stream is partitioned into sequential chunks
* A classifier is trained from each chunk
* Assume y is a test example
* $f_c(y)$: the probability that y is an instance of class c
* The probability output of the ensemble is given by $f_c^E(y) = \frac{\sum_{i=1}^{k} w_i f_c^i(y)}{\sum_{i=1}^{k} w_i}$, where $w_i$ is the weight of classifier $C_i$
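The weighted-average formula can be sketched in Swift as follows. `ChunkClassifier` is a hypothetical stand-in for a model trained on one chunk: it pairs a weight $w_i$ with a function that returns class-probability estimates $f_c^i(y)$. The two constant classifiers in the usage lines are made-up examples.

```swift
// One ensemble member: a classifier trained on one data chunk plus its weight.
// `predictProba` is a hypothetical interface mapping an example to
// class-probability estimates f_c^i(y).
struct ChunkClassifier {
    let weight: Double
    let predictProba: ([Double]) -> [String: Double]
}

// f_c^E(y) = sum_i w_i * f_c^i(y) / sum_i w_i
func ensembleProbability(of label: String, for y: [Double],
                         members: [ChunkClassifier]) -> Double {
    let totalWeight = members.reduce(0.0) { $0 + $1.weight }
    guard totalWeight > 0 else { return 0.0 }
    let weightedSum = members.reduce(0.0) { acc, member in
        acc + member.weight * (member.predictProba(y)[label] ?? 0.0)
    }
    return weightedSum / totalWeight
}

// Hypothetical members with fixed outputs, for illustration only.
let members = [
    ChunkClassifier(weight: 3.0, predictProba: { _ in ["positive": 0.9, "negative": 0.1] }),
    ChunkClassifier(weight: 1.0, predictProba: { _ in ["positive": 0.5, "negative": 0.5] }),
]
let p = ensembleProbability(of: "positive", for: [1.0, 2.0], members: members)
print(p)  // (3 * 0.9 + 1 * 0.5) / 4 = 0.8
```

The more accurate member (weight 3) pulls the combined estimate toward its own prediction. This is the intended behaviour when weights reflect accuracy on recent data.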

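Slide 17:
Accuracy-Weighted Ensemble
* $E_k$: an ensemble of k classifiers
* $G_k$: a single classifier learned from the last k chunks
* $E_k$ produces a smaller classification error than $G_k$, provided the classifiers in $E_k$ are weighted by their expected classification accuracy on the test data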

Slide 18:
Accuracy-Weighted Ensemble
* Divide the data stream into data chunks $S_1, \dots, S_n$, where $S_n$ is the most recent chunk
* For a record (x, c) in $S_n$, $f_c^i(x)$ is the probability given by classifier $C_i$ that x is an instance of class c
* Thus $1 - f_c^i(x)$ is the error of $C_i$ on (x, c)

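Slide 19:
Accuracy-Weighted Ensemble
* Assign weights to classifiers based on their expected prediction accuracy
* Only the top k classifiers are kept
* $w_i = MSE_r - MSE_i$, where $MSE_i = \frac{1}{|S_n|} \sum_{(x,c) \in S_n} (1 - f_c^i(x))^2$ and $MSE_r = \sum_c p(c)(1 - p(c))^2$ is the error of a classifier that predicts randomly according to the class distribution $p(c)$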
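The following Swift sketch computes the weights above and keeps the top k classifiers. It assumes each classifier's $f_c^i(x)$ on the true class of every record in $S_n$ has already been collected into an array. The function names and the example numbers are hypothetical.

```swift
// MSE_r = sum_c p(c) * (1 - p(c))^2: error of a classifier that predicts
// randomly according to the class distribution p(c).
func randomMSE(classPriors: [Double]) -> Double {
    return classPriors.reduce(0.0) { acc, p in acc + p * (1 - p) * (1 - p) }
}

// MSE_i on the most recent chunk S_n. Each entry is f_c^i(x), the probability
// classifier C_i assigns to the true class c of a record (x, c) in S_n.
func classifierMSE(trueClassProbabilities: [Double]) -> Double {
    precondition(!trueClassProbabilities.isEmpty, "S_n must not be empty")
    let total = trueClassProbabilities.reduce(0.0) { acc, f in acc + (1 - f) * (1 - f) }
    return total / Double(trueClassProbabilities.count)
}

// w_i = MSE_r - MSE_i; only the k classifiers with the largest weights are kept.
func topKWeights(mses: [Double], mseR: Double, k: Int) -> [(index: Int, weight: Double)] {
    let weighted: [(index: Int, weight: Double)] = mses.enumerated().map { pair in
        (index: pair.offset, weight: mseR - pair.element)
    }
    return Array(weighted.sorted { $0.weight > $1.weight }.prefix(k))
}

let mseR = randomMSE(classPriors: [0.5, 0.5])                       // 0.25
let mse0 = classifierMSE(trueClassProbabilities: [0.9, 0.8, 0.6])   // about 0.07
let kept = topKWeights(mses: [mse0, 0.30, 0.10], mseR: mseR, k: 2)
print(kept.map { $0.index })  // [0, 2]
```

A classifier that does worse than random guessing on $S_n$ (here index 1, with MSE 0.30 > 0.25) gets a negative weight. It is the first candidate to be dropped when only k classifiers are kept.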

Slide 20:
Clustering

Slide 21:

Slide 22:

Slide 23:
K-Means Clustering

Slide 24:

Slide 25:
[Figure: example data plotted with axes Height and Length]
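The K-Means example is plotted on Height and Length. The following Swift sketch shows plain batch K-Means (Lloyd's algorithm), the method VFKM speeds up. It is not VFKM itself. The `Point` type, the six data points, and the choice of initial centroids are hypothetical.

```swift
struct Point {
    var height: Double
    var length: Double
}

func squaredDistance(_ a: Point, _ b: Point) -> Double {
    let dh = a.height - b.height
    let dl = a.length - b.length
    return dh * dh + dl * dl
}

// Lloyd's k-means: repeat (1) assign each point to its nearest centroid and
// (2) move each centroid to the mean of its points, until assignments stop changing.
func kMeans(_ points: [Point], initialCentroids: [Point],
            maxIterations: Int = 100) -> (centroids: [Point], labels: [Int]) {
    precondition(!initialCentroids.isEmpty, "need at least one centroid")
    var centroids = initialCentroids
    var labels: [Int] = []
    for _ in 0..<maxIterations {
        // Assignment step: index of the nearest centroid for every point.
        let newLabels = points.map { p in
            centroids.indices.min { squaredDistance(p, centroids[$0]) < squaredDistance(p, centroids[$1]) }!
        }
        if newLabels == labels { break }  // converged
        labels = newLabels
        // Update step (a centroid with no points keeps its old position).
        var sums = [Point](repeating: Point(height: 0, length: 0), count: centroids.count)
        var counts = [Int](repeating: 0, count: centroids.count)
        for (p, label) in zip(points, labels) {
            sums[label].height += p.height
            sums[label].length += p.length
            counts[label] += 1
        }
        for j in centroids.indices where counts[j] > 0 {
            centroids[j] = Point(height: sums[j].height / Double(counts[j]),
                                 length: sums[j].length / Double(counts[j]))
        }
    }
    return (centroids: centroids, labels: labels)
}

let data = [
    Point(height: 150, length: 20), Point(height: 152, length: 22), Point(height: 148, length: 21),
    Point(height: 180, length: 40), Point(height: 182, length: 42), Point(height: 178, length: 41),
]
let result = kMeans(data, initialCentroids: [data[0], data[3]])
print(result.labels)  // [0, 0, 0, 1, 1, 1]
```

Each iteration makes a full pass over all points. This multi-pass, whole-dataset behaviour is exactly what the traditional vs. data stream comparison identifies as unsuitable for streams.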

Slide 26:

Slide 27:
VFKM
* Consists of a number of runs, and each run contains a number of iterations
* Uses only a calculated number of all the available data items
* Uses only a particular number of data items in each step i
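VFKM computes the number of data items it needs from a Hoeffding-style bound, using an error bound ε and an error probability δ. The following Swift sketch is a simplification under stated assumptions. It bounds a single variable with range R using the two-sided Hoeffding bound rather than VFKM's bound on centroid positions. It also assumes the overall δ* is split evenly over the steps (a union-bound choice made here for illustration). `requiredSampleSize` and `perStepDelta` are hypothetical helpers.

```swift
import Foundation

// Smallest n for which the two-sided Hoeffding bound
//   epsilon = R * sqrt(ln(2 / delta) / (2n))
// is at most the target epsilon, i.e. n >= R^2 * ln(2 / delta) / (2 * epsilon^2).
func requiredSampleSize(range r: Double, epsilon: Double, delta: Double) -> Int {
    let n = r * r * log(2.0 / delta) / (2.0 * epsilon * epsilon)
    return Int(n.rounded(.up))
}

// Assumption for illustration: split the overall failure probability
// deltaStar evenly over m steps (union bound), so all steps hold together
// with probability at least 1 - deltaStar.
func perStepDelta(deltaStar: Double, steps m: Int) -> Double {
    return deltaStar / Double(m)
}

let deltaI = perStepDelta(deltaStar: 0.05, steps: 5)
let nI = requiredSampleSize(range: 1.0, epsilon: 0.05, delta: deltaI)
print("delta_i = \(deltaI), items needed per step: \(nI)")
```

The required n depends on ε, δ, and R, but not on the size of the stream. This is why VFKM can stop reading data once the bound is satisfied.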

Slide 28:
New Data
[Figure: error probability]

Slide 29:
if (εi <= ε*) && (1 - δi >= 1 - δ*) { print("The END, \(εi), \(δi)") }

Slide 30:
[Figure: Data]
A decrease in memory can potentially lead to an exe… Decrease

Slide 31:

Slide 32:
AOG
* Algorithm Output Granularity (AOG) is a generic, resource-aware approach to data stream mining that focuses on adapting the algorithm's performance to the data rate and the available memory.

Slide 33:
“Thank you for your attention.”
