#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main() {
    string datafile = "e:\\reg.txt";
    vector< vector<int> > data;
    ifstream datastream(datafile.c_str(), ios::binary);   // ios::binary, not ios.binary
    if (!datastream.is_open()) {
        cout << "Data file not opened, stopping!";
        exit(0);
    }
    string myString;
    while (getline(datastream, myString)) {
        vector<int> str;                  // one row of integers
        istringstream iss(myString);
        int pos;
        while (iss >> pos)                // test the extraction itself,
            str.push_back(pos);           // or the last value gets pushed twice
        data.push_back(str);
    }
    return 0;
}
Projection, knowledge, and logic together make our consciousness.
Less than a century ago heart disease was an extremely rare disease. Today, however, it causes the death of more people in the world than all other deadly diseases taken together. The most encompassing research and studies on heart health indicate that a lack of happiness and gratification is by far the biggest risk factor for heart problems. Since happiness is among the principal expressions of love, only medicine that is love-based can truly and completely heal the heart and protect one from disease and ageing. If fear is the motivating factor that compels someone to pursue a particular treatment or initiate major changes in lifestyle or diet, the chances of disease prevention or recovery are minimized. The current approaches to heart health, free from any life-threatening condition, are primarily symptom-oriented and do not deal with the underlying causes. Cited from [http://www.lifepositive.com/body/body-holistic/heart/heart.asp]
[Background and the focus domain] [Literature review and evaluation] [In this article, what you are proposing] [Paper structure]
The domain should be introduced in an inverted-triangle way: from the bigger domain to the smaller one, finally arriving at the exact point you are working on.
Coming from their father technologies: WiMAX comes from the computer guys (WiFi), while LTE comes from the cellphone guys (GPRS).
1. System identification = model structure selection (or definition) + model parameter estimation.
2. Parameter estimation is a mapping: Z -> sigma* (from data to a parameter estimate).
3. The essence of a model is its prediction ability.
4. Two criteria: a scalar norm (or function) of the errors, and that the data are uncorrelated with the prediction error.
They are really sensitive to initial conditions and parameter values. This has haunted me for several days.
The H-infinity filter is a minimax filter, which minimizes the worst-case estimation error. Meanwhile, the Kalman filter minimizes the average estimation error.
The Kalman smoother is useful when you want to estimate the state trajectory after obtaining the whole signal series.
The steady-state Kalman filter has a constant gain K and covariance P, as the sketch below shows.
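A minimal scalar sketch of this, under assumed noise values: the model is a random walk x(k+1) = x(k) + w with measurement y(k) = x(k) + v, and q, r, and the measurements are made up for illustration. Running it shows K and P settling to constants after a few steps.

#include <cstdio>

int main() {
    const double q = 0.01;  // process noise variance (assumed)
    const double r = 1.0;   // measurement noise variance (assumed)
    double x = 0.0;         // state estimate
    double P = 1.0;         // estimate covariance
    const double y[] = {1.1, 0.9, 1.2, 1.0, 1.05, 0.95, 1.1, 1.0};

    for (double meas : y) {
        P += q;                      // predict: P = P + Q (state matrix A = 1)
        double K = P / (P + r);      // Kalman gain
        x += K * (meas - x);         // update estimate with the innovation
        P *= (1.0 - K);              // update covariance
        printf("K = %.4f  P = %.4f  x = %.4f\n", K, P, x);
    }
    return 0;
}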
I saw "Introduction to linear dynamical systems" on YouTube, It is much better than what my teacher taught me, that is a little bothered me.
Matrices are what linear algebra is about.
y = Ax can be read in several ways: 1) y is a measurement, x is something we do not know. 2) y is the output, x is the input. 3) x is a signal, y is the transformed output.
a_ij is the weight on x_j in producing y_i; the ith row concerns the ith output, the jth column concerns the jth input. 1) If a_ij = 0, then y_i does not depend on x_j. 2) If some a_ij is bigger within row i, y_i depends more on x_j. 3) If some a_ij is bigger within column j, x_j affects y_i more. 4) If A is lower triangular (zeros above the diagonal), y_i depends only on x_1 ... x_i (a tiny demo of this follows). 5) If A is diagonal, y_i depends only on x_i. 6) Sparsity means the relationship between x and y is not so complex.
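A tiny demo of reading 4), with a made-up lower-triangular A: changing only x3 leaves y1 and y2 untouched, since y_i depends only on x_1..x_i.

#include <cstdio>

int main() {
    const int n = 3;
    double A[3][3] = {{1, 0, 0},
                      {2, 1, 0},
                      {4, 3, 1}};   // lower triangular: zeros above the diagonal
    double xa[3] = {1, 1, 1};
    double xb[3] = {1, 1, 9};       // only x3 differs

    for (int i = 0; i < n; ++i) {
        double ya = 0, yb = 0;
        for (int j = 0; j < n; ++j) {
            ya += A[i][j] * xa[j];
            yb += A[i][j] * xb[j];
        }
        printf("y%d: %.1f vs %.1f\n", i + 1, ya, yb);  // y1, y2 equal; y3 differs
    }
    return 0;
}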
Yeah, I am here; I jumped ahead too fast!
Some MATLAB commands:
lsqnonlin: nonlinear least-squares optimization
lsqcurvefit: uses lsqnonlin to fit a curve
pem: prediction error model
idgrey: grey-box model
idnlgrey: grey-box model for nonlinear systems
ss: state-space model
sim: model simulation
bode: Bode plot
step: step response
impulse: impulse response
c2d: convert a continuous model to a discrete one
System identification and control methods:
- state space
- poles and zeros
- frequency methods, i.e., Bode plots
- PID control
Optimization methods (a Gauss-Newton sketch follows the list):
- Newton
- Gauss-Newton
- steepest gradient
- Levenberg-Marquardt
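A minimal Gauss-Newton sketch, hedged: this is not MATLAB's lsqnonlin implementation, just the bare iteration (J'J) dp = -J'r on a made-up problem, fitting y = a*exp(b*t) to fabricated data.

#include <cmath>
#include <cstdio>

int main() {
    const double t[] = {0, 1, 2, 3, 4};
    const double y[] = {2.0, 2.7, 3.7, 4.9, 6.7};   // roughly 2*exp(0.3 t), made up
    const int n = 5;
    double a = 1.0, b = 0.0;                        // initial guess

    for (int iter = 0; iter < 20; ++iter) {
        double JTJ[2][2] = {{0, 0}, {0, 0}}, JTr[2] = {0, 0};
        for (int i = 0; i < n; ++i) {
            double e = std::exp(b * t[i]);
            double r = a * e - y[i];                // residual
            double ja = e, jb = a * t[i] * e;       // Jacobian row d r / d(a, b)
            JTJ[0][0] += ja * ja; JTJ[0][1] += ja * jb;
            JTJ[1][0] += jb * ja; JTJ[1][1] += jb * jb;
            JTr[0] += ja * r;     JTr[1] += jb * r;
        }
        // solve the 2x2 normal equations (J'J) dp = -J'r by Cramer's rule
        double det = JTJ[0][0] * JTJ[1][1] - JTJ[0][1] * JTJ[1][0];
        double da = (-JTr[0] * JTJ[1][1] + JTJ[0][1] * JTr[1]) / det;
        double db = (-JTr[1] * JTJ[0][0] + JTJ[1][0] * JTr[0]) / det;
        a += da; b += db;
    }
    printf("a = %.3f, b = %.3f\n", a, b);           // should land near 2 and 0.3
    return 0;
}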
C++ gotchas:
- the trailing ; matters after class { ... };
- string lives in namespace std (hence using namespace std;)
- use fabs, not abs, for floating point
- the -> vs . vs :: distinction
It has been a long time since I used C++, and I am suffering from small bugs; yes, C++ is a bug itself. Before I start, I would like to log this one: cout << fixed << setprecision(5) << x. Yes, if x is a float and you output it to a file or console without this, you get nothing useful. The same goes for textread in MATLAB, except there you get something wrong instead of nothing.
OK, let us start.
C++ calling MATLAB: remember, what we are using are some libs of MATLAB, mainly the "engine" lib. 1) Add the includes, add the lib search path, and add the libs themselves. 2) Code it up:
#include " engine.h"
This list is cited from the US National Academy of Engineering. - Make solar energy affordable
- Provide energy from fusion
- Develop carbon sequestration methods
- Manage the nitrogen cycle
- Provide access to clean water
- Restore and improve urban infrastructure
- Advance health informatics
- Engineer better medicines
- Reverse-engineer the brain
- Prevent nuclear terror
- Secure cyberspace
- Enhance virtual reality
- Advance personalized learning
- Engineer the tools for scientific discovery
A dynamic system follows a fixed rule, which can be formulated as y(t+1) = A[y(t), y(t-1), ..., y(t-M)] + K[u(t)]. So the rule and the initial values determine the whole time series.
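A toy instance of that rule, with assumed coefficients: a second-order linear recurrence y(t+1) = 0.5*y(t) + 0.3*y(t-1) + u(t). Given the rule and the initial values y(0), y(1), the whole series is determined.

#include <cstdio>

int main() {
    double y[20] = {1.0, 0.5};                     // initial values fix everything
    for (int t = 1; t < 19; ++t) {
        double u = (t % 5 == 0) ? 1.0 : 0.0;       // some known input
        y[t + 1] = 0.5 * y[t] + 0.3 * y[t - 1] + u;
    }
    for (int t = 0; t < 20; ++t) printf("y(%d) = %.4f\n", t, y[t]);
    return 0;
}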
A BN is a probabilistic DAG: the nodes are variables, the edges are probabilistic relationships. A DBN is the dynamic version, which means it is used for dynamic variables; it does not mean the network itself changes dynamically. So: 1) The relationships are a common law that never changes, but you may only use part of that law in one time slice. For example, in an HMM the state variable changes: at one time you are in state A, so you use the observation law for state A; at another time you are in state B, so you use the observation law for state B. 2) In the DAG there are nodes coming from past time slices. Given all the time slices, the DBN can be viewed as a BN that has edges from the nodes of time k-m to the nodes of time k. For example, in the KFM and the HMM, m is always 1 or 0.
An HMM is a special case of a DBN in which 1) the hidden (unobservable) state variable is discrete, and 2) only the nodes from time k-1 are used.
A KFM is a special case of a DBN in which 1) the hidden (unobservable) state variable is continuous, and 2) only the nodes from time k-1 are used.
In a way, the BN or DBN tries to exploit the independence between variables to make the graph look simple and have few edges, also called "sparsity". That way you avoid too many conditional terms like P(A|B,C,D). Given a BN, you can tell which variables are independent, which are not, and the probability path from one variable to another.
As for undirected graphs, there are MRFs (the Ising model is a special case). They can represent cycles, but cannot represent induced dependencies.
My starting point was the logic of a human doctor; I guess that is why I did not get that much out of the research. When I saw the so-called "four steps" human doctors take to analyze an ECG, I thought: oh, there are not so many steps, only four or five. Check the width of the QRS, check the presence of the P wave, check the RR interval, check the ST changes.
But I should say I underestimated the work humans do, and overestimated what the computer can do. I may also have underestimated the previous researchers' ability.
Why did researchers choose HMM + wavelets, ANN, template matching? Because these methods are the dummy way: you do not need to think too much, no logic, no rules, you just do it. That style agrees with computers.
" Ideas, and Formalization."
Whatever; I have to give up the current line of thought. Not all of it, but the attempt to make the computer implement human logic.
Damn, time is flying, and so many ideas still need to be verified.
In a data-driven algorithm, we must use heuristic programming to simplify the analysis procedure. Using a template, it is simplified.
In a model-driven algorithm, we must use heuristic programming to simplify the fitting procedure. Using a template, it is induced.
It does give you a lot of applications that are new, thus making the cell phone an interface between the real world, the computing world, and the network world. Combination is powerful.
But the thing I saw behind it is the embarrassment of speech recognition. It is the failure of speech recognition that pushed so many applications toward vision recognition, and it is the failure of vision recognition that lets the colorful networked applications make you ignore the infant state of pattern recognition. The success of the network, not AI, is the core.
I am so lazy that, to avoid writing my own continuous Boltzmann machine, I spent one day learning some basic Python. Right now I am trying to convert the code into MATLAB. The really tricky part is the vector and matrix multiplication: dot, outer, etc.
It is not so hard to learn the basic rules, but the libraries such as scipy, numpy, psyco, and pylab did trap me for a whole afternoon.
The language agrees with me with its neat, natural-language-like grammar. I kind of love it, but for a 30-year-old man, whether to learn it is still not decided.
Good work: http://imonad.com
ML estimator: choose the most probable hypothesis in the parameter space, which is one single hypothesis. Bayesian estimator: hypotheses are used via their priors and likelihoods. The Bayesian estimate of theta weights all the hypotheses by the posterior, not only the one with the highest posterior.
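In symbols (the standard definitions, not specific to this post): the ML estimate picks one hypothesis,

theta_ML = argmax_theta p(D | theta),

while the Bayesian estimate averages over all hypotheses weighted by the posterior,

theta_Bayes = E[theta | D] = \int theta p(theta | D) d theta, where p(theta | D) \propto p(D | theta) p(theta).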
Note: first time using a formula here. I input the formula in Visio, export to HTM, and find the TIF file in the HTM folder. A little tricky.
I guess Geoff Hinton's work is just what I want.
These days I am totally disappointed with computers and programs. However many fancy things they can do, they do them in a dummy way. When you want to analyze a simple real question like an ECG waveform, they just cannot be flexible enough to handle different problems. Algorithms like HMM, DTW, and SVM, all of those, may be complicated and hard to understand, but each is just a dead program that cannot change much. They just try to force a fixed shoe onto the problem.
When doing pattern recognition, it is the human who chooses the features, who figures out which is the most discriminative, useful feature. In a funny way, this could be called HCI in a desperate form. However complex the algorithm is, it just hands the hardest work back to you. And if you criticize it for its stubbornness, it just gives you a cold face.
Expecting true AI's flexibility, I am wondering whether we can get the features from the program itself. After all, it is just guessing a function of the data.
Bang: I saw this video by Geoff Hinton. Such a good idea. By the way, I share the feeling behind the joke he made about SVMs. Yes, it is just a perceptron, nothing more.
I have a deep belief in ANNs, which have the solid grounding of simulating biology. If there is no God, there is hope, I mean in ANNs. And if there is God, there is definitely hope, or hopelessness, for all the sciences.
"The Next Generation of Neural Networks"
Preprocessing: 1) DWT 2) high-pass, low-pass, bandpass, and moving-average filtering 3) EMD 4) adaptive filtering
QRS detection (a 30-year history): 1) wavelet-based auto-threshold method 2) differential operation method 3) MART method 4) template matching 5) automata method 6) zero-crossing method
ECG segmentation: 1) HMM 2) fiducial point finding
ECG classification: 1) wavelets as features 2) DCT as features 3) the ECG PQRST waveform as features 4) DTW 5) ANN 6) fuzzy and neuro-fuzzy 7) clustering
Since the ECG is a non-stationary waveform, maybe we can view it as a piecewise function. Then a matching or evaluation method can be proposed.
The advantages of the method should be: 1) it is easy to get the fiducial points, 2) so the analysis can be carried out in all its aspects.
Covariance quantifies the degree to which two variables, X and Y, vary together (covary). It is often important when analyzing x+y or x-y. It often carries different information from its counterpart, the variance: it shows whether two variables are related, in addition to how strongly each variable varies (random or not). When the covariance is not zero, the two variables are related.
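In standard notation, Cov(X, Y) = E[(X - E[X])(Y - E[Y])], and the reason it matters for x+y and x-y is the identity

Var(X +- Y) = Var(X) + Var(Y) +- 2 Cov(X, Y),

so the variance of a sum or difference cannot be computed from the two variances alone unless the covariance is zero.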
In unsupervised learning, what is lost compared with supervised learning is the label of each data point. The labels can be treated as missing parameters, which is exactly the problem the EM algorithm tries to solve.
So we first estimate the probability of the labels (the E-step), then use these labels to compute the ML estimate of the parameters (the M-step). A toy sketch follows.
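A minimal EM sketch of the missing-label view, with made-up data and starting values: a 1-D mixture of two Gaussians. The E-step guesses the label probabilities (responsibilities); the M-step re-fits the means, variances, and weights using those soft labels.

#include <cmath>
#include <cstdio>

const double PI = 3.141592653589793;

double gauss(double x, double mu, double var) {
    return std::exp(-(x - mu) * (x - mu) / (2 * var)) / std::sqrt(2 * PI * var);
}

int main() {
    const double x[] = {-2.1, -1.9, -2.4, -1.7, 2.0, 2.3, 1.8, 2.2};  // toy data
    const int n = 8;
    double mu[2] = {-1.0, 1.0}, var[2] = {1.0, 1.0}, w[2] = {0.5, 0.5};

    for (int iter = 0; iter < 50; ++iter) {
        double r[8][2];                            // responsibilities (soft labels)
        // E-step: probability of each label given the current parameters
        for (int i = 0; i < n; ++i) {
            double p0 = w[0] * gauss(x[i], mu[0], var[0]);
            double p1 = w[1] * gauss(x[i], mu[1], var[1]);
            r[i][0] = p0 / (p0 + p1);
            r[i][1] = p1 / (p0 + p1);
        }
        // M-step: ML parameter update using the soft labels
        for (int k = 0; k < 2; ++k) {
            double nk = 0, sx = 0, sxx = 0;
            for (int i = 0; i < n; ++i) { nk += r[i][k]; sx += r[i][k] * x[i]; }
            mu[k] = sx / nk;
            for (int i = 0; i < n; ++i)
                sxx += r[i][k] * (x[i] - mu[k]) * (x[i] - mu[k]);
            var[k] = sxx / nk + 1e-6;              // small floor avoids collapse
            w[k] = nk / n;
        }
    }
    printf("mu = (%.3f, %.3f), var = (%.3f, %.3f), w = (%.3f, %.3f)\n",
           mu[0], mu[1], var[0], var[1], w[0], w[1]);
    return 0;
}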
I am trying to use HMMs in ECG processing, which is very popular right now, so I got "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition".
Then I got some open-source code from the web.
Then what I need is to train it. Even though I had read about ML training, I still thought I should provide training data with observations and their corresponding states, i.e., (O, S) pairs. That is why I tried to build a MATLAB UI tool to let me annotate the ECG waveform and produce the (O, S) pairs.
After I had done that, I found out there are two training methods: a supervised one and an unsupervised one. The first simply uses frequencies in place of probabilities (see the counting sketch below). The second tries to adjust the parameters to make the observations more likely.
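The supervised case in one picture: with annotated (O, S) pairs, the transition matrix is just normalized frequency counts. A toy 2-state example with a made-up state sequence:

#include <cstdio>

int main() {
    const int S[] = {0, 0, 1, 1, 1, 0, 1, 0, 0, 1};   // annotated state sequence
    const int T = 10;
    double count[2][2] = {{0, 0}, {0, 0}};

    for (int t = 0; t + 1 < T; ++t)
        count[S[t]][S[t + 1]] += 1;                    // count each transition

    for (int i = 0; i < 2; ++i) {                      // normalize rows into A
        double row = count[i][0] + count[i][1];
        printf("A[%d] = (%.2f, %.2f)\n", i, count[i][0] / row, count[i][1] / row);
    }
    return 0;
}

The emission probabilities would come from the same kind of counting over the (O, S) pairs.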
So it is not smart to do things too fast, right?
The Root is the screen; a Figure is part of the screen; Axes are part of a Figure, where you can draw image, line, rectangle, surface, and text objects.
1) Axes is the plural of axis. 2) Axes, menus, and UI objects are children of a Figure. 3) The Root cannot be created or destroyed.
Some functions I used for the first time:
1) handles: used to transfer data between sub-functions.
2) set(hObject, 'toolbar', 'figure'): adds a toolbar for figure control, such as zoom.
3) guidata(hObject, handles): saves changes to the GUI data stored in handles. If you do not call this after modifying handles, the changes are reset when you exit the sub-function.
4) dcm_obj = datacursormode(gcf): gets the data cursor setup, so you can use
5) getCursorInfo(dcm_obj) to get the position of the current cursor. Useful for me!
6) uigetfile: opens a dialog to let you select a file, returning its file name and path.
7) uigetdir: opens a dialog to let you choose a directory.
8) ginput: lets you pick a point in the figure with the crosshair pointer.
9) gtext: lets you place text on the figure with the crosshair pointer.
It is embarrassing to have never used an external toolbox. addpath(genpath('c:/data')) prepends the specified directories to the current path; savepath (with no arguments) then saves the current MATLABPATH into pathdef.m, which is read on startup.
genpath('c:/data') returns a path string containing the directory plus, recursively, all of its subdirectories, including empty ones.
When I used this code in MATLAB,
text(x, y, 'N', 'Color', 'r');
I got an annoying error:
??? Error using ==> text Invalid parameter/value pair arguments.
Finally I found the answer on the web: it is because x is an integer, and it must be a double. After converting x to double, text(double(x), y, 'N', 'Color', 'r') works.
Damn machines, and thank Google.
Template matching is maybe the simplest method for classification, but it still has some tricky problems. We can make it powerful and robust by making the template a model instead of a set of samples. 1) In DTW, the sampling time is a parameter of the model and can change. This brought a great improvement and was adopted by the speech recognition community in its early years (a minimal DTW sketch follows). 2) For spike interference, we can a) use some orthogonality, b) split the template into pieces, c) check sigma = sum((T - x)^2), d) make the template a model with changeable parameters and select the parameters that minimize sigma, which is the most powerful option. 3) In some sense, we can view the HMM as a template model that can generate many templates for different classes.
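A minimal DTW sketch: the classic O(n*m) dynamic program with a squared-difference local cost and no windowing constraint; the two toy sequences are made up, one being a time-warped copy of the other.

#include <algorithm>
#include <cstdio>
#include <vector>

double dtw(const std::vector<double>& a, const std::vector<double>& b) {
    const size_t n = a.size(), m = b.size();
    const double INF = 1e18;
    std::vector<std::vector<double>> D(n + 1, std::vector<double>(m + 1, INF));
    D[0][0] = 0.0;
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j) {
            double cost = (a[i-1] - b[j-1]) * (a[i-1] - b[j-1]);
            // extend the cheapest of the three allowed warping moves
            D[i][j] = cost + std::min({D[i-1][j], D[i][j-1], D[i-1][j-1]});
        }
    return D[n][m];
}

int main() {
    std::vector<double> t = {0, 1, 2, 1, 0};          // template
    std::vector<double> x = {0, 0, 1, 2, 2, 1, 0};    // time-warped copy
    printf("DTW distance = %.3f\n", dtw(t, x));       // small despite the warping
    return 0;
}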
Note: thanks to Akash Kansagra.
One of my ideas: symbolic DTW, matching features instead of amplitudes in DTW.
Excuse me, is another way to say [utilize] [take ... into bearing]? Help me. [No, it is "bring ... to bear".] OK, let us go back to work.
When we analyze uncertain things, we usually do not follow the routine of extracting features first and then analyzing them, because that is hard; it is a kind of clustering, and when the dimension is more than 2, no human can do it well. Instead, we get some features first, come up with a proposition, and then look for evidence to prove the proposition right or wrong. In that routine, the features used to form the proposition are a starting point with some probability, and the evidence is the context brought to bear.
Well, this is intuitive, but it is hard to use in AI. We do not actually think that precisely, so we find it hard to define the amount of information in evidence, the relationships between pieces of evidence, and their independence. And the starting point comes to mind randomly, which makes it a little hard to reproduce.
HMMs, Bayesian methods, fuzzy logic, evidence theory, possibility theory, and all the fusion and filter theories are used to handle this problem. The HMM is a simplification of the Bayesian approach, and the Bayesian approach seems to live in the theoretical, mathematical realm. Filter theory gives the relationships a fixed outline in time and an otherwise vague model. Fuzzy logic, evidence theory, and possibility theory all try to dodge responsibility by merely imitating humans. Fusion is just another way of looking at all of this.
In the hello-world example, we try to say how likely the rain is to stop Peter from going shopping; some say 0.8, some say 0.5. All of this makes you less confident and seriously doubting the correctness of your decision.
I guess that is why we try to train our guessing machines, and that is usually a task for ANNs, which from birth acquire their knowledge from data, much like humans. Finally this leads to the so-called field of machine learning, which tries to extend the training idea while trying not to be so human-like.
Definition: using line segments to approximate a curve. No doubt it compresses the data and simplifies shape analysis.
Methods:
1) Heuristic: search point by point until the error bound is exceeded. It outputs line segments that do not exceed the error threshold (a minimal sketch follows the list).
2) Optimized: cast it as a shortest-path problem. The path lengths are the errors and the path nodes are the selected vertices; an optimization method then searches for the best path. It outputs the line segments that minimize the total error.
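A sketch of the heuristic method, hedged: this fits each segment with the straight line through its endpoints and cuts when the maximum deviation exceeds a threshold, which is just one simple variant; the toy signal and the tolerance are made up.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const std::vector<double> y = {0, 0.1, 0.2, 0.3, 1.5, 2.9, 4.2, 4.3, 4.25, 4.3};
    const double tol = 0.3;                     // error threshold (assumed)
    size_t start = 0;
    while (start + 1 < y.size()) {
        size_t end = start + 1;                 // last index of the current segment
        // grow the segment while the endpoint-line fit stays within tol
        while (end + 1 < y.size()) {
            size_t trial = end + 1;
            double slope = (y[trial] - y[start]) / double(trial - start);
            double maxerr = 0;
            for (size_t i = start; i <= trial; ++i)
                maxerr = std::fmax(maxerr,
                    std::fabs(y[start] + slope * double(i - start) - y[i]));
            if (maxerr > tol) break;            // cut: the bound would be exceeded
            end = trial;
        }
        printf("segment: points %zu..%zu\n", start, end);
        start = end;                            // next segment starts at the cut
    }
    return 0;
}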
Problems I met when using it:
1) Noise: when there is noise in the signal, most of the time the algorithm recovers the mean line through the noise, which acts as denoising. But sometimes it takes a large noise peak as the true line of the signal. Some conditions to identify this: a) use the noise in the context as a reference, b) the stretch includes at least one up-and-down shape, c) the error is zero, d) the length is short but not too short, e) the slope is high.
2) Computational burden: the heuristic method is fast but can give pseudo-optimal results and does not treat corners well; the optimized method needs too many search loops.
EMD was proposed by Norden E. Huang and is used for non-stationary and nonlinear signal analysis.
Intuitively, if we connect all the maxima of a signal we get its upper envelope, and likewise the minima give the lower envelope. The signal can then be decomposed into the mean of the two envelopes and a residue. If we repeat this operation, we finally get a signal without riding waves (its baseline is zero, i.e., the mean of its envelopes is zero).
1. EMD is a data-driven algorithm: it decomposes the signal into IMFs, which are not pre-defined signals. The IMFs are generated by the data itself.
2. The idea: the signal is decomposed into local oscillations and their residue (which can be viewed as a baseline) at some scale (not fixed). The oscillations are the high-frequency part riding on the low-frequency residue. Doing the same decomposition on the residue yields further oscillations at a wider scale.
3. The algorithm is based on finding oscillations using local extrema (a rough sketch follows): 1) identify all extrema of x(t); 2) interpolate between the minima (resp. maxima), ending up with an envelope e_min (resp. e_max); 3) compute the mean m(t) = (e_min + e_max)/2; 4) extract the detail d(t) = x(t) - m(t); 5) iterate steps 1)-4) on the detail until d(t) can be considered an IMF: c(t) = d(t); 6) iterate 1)-5) on the residual m(t) to extract all the IMFs. The stopping criterion: the residual is constant, monotonic, or has only one extremum.
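A rough sketch of the sifting loop, with two loud simplifications: real EMD uses cubic-spline envelopes and a proper stop criterion, while here linear interpolation and a fixed sift count keep the code short. The toy signal is a fast oscillation riding on a slow trend.

#include <cmath>
#include <cstdio>
#include <vector>

// Linearly interpolate an envelope through the extrema listed in idx
// (simplification: real EMD uses cubic splines here).
static std::vector<double> envelope(const std::vector<double>& x,
                                    const std::vector<size_t>& idx) {
    std::vector<double> e(x.size());
    for (size_t k = 0; k + 1 < idx.size(); ++k)
        for (size_t i = idx[k]; i <= idx[k + 1]; ++i) {
            double t = double(i - idx[k]) / double(idx[k + 1] - idx[k]);
            e[i] = (1 - t) * x[idx[k]] + t * x[idx[k + 1]];
        }
    return e;
}

int main() {
    std::vector<double> x(128);
    for (size_t i = 0; i < x.size(); ++i)
        x[i] = std::sin(0.5 * i) + 0.05 * i;        // oscillation + trend

    std::vector<double> d = x;
    for (int sift = 0; sift < 5; ++sift) {          // fixed sift count (assumed)
        std::vector<size_t> mx = {0}, mn = {0};     // include endpoints for coverage
        for (size_t i = 1; i + 1 < d.size(); ++i) {
            if (d[i] > d[i-1] && d[i] > d[i+1]) mx.push_back(i);
            if (d[i] < d[i-1] && d[i] < d[i+1]) mn.push_back(i);
        }
        mx.push_back(d.size() - 1); mn.push_back(d.size() - 1);
        std::vector<double> up = envelope(d, mx), lo = envelope(d, mn);
        for (size_t i = 0; i < d.size(); ++i)
            d[i] -= 0.5 * (up[i] + lo[i]);          // subtract the envelope mean
    }
    // d approximates the first IMF (the fast oscillation); x - d is the residue
    printf("first IMF sample: d[10] = %.4f\n", d[10]);
    return 0;
}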
Definition of an IMF: a function whose numbers of extrema and zero-crossings are equal (or differ by at most one), and whose mean, as defined by the envelopes of its maxima and minima, is zero. This removes riding waves and the baseline, and it gives good Hilbert transform results: an IMF is a good AM-FM signal of the form a(t) exp(i*theta(t)), and its envelope defined by the HT makes sense.
Intermittency test: select a time scale (oscillations shorter than it are excluded) to prevent oscillations of different time scales (frequencies) from being mixed into the same IMF.
How to use EMD: 1. EMD of the signal. 2. Spectral analysis of the IMFs. 3. Selection of IMFs based on their spectral characteristics. 4. Reconstruction or partial reconstruction of the signal for analysis.
It is random that I started this, but I hope its impact on me, on some people, and on the future is non-trivial. Since I am working on pattern recognition and expert systems, I guess this will be a good start; I can record the process, and maybe some day it will become my homepage.