Department of Computer Science and Engineering

One important challenge in big data is information veracity, i.e., information sources might not be reliable. Before we can make good use of the data, we need to identify the true facts among the conflicting information from multiple data sources, where a data source can be a database, a website, a sensor, or even a person. Facing the daunting scale of data, it is unrealistic to expect humans to label or judge which piece of information is correct. Therefore, we have to aggregate the information from multiple sources, as this will likely cancel out the errors of individual sources and extract the true information.

One straightforward aggregation approach is to take majority voting or averaging. The drawback of this simple approach is obvious: it treats all the sources equally and fails to capture the variety in their reliability. Intuitively, if we can identify and put more weight on the reliable sources, the aggregation accuracy can be significantly improved. To achieve this, a major challenge has to be addressed: there is usually neither prior knowledge nor training data from which source reliability can be derived. In light of this challenge, the topic of truth discovery has gained increasing popularity recently due to its ability to estimate source reliability degrees and infer true information. In truth discovery, the following two processes are tightly coupled: the sources that provide true information more often are assigned higher weights, and the information that is supported by reliable sources is regarded as the truth.

The success of truth discovery methods has been clearly demonstrated in a wide variety of tasks where decisions have to be made based on the correct information from diverse sources. Typical examples include the integration of Web information for structured knowledge base construction and the aggregation of user-contributed information on crowdsourcing platforms. These and other applications demonstrate the broader impact of truth discovery on multi-source information integration.

To apply truth discovery to spatial-temporal data, it is important to take the spatial-temporal relationships between objects into consideration in the truth discovery process. Such correlation information can greatly benefit the truth discovery process: information obtained from reliable sources can be propagated over all correlated objects, so that the aggregated information is more trustworthy. Taking correlations into consideration is especially helpful when the coverage rate is low, i.e., when sources only provide observations for a small portion of the objects. In such cases, many objects may receive observations only from unreliable sources, and thus reliable information borrowed from correlated objects is important.

A preliminary study on weather condition estimation is demonstrated in Figure 1. We collect weather forecast data from three weather forecast platforms (Wunderground, HAM weather, and World Weather Online) for 147 locations within the New York City area. From each platform, we collect temperature forecasts on different days. The whole collection process lasts over a month. On this dataset, we conduct truth discovery considering spatial-temporal correlations (referred to as TD-corr), and compare it with the baseline that does not consider the correlations.
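The coupling between source weights and truths described above can be illustrated with a small iterative sketch. This is not the TD-corr method from the text, and it ignores spatial-temporal correlations entirely; it is a generic weighted-averaging scheme for continuous values, shown only to make the idea concrete. The function name `truth_discovery`, the negative-log weight formula, the `eps` smoothing term, and the toy temperature readings are all assumptions chosen for illustration.

```python
import math


def truth_discovery(observations, iterations=20, eps=1e-9):
    """Hypothetical sketch of iterative truth discovery for numeric claims.

    observations: dict mapping source -> {object: observed value}.
    Returns (truths, weights). Assumes at least two sources.
    """
    sources = list(observations)
    objects = sorted({o for obs in observations.values() for o in obs})
    # Start from equal weights, which makes the first pass plain averaging.
    weights = {s: 1.0 for s in sources}
    truths = {}
    for _ in range(iterations):
        # Truth update: weighted average over the sources covering each object.
        for o in objects:
            covering = [s for s in sources if o in observations[s]]
            num = sum(weights[s] * observations[s][o] for s in covering)
            den = sum(weights[s] for s in covering)
            truths[o] = num / den
        # Weight update: sources closer to the current truths get more weight.
        losses = {}
        for s in sources:
            errs = [(v - truths[o]) ** 2 for o, v in observations[s].items()]
            losses[s] = sum(errs) / len(errs) + eps
        total = sum(losses.values())
        # Assumed weighting rule: negative log of each source's loss share.
        weights = {s: -math.log(losses[s] / total) for s in sources}
    return truths, weights


if __name__ == "__main__":
    # Made-up temperature forecasts for two locations from three sources.
    toy = {
        "source_a": {"loc1": 20.1, "loc2": 24.9},
        "source_b": {"loc1": 19.9, "loc2": 25.2},
        "source_c": {"loc1": 26.0, "loc2": 18.0},
    }
    truths, weights = truth_discovery(toy)
    print(truths)
    print(weights)
```

In this toy input, two sources roughly agree and one deviates, so the deviating source should end up with a smaller weight and the estimates should move away from the plain average toward the agreeing sources. With a single source, the weight formula above yields zero, so the sketch assumes at least two.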