
Dataset - Science topic

Explore the latest questions and answers in Dataset, and find Dataset experts.
Questions related to Dataset
  • asked a question related to Dataset
Question
6 answers
I have a data set consisting of x and y variables, and I want to perform maximum likelihood estimation (MLE) to fit a mean function and a standard deviation function to the data. I am estimating the beta and alpha parameters by maximum likelihood in order to observe the mean and sigma trends in my data, and finally optimizing the likelihood. Every time I pass different initial values I obtain different values for the mean and sigma, so my model is sensitive to the initial parameters. Is there any way to address this situation so that I can obtain the best values for the mean and sigma that automatically fit my model?
P.S. I have also tried different optimisation methods, but that didn't work.
I have achieved the required results by implementing it with other techniques, but I want to implement it using MLE. How can I cope with this issue?
Relevant answer
Answer
By sigma trend, do you mean you are also fitting the standard deviation of the error term as a function of X? If so, you could try first with a fixed variance for the error term: models with non-fixed variance can be more difficult to fit.
You should also check that your likelihood is realistic. For instance, sigmoid models are often fitted to data constrained between 0 and 1 (up to a rescaling), which is not compatible with a Gaussian random variable, so this may interfere with the MLE fit.
Last, if the initial value has a significant impact, and if you really think you are not overfitting the data (one of the most common causes of instability), you may use a grid search on the initial values to find the global minimum. You may also explore the shape of the MLE obtained as a function of the initial values to see whether a pattern appears.
Good luck,
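A minimal sketch of the multi-start/grid-search idea mentioned above, assuming a Gaussian likelihood whose mean is linear in x and whose standard deviation is log-linear in x (a hypothetical parameterisation; swap in your own mean and sigma functions):

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, x, y):
    # Hypothetical parameterisation: mean linear in x, sigma log-linear in x
    b0, b1, a0, a1 = params
    mu = b0 + b1 * x
    sigma = np.exp(a0 + a1 * x)  # exp keeps sigma strictly positive
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + ((y - mu) / sigma) ** 2)

def multi_start_mle(x, y, n_starts=50, seed=0):
    # Run the optimiser from many random starting points and keep the best fit
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        start = rng.uniform(-2, 2, size=4)
        res = minimize(neg_log_likelihood, start, args=(x, y), method="Nelder-Mead")
        if res.success and (best is None or res.fun < best.fun):
            best = res
    return best

# Example with synthetic data
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = (1.0 + 0.5 * x) + np.exp(-1.0 + 0.1 * x) * rng.normal(size=x.size)
fit = multi_start_mle(x, y)
print(fit.x)  # estimated [b0, b1, a0, a1]

If the best negative log-likelihood is reached from many different starting points, any remaining sensitivity is more likely a flat or multi-modal likelihood than an optimiser problem.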
  • asked a question related to Dataset
Question
3 answers
I have a question that I would like to ask: for a data-driven task (for example, one based on machine learning), what kind of dataset counts as an advantageous dataset? Is there a qualitative or quantitative way to describe the quality of a dataset?
Relevant answer
Answer
The "advantageous" dataset for a data-driven task is one that is relevant, sufficiently large, high-quality, representative, balanced, temporally consistent, labeled, and ethically collected, supporting reliable model training and accurate predictions.
  • asked a question related to Dataset
Question
1 answer
Decision-making is an important concept in social networks. If anyone has a website link to relevant datasets, please share it.
Relevant answer
Answer
I recommend exploring some of the primary graph database resources that offer a wide range of datasets for social network analysis. The Stanford Network Analysis Project (SNAP) hosts a diverse collection of datasets from various social networks, which can be found at http://snap.stanford.edu. Additionally, Gephi, an open-source network visualization tool, provides datasets that are particularly useful for visual analysis and can be accessed through their GitHub repository at https://github.com/gephi/gephi/wiki/Datasets. These resources should be valuable for your decision-making research in social network analysis.
  • asked a question related to Dataset
Question
2 answers
I am trying to apply a machine-learning classifier to a dataset, but the dataset is in the .pcap file format. How can I apply classifiers to this dataset?
Is there any process to convert the dataset into .csv format?
Thanks,
Relevant answer
Answer
"File" > "Export Packet Dissections" > "As CSV..." or "As CSV manually
import pyshark
import csv
# Open the .pcap file
cap = pyshark.FileCapture('yourfile.pcap')
# Open a .csv file in write mode
with open('output.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile)
# Write header row
writer.writerow(['No.', 'Time', 'Source', 'Destination', 'Protocol', 'Length'])
# Iterate over each packet
for packet in cap:
try:
# Extract relevant information from each packet
no = packet.number
time = packet.sniff_timestamp
source = packet.ip.src
destination = packet.ip.dst
protocol = packet.transport_layer
length = packet.length
# Write the information to the .csv file
writer.writerow([no, time, source, destination, protocol, length])
except AttributeError:
# Ignore packets that don't have required attributes (e.g., non-IP packets)
pass
(may this will help in python)
  • asked a question related to Dataset
Question
6 answers
I possess JV data (current, voltage, and current density) obtained from a simple metal-semiconductor diode. Currently, I aim to plot the Richardson curve and Arrhenius plot to determine the Schottky barrier height (SBH). However, I am encountering a challenge: the plotted curve exhibits a negative slope, deviating from the typical trend. Additionally, I am uncertain about the appropriate values of Js or Io to employ, as well as the extraction method, despite my prior research efforts. Could someone provide a step-by-step guide, preferably using Origin software, to extract the SBH and ideality factor from the Richardson or Arrhenius plot? I have attached the dataset for reference. Your assistance would be greatly appreciated, and I would be grateful for a sample dataset or worksheet demonstrating the procedure.
Relevant answer
Answer
Jürgen Weippert Hello, thank you for your support. I got help from a senior colleague at another university; I had to use the modified Richardson plot to get the data into linear form.
  • asked a question related to Dataset
Question
7 answers
Can anyone please tell me the database names or websites from where I can download human SNP datasets along with the quantitative traits (phenotypes) for genome-wide association studies (GWAS)?
Relevant answer
Answer
  • asked a question related to Dataset
Question
5 answers
I ask this question from the perspective that AI algorithms can automate tasks, analyze vast amounts of data, and suggest new research avenues. It can also improve research efficiency and speed up scientific progress, in addition to analysing massive datasets, it could identify patterns, and accelerate scientific discovery. These no doubt are advantageous benefits, but AI algorithms can also inherit biases from the data they're trained on, leading to discriminatory or misleading results, which directly affect research in terms of the quality of output. Additionally, the "black box" nature of some AI systems makes it difficult to understand how they reach conclusions, raising concerns about transparency and accountability.
Relevant answer
Answer
AI tools are mainly about extended memory capacity and speed on the information highway Christopher Ufuoma Onova , i.e. for the experienced researcher and master of a subject, it is advancing creativeness and innovation. For the novice in scientific research, it is better to learn the facts of a subject by automated programmed instruction, before using the speed and memory of artificial cognitive systems, because this requires learned human supervision, in terms of rational and ethical faculty.
With the move to digital devices we are coming to a point where people don’t need to remember anything, they have it all in the palm of their hand.
One complaint I have heard from professors and others is that the generation of young people now entering the workplace don't know how to communicate. They are poor writers, and their coordination and collaboration skills are lacking. Some of this would have to be a direct result of their being wedded to their "digital assistants."
We can get smarter, or just more dependent; this is definitely our moral choice, with respect to our freedom-of-choice.
_____________
"I fear the day when the technology overlaps with our humanity. The world will only have a generation of idiots."  Albert Einstein
  • asked a question related to Dataset
Question
2 answers
I am working on an EEG classification problem and have collected my own data set. As a first step I applied filters and ICA, then decomposed the signal with db3 at level 6. After trying all kinds of known features (non-statistical, i.e. Hjorth features, entropy, band power, and statistical features: mean, median, variance, skewness, standard deviation), the accuracy with an SVM classifier under 10-fold cross-validation is stuck at 63%, whereas when I fed chunks of the cleaned and pruned data directly to the SVM, the accuracy on different chunks is 80%. What should I report in the paper? Please guide.
Relevant answer
Answer
In your paper, you should comprehensively report both the traditional feature extraction approach and your method of directly using cleaned and pruned EEG data with the SVM classifier. Here's what to include:
  1. Methodology: Clearly describe the preprocessing steps (filtering, ICA, etc.), and detail how the SVM was applied with and without traditional feature extraction
  2. Results: Present the classification accuracies of both approaches. Highlight the improvement observed when bypassing the feature extraction
  3. Discussion: Offer insights into why the direct approach might be yielding better results, such as the preservation of critical data features that are possibly lost during feature extraction
  4. Validation: Include details of the validation techniques used (like cross-validation) to ensure the results are robust and not merely due to overfitting
  5. Conclusion: Suggest why the direct use of cleaned data could be beneficial, based on your findings and validations.
This structure will help underline the efficacy and reliability of your approach, providing valuable insights into EEG data classification methodologies.
  • asked a question related to Dataset
Question
7 answers
Greetings,
Can I get help finding and downloading a multimodal fake-news dataset?
Relevant answer
Answer
I wish I could help more. I think you need to collect the fake news from different sources, create a database, and define what you want to do with it.
Thanks
  • asked a question related to Dataset
Question
3 answers
Imagine you join a new research lab and are immediately assigned a dataset that contains the same variables as those in the substance abuse dataset but comprising a new sample. You are told the lab originally collected this data set with one particular research question in mind: “Does satisfaction with life predict health outcomes?” Before doing any statistical tests, you decide to browse through the dataset and make some graphs of the results. It seems to you that in your data, satisfaction with life predicts health outcomes more strongly in males than in females, and on reflection, you can think of several theoretical reasons why that should be the case. You disregard the data on females and investigate the hypothesis that low levels of satisfaction with life (using the “swl” variable) will be positively predictive of mental ill health (using the “psych6” variable) in males. You finally do a statistical test, and obtain a very low p-value (less than .001) associated with the regression coefficient. You write a paper using this single result, concluding that there is strong evidence for your hypothesis.
Question: Is this practice p-hacking or the garden of forking paths?
Relevant answer
Answer
It certainly would not be sound to publish this result as if you had had that "hypothesis" prior to looking at the data. It may be fine to publish it as an explicitly exploratory (data-generated/data-informed) finding with a cautionary note stating that this is a data-driven result that needs to be validated/replicated with fresh (and ideally experimental or at least quasi-experimental) data (truly experimental may be difficult in this case because it may not be possible or ethical to manipulate life satisfaction).
  • asked a question related to Dataset
Question
5 answers
Imagine you join a new research lab and are immediately assigned a dataset that contains the same variables as those in the substance abuse dataset but comprising a new sample. You are told the lab originally collected this data set with one particular research question in mind: “Does satisfaction with life predict health outcomes?” Before doing any statistical tests, you decide to browse through the dataset and make some graphs of the results. It seems to you that in your data, satisfaction with life predicts health outcomes more strongly in males than in females, and on reflection, you can think of several theoretical reasons why that should be the case. You disregard the data on females and investigate the hypothesis that low levels of satisfaction with life (using the “swl” variable) will be positively predictive of mental ill health (using the “psych6” variable) in males. You finally do a statistical test, and obtain a very low p-value (less than .001) associated with the regression coefficient. You write a paper using this single result, concluding that there is strong evidence for your hypothesis.
Question: Is this practice p-hacking or the garden of forking paths?
Relevant answer
Answer
IYH dear Miao Yang
It's IMHO p-hacking. Why? By focusing solely on the data from males and disregarding the data from females, the researcher is engaging in a form of cherry-picking, which can lead to biased results and conclusions. This behavior increases the chances of finding a statistically significant result by chance alone (especially if multiple comparisons or tests are conducted).
The practice described in the scenario is not an instance of 'the garden of forking paths'. The garden of forking paths refers to a situation where researchers explore multiple hypotheses, analysis paths, or data subsets, and then selectively report the results that support their preferred hypothesis or direction of the effect. In this case the researcher has not explored multiple alternative hypotheses or analysis paths before settling on the one that supports their initial hypothesis. Instead, the researcher has formed a specific hypothesis based on their observation of the data and the theoretical reasons they can think of, and then tested that hypothesis using a single statistical test.
  • asked a question related to Dataset
Question
2 answers
For my research, I will retrieve data for each firm (100 firms) over 5 years, leading to 500 data points.
Should this dataset size be sufficient for using a fixed effects model?
Relevant answer
Answer
Chuck A Arize Thank you for the quick response. I want to examine a positive linear relation between two continuous variables. Additionally, I use 6 control variables (mostly continuous).
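For reference, a minimal sketch of a firm fixed-effects specification for a panel of this shape (100 firms x 5 years), written as a least-squares-dummy-variable model in Python with statsmodels; the file and column names (firm, year, y, x, c1-c6) are placeholders for your own variables:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 100 firms x 5 years = 500 firm-year observations
df = pd.read_csv("panel.csv")

# The C(firm) term absorbs time-invariant firm heterogeneity (entity fixed effects);
# C(year) optionally adds time fixed effects.
model = smf.ols("y ~ x + c1 + c2 + c3 + c4 + c5 + c6 + C(firm) + C(year)", data=df)

# Cluster standard errors by firm, as is common with short panels
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["firm"]})
print(result.summary())

With 500 observations, 100 firm dummies plus the slope coefficients still leave a workable number of degrees of freedom, but keep in mind that only the within-firm variation over the 5 years identifies the effect.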
  • asked a question related to Dataset
Question
1 answer
Hello,
I performed a correspondence analysis on species count data with 2 illustrative (qualitative) variables. The problem is that the data set is (i) relatively small and (ii) I concatenated two factors into one supplementary variable (i.e. health state (binary) combined with the species name (3 different species)).
I expected Dim 1 and Dim 2 each to be represented by one of the two supplementary variables, but the results were different: health state loads on the first dimension together with the second illustrative variable, while the second dimension is explained by the species name.
I'm confused! Should I continue interpreting, or perform an MCA (ACM) instead?
Relevant answer
Answer
Take a closer look at your data to understand its characteristics better. Are there any outliers or unusual patterns that could be affecting the results? Exploring the data visually or through other statistical techniques may provide insights into why the analysis is producing unexpected results. You could then consider additional analyses, such as a Multiple Correspondence Analysis (MCA) or an alternating-least-squares scaling approach (ALSCAL). These may provide different perspectives on your data and help you understand the relationships between the variables better.
  • asked a question related to Dataset
Question
1 answer
I am working on a deep learning model with a GAN (Generative Adversarial Network) architecture and got stuck on the dataset-creation step (using TensorFlow functions to prepare the data for my model). In particular, I am unsure how long this function should normally take to execute (since I am a beginner).
My dataset includes 4 types of 2D MRI modalities along with a ground truth for 100 subjects, and for each subject there are 100 images.
Relevant answer
Answer
Without specific details about the dataset size, complexity of the data generation process, and hardware specifications of your GPU-accelerated workstation, it's challenging to provide a precise estimate of the normal execution time. However, for moderately sized datasets and efficient dataset generation functions running on a well-configured GPU-accelerated workstation, you might expect the execution time to range from seconds to minutes. For very large datasets or highly complex data generation processes, it could take longer.
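As a rough illustration of the dataset-creation step (not your exact pipeline), a tf.data setup for four 2D MRI modalities plus a ground-truth mask might look like the sketch below; the load_slice loader, array shapes and slice counts are hypothetical placeholders:

import numpy as np
import tensorflow as tf

def load_slice(idx):
    # Placeholder loader: replace with your own I/O for the four modalities + ground truth
    modalities = np.zeros((240, 240, 4), dtype=np.float32)  # e.g. T1, T1ce, T2, FLAIR stacked
    mask = np.zeros((240, 240, 1), dtype=np.float32)
    return modalities, mask

indices = np.arange(100 * 100)  # 100 subjects x 100 slices (hypothetical)

ds = tf.data.Dataset.from_tensor_slices(indices)
ds = ds.map(lambda i: tf.numpy_function(load_slice, [i], (tf.float32, tf.float32)),
            num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.shuffle(1024).batch(8).prefetch(tf.data.AUTOTUNE)  # overlap I/O with training

for x, y in ds.take(1):
    print(x.shape, y.shape)

Building the pipeline itself is nearly instantaneous because tf.data is lazy; the time you observe is dominated by the per-slice loading function, so profiling that loader (and using num_parallel_calls and prefetch as above) is usually where the gains are.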
  • asked a question related to Dataset
Question
3 answers
Hi all,
I have collected some data from 3 treatment groups (n=5) with relation to 6 markers of fibrosis. When I was looking to analyse (with GraphPad Prism) these 6 data sets, I noticed that some data sets were normally distributed, while others are normally distributed after logarithmic transformation and one data set was not normally distributed (even after logarithmic transformation).
As I'm not too familiar with this kind of result, I thought to do the following. Please let me know if this is correct or if I have misunderstood.
  • For data sets that are normally distributed: Analyse data as is with parametric one-way ANOVA
  • For data sets that are normally distributed after logarithmic transformation: Analyse y=ln(x) transformed data with parametric one-way ANOVA
  • For data set that is not normally distributed: Analyse data as is with the non-parametric Kruskal-Wallis One-Way ANOVA. (Is there any other way to transform data as a way of normalising it?)
Would it be strange to present these 6 data sets together when some have undergone log transformation where others have not?
What should I do? I would greatly appreciate any advice
Relevant answer
Answer
For datasets that are normally distributed:
What to do: Proceed with the parametric one-way ANOVA as planned. This statistical test assumes that the data are normally distributed, which matches your scenario.
For datasets that become normally distributed after logarithmic transformation:
What to do: You can indeed apply a logarithmic transformation to these datasets and then use parametric one-way ANOVA on the transformed data. This approach is valid as long as you specify in your report or presentation that you transformed the data due to non-normality in its original form.
For the dataset that is not normally distributed, even after logarithmic transformation:
What to do: The Kruskal-Wallis test, which you’ve mentioned, is a suitable choice for this dataset. This test does not assume normality and is effective for comparing median differences between groups.
Alternative transformations: Besides logarithmic transformation, you could try other transformations such as square root or Box-Cox transformation to see if they help in achieving normality. However, if transformations do not result in normal distribution, sticking with a non-parametric approach like Kruskal-Wallis is recommended.
Is it okay to present these datasets together?
  • Explain clearly: When presenting your results, be transparent about the methods you used for each dataset. Mention which datasets were transformed and which were analyzed in their original form.
  • Justify the choice: Briefly explain why different approaches were necessary, emphasizing that these methods were chosen based on the statistical properties of each dataset to ensure the most accurate analysis.
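A minimal sketch of that decision flow in Python with scipy, using made-up groups of n = 5 (GraphPad Prism applies the same logic):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.lognormal(0.0, 0.5, 5)  # placeholder data for the three treatment groups
g2 = rng.lognormal(0.2, 0.5, 5)
g3 = rng.lognormal(0.5, 0.5, 5)

def all_normal(groups, alpha=0.05):
    # Shapiro-Wilk on each group; True if no group rejects normality
    return all(stats.shapiro(g)[1] > alpha for g in groups)

if all_normal((g1, g2, g3)):
    stat, p = stats.f_oneway(g1, g2, g3)  # parametric one-way ANOVA
elif all_normal((np.log(g1), np.log(g2), np.log(g3))):
    stat, p = stats.f_oneway(np.log(g1), np.log(g2), np.log(g3))  # ANOVA on ln-transformed data
else:
    stat, p = stats.kruskal(g1, g2, g3)  # non-parametric fallback
print(stat, p)

Note that with n = 5 per group, normality tests have very little power, so the choice between branches is partly a judgment call and should be reported transparently, as suggested above.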
  • asked a question related to Dataset
Question
4 answers
When a model is trained using a specific dataset with limited diversity in labels, it may accurately predict labels for objects within that dataset. However, when applied to real-time recognition tasks using a webcam, the model might incorrectly predict labels for objects not present in the training data. This poses a challenge as the model's predictions may not align with the variety of objects encountered in real-world scenarios.
  • Example: I trained a real-time recognition model for a webcam with classes lc = {a, b, c, ..., m}. The model consistently predicts the classes in lc perfectly. However, when I input an object whose class doesn't belong to lc, it still predicts something from lc.
Are there any solutions or opinions that experts can share to guide me further in improving the model?
Thank you for considering my problem.
Relevant answer
Answer
Some of the solutions are transfer learning, data augmentation, one-shot learning, ensemble learning, active learning, and continuous learning.
  • asked a question related to Dataset
Question
3 answers
Imagine you join a new research lab and are immediately assigned a dataset that contains the same variables as those in the substance abuse dataset but comprising a new sample. You are told the lab originally collected this data set with one particular research question in mind: “Does satisfaction with life predict health outcomes?” Before doing any statistical tests, you decide to browse through the dataset and make some graphs of the results. It seems to you that in your data, satisfaction with life predicts health outcomes more strongly in males than in females, and on reflection, you can think of several theoretical reasons why that should be the case. You disregard the data on females and investigate the hypothesis that low levels of satisfaction with life (using the “swl” variable) will be positively predictive of mental ill health (using the “psych6” variable) in males. You finally do a statistical test, and obtain a very low p-value (less than .001) associated with the regression coefficient. You write a paper using this single result, concluding that there is strong evidence for your hypothesis.
Question: What is a term for the practice that you are engaging in?
Is this practice p-hacking or the garden of forking paths?
Relevant answer
Answer
There is ongoing debate in the scientific community about the validity of p-values in scientific research. Many scientists and statisticians are calling for abandoning statistical significance tests and p-values. The only way to avoid p-hacking is to not use p-values. You should make scientific inference based on some descriptive statistics and your domain knowledge (not p-values).
  • asked a question related to Dataset
Question
1 answer
Machine learning can be used for prediction of antibiotic resistance in healthcare setting. The problem is that most hospitals do not have electronic health records. Datasets need to be large to make models reliable. I need a dataset with minimum of 1000 records. Kindly assist
Relevant answer
Answer
I would approach the pathology laboratory attached to the hospital group and speak to the microbiologist, or speak to the occupational infection control head at the facility that does the surveillance of the hospital's antibiograms.
  • asked a question related to Dataset
Question
3 answers
wrong number of values. Read 20, expected 22, read Token[EOL], line 3215 Problem encountered on Line: 3215.
I need help with this error in order to load the dataset onto WEKA
Relevant answer
Answer
The error message "wrong number of values" typically indicates that there is a mismatch between the number of attributes defined in the dataset and the actual number of values provided in one or more instances. In your case, it seems that WEKA is expecting 22 attributes but is reading only 20 values on line 3215 of your dataset, encountering an end-of-line (EOL) token. To resolve this error, you should carefully examine the dataset, particularly line 3215, to ensure that each instance contains values for all 22 attributes, separated by the appropriate delimiter (usually a comma or a tab). It's possible that there might be missing or incorrectly formatted values in this line or elsewhere in the dataset. Once you correct the dataset to ensure that each instance has the correct number of values, you should be able to load it successfully into WEKA.
  • asked a question related to Dataset
Question
2 answers
I am working on a project to detect credit card fraud using machine learning and am looking for a recent dataset.
Thanks in advance
Relevant answer
Answer
One commonly used dataset for credit card fraud detection is the Credit Card Fraud Detection Dataset available on Kaggle, which contains transactions made by credit cards in September 2013 by European cardholders. This dataset encompasses transactions over a two-day period, including 492 frauds out of 284,807 transactions, making it imbalanced but reflective of real-world scenarios. Additionally, the IEEE-CIS Fraud Detection Dataset on Kaggle offers a more extensive set of real-world features for transactional data, suitable for advanced machine learning models. For cases where real-world data is limited or sensitive, synthetic datasets like the Credit Card Fraud Detection Synthetic Dataset on Kaggle provide an alternative. As with any dataset, it's crucial to understand its limitations, potential biases, and preprocessing requirements while adhering to proper citation and usage protocols.
  • asked a question related to Dataset
Question
1 answer
I have installed R (CRAN), then RStudio and all the related packages, but I can't import the raw dataset from Web of Science into the biblioshiny package in RStudio. Please see the error message attached and suggest a solution if possible.
Relevant answer
Hi, at the bottom of the Biblioshiny page, export the file to Excel, then reopen the file in Biblioshiny format and the problem is solved.
  • asked a question related to Dataset
Question
3 answers
It is universally accepted that Arithmetic Mean (AM), Geometric Mean (GM) and Harmonic Mean (HM) are three measures of central tendency of data.
Suppose,
X1 , X2 , ................ , Xn
are n observations in a data set with C as their central tendency.
I am trying to understand the logic, or the reason, for which each of AM, GM and HM is accepted as a measure of the central tendency C.
If I accept the argument that each of
AM(X1, X2, ..., Xn), GM(X1, X2, ..., Xn) and HM(X1, X2, ..., Xn)
converges to C as n tends to infinity, and that this is why they are regarded as measures of C, i.e. the central tendency of X1, X2, ..., Xn,
will that be correct?
Relevant answer
Answer
Dear Doctor
[Arithmetic Mean of Ungrouped Data: There are two methods for calculating the arithmetic mean of ungrouped data: i) the direct method and ii) the indirect or short-cut method.
i) Direct method:
Arithmetic Mean (A.M.) = Sum of observations / Number of observations
ii) Indirect or short-cut method: In this method an arbitrary assumed mean is used. Deviations of individual observations from this assumed mean are taken for calculating the arithmetic mean.
Arithmetic Mean of Grouped Data: There are two methods for calculating arithmetic mean of grouped data.
i) Direct method
ii) Indirect or step-deviation method
Merits of Arithmetic Mean:
i) It is easy to understand and calculate
ii) It is based on all observations
iii) It is rigidly defined
iv) It is capable of further mathematical treatment
v) It is least affected by sampling fluctuation.
Demerits of Arithmetic Mean:
i) It is unduly affected by extreme values.
ii) In case of open ended classes it cannot be calculated.
B. Geometric Mean:
When we are interested in measuring average rate of change over time then we use geometric mean. Geometric mean is defined as the nth root of the product of n items (or) values.
Uses of Geometric Mean:
Geometrical Mean is especially useful in the following cases.
1) The G.M. is used to find the average percentage increase in sales, production, or other economic or business series. For example, if from 1992 to 1994 prices increased by 5%, 10%, and 18% respectively, then the average annual increase is not 11% (the arithmetic mean) but about 10.9% (the geometric mean).
2) G.M is theoretically considered to be best average in the construction of Index numbers.
C. Harmonic Mean:
The Harmonic Mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.
Merits of Harmonic Mean:
1) Its value is based on all the observations of the data.
2) It is less affected by the extreme values.
3) It is strictly defined.
Demerits of Harmonic Mean:
1) It is neither simple to calculate nor easy to understand.
2) It cannot be calculated if one of the observations is zero.
3) The H.M is always less than A.M and G.M.
Uses of Harmonic Mean:
The H.M is used to calculate the averages where two units are involved like rates, speed, etc.
Relation between A.M., G.M. and H.M.
The relation between A.M, G.M, and H.M is given by
A.M >= G.M >= H.M
Note: The equality condition holds true only if all the items are equal in the distribution.
Summary
The measures of central tendency give us an idea about the central value around which the data values cluster. That’s why these values are considered to be representative values i.e. the values which represent the data. Arithmetic mean is the most common measure of central tendency which is obtained by adding all the observations and then dividing the sum by the number of observations. Geometric mean is used for measuring the average rate of change over time. It is defined as the nth root of the product of n items (or) values. Harmonic Mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.]
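For a quick numerical illustration of these definitions and of the A.M >= G.M >= H.M relation, here is a small Python example (any positive data set will do):

import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 8.0, 16.0])

am = x.mean()        # arithmetic mean: sum of values / number of values
gm = stats.gmean(x)  # geometric mean: nth root of the product of the values
hm = stats.hmean(x)  # harmonic mean: reciprocal of the mean of the reciprocals

print(am, gm, hm)    # 7.5, about 5.66, about 4.27 -> AM >= GM >= HM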
  • asked a question related to Dataset
Question
3 answers
Hello guys
I want to employ FMRI for conducting research.
As a first step, I want to know whether fMRI data is an image like MRI data,
or whether I should treat fMRI data as a time series when analyzing it.
thank you
Relevant answer
Answer
MRI datasets typically result in high-resolution three-dimensional images representing anatomical structures. These images are often stored in formats such as DICOM (Digital Imaging and Communications in Medicine) or NIfTI (Neuroimaging Informatics Technology Initiative). fMRI datasets produce time-series data representing changes in brain activity over time. These data are often stored in formats compatible with neuroimaging software packages, such as NIfTI, Analyze, or MINC (Medical Imaging NetCDF). fMRI data can be conceptualized and analyzed both as images and time-series. The choice of representation depends on the specific research question and analysis techniques being employed. For many analyses, researchers will use both approaches, leveraging the spatial information provided by the image-like representation and the temporal dynamics captured in the time-series data.
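As a small illustration of the "image versus time series" point, a 4D fMRI NIfTI file can be read both ways with nibabel (the file name and voxel indices below are placeholders):

import nibabel as nib

img = nib.load("func_bold.nii.gz")  # 4D fMRI volume: x, y, z, time
data = img.get_fdata()
print(data.shape)                   # e.g. (64, 64, 36, 200)

volume_t0 = data[..., 0]            # one 3D brain image (spatial view)
voxel_ts = data[30, 30, 18, :]      # the time series of a single voxel (temporal view)
print(volume_t0.shape, voxel_ts.shape)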
  • asked a question related to Dataset
Question
1 answer
I am researching on automatic modulation classification (AMC). I used the "RADIOML 2018.01A" dataset to simulate AMC and used the convolutional long-short term deep neural network (CLDNN) method to model the neural network. But now I want to generate the dataset myself in MATLAB.
My question is: do you know good sources (papers or code) that have produced a dataset for AMC in MATLAB (or Python)? In particular, have they generated the in-phase and quadrature components for different modulations (preferably APSK and PSK)?
Relevant answer
Answer
Automatic Modulation Classification (AMC) is a technique used in wireless communication systems to identify the type of modulation being used in a received signal. This is important because different modulation schemes encode information in different ways, and a receiver needs to know the modulation type to properly demodulate the signal and extract the data.
Here's a breakdown of AMC:
  • Applications: Cognitive radio networks (AMC helps identify unused spectrum bands for efficient communication); military and electronic warfare (recognizing communication types used by adversaries); spectrum monitoring and regulation (ensuring proper usage of allocated frequencies).
  • Types of AMC algorithms: Likelihood-based (LB) algorithms compare the received signal with pre-defined models of different modulation schemes; feature-based (FB) algorithms extract features from the signal (like amplitude variations) and use them to classify the modulation type.
  • Recent advancements: Deep learning architectures, especially Convolutional Neural Networks (CNNs), are showing promising results in AMC due to their ability to automatically learn features from the received signal.
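Regarding generating the dataset yourself: rather than a reference implementation of RADIOML 2018.01A, a rough Python/NumPy starting point for producing I/Q frames of different PSK orders with additive noise is sketched below (no pulse shaping or channel effects; MATLAB's Communications Toolbox offers comparable modulators such as pskmod):

import numpy as np

def psk_iq_frames(order, n_frames=1000, frame_len=1024, snr_db=10, seed=0):
    # Generate noisy PSK frames as an (n_frames, frame_len, 2) array of I/Q samples
    rng = np.random.default_rng(seed)
    symbols = rng.integers(0, order, size=(n_frames, frame_len))
    signal = np.exp(1j * 2 * np.pi * symbols / order)  # unit-energy PSK constellation
    noise_power = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.normal(size=signal.shape)
                                        + 1j * rng.normal(size=signal.shape))
    x = signal + noise
    return np.stack([x.real, x.imag], axis=-1).astype(np.float32)

qpsk = psk_iq_frames(order=4)
psk8 = psk_iq_frames(order=8)
X = np.concatenate([qpsk, psk8])
y = np.concatenate([np.zeros(len(qpsk)), np.ones(len(psk8))])  # class labels for the CLDNN
print(X.shape, y.shape)

For APSK you would replace the constellation line with the appropriate ring amplitudes and phase offsets, and for a more realistic dataset you would add pulse shaping, frequency/phase offsets, and fading, much as the RADIOML generators do.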
  • asked a question related to Dataset
Question
1 answer
I want to train neural networks to evaluate the seismic performance of bridges, but the papers online are all based on their own databases and have not been published. Where can I find the relevant dataset? The dataset can include the following content: yield strength of steel bars, compressive strength of concrete, number of spans, span length, seismic intensity, support type, seismic damage level, etc
Relevant answer
Answer
Yongbo Xiang Your inquiry poses an interesting task. I'm also eager to acquire the corresponding data, alongside the documented history of bridge element damages attributed to past seismic events found in the literature.
  • asked a question related to Dataset
Question
4 answers
Hello everyone, I would appreciate you helping me with this question,
I am using remote sensing-based models to calculate ET actual. I want to use ERA-5 dataset as a meteorological dataset
I found that ERA-5 dataset has different pressure levels and provides (Relative humidity, Temperature, and Wind speed in u and v directions) but it doesn't provide shortwave solar radiation.
On the other hand, ERA5-Land provides (2m temperature, Surface solar radiation downwards, 10m u-component of wind, 10m v-component of wind) but doesn't provide relative humidity.
My question is:
  • Which one is better to use ERA5 or ERA5-land?
If I use ERA5 I face two problems: the first is identifying the appropriate pressure level, and the second is obtaining the shortwave radiation.
If I use ERA5-Land, I don't know how to get relative humidity.
  • Also, can I use ERA5 for the relative humidity and the ERA5-land for all other parameters?
  • Finally, what is the shortest way to get hourly data for certain days (20 days) for a specific location (E, N) as a CSV file
Relevant answer
Answer
- To answer your first question, I would suggest that you could use both products. However, regarding the variables missing from each product, you could overcome this by choosing an actual evapotranspiration method that suits the variables each product provides.
- Regarding your second question, I do not see any problem with combining them.
- Regarding the third question, you could use the API by creating an account and writing an API retrieval script.
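If you take the API route, a rough template using the CDS Python client (cdsapi) plus xarray for the point extraction is sketched below; the variable list, request keys and coordinates are assumptions to be checked against the CDS documentation for ERA5-Land:

import cdsapi
import xarray as xr

c = cdsapi.Client()  # requires a free CDS account and an ~/.cdsapirc key

request = {
    "variable": ["2m_temperature", "2m_dewpoint_temperature",
                 "surface_solar_radiation_downwards",
                 "10m_u_component_of_wind", "10m_v_component_of_wind"],
    "year": "2023",
    "month": "07",
    "day": [f"{d:02d}" for d in range(1, 21)],   # the 20 days of interest
    "time": [f"{h:02d}:00" for h in range(24)],  # hourly
    "area": [30.1, 31.2, 30.0, 31.3],            # small N, W, S, E box around the site
    "format": "netcdf",
}
c.retrieve("reanalysis-era5-land", request, "era5_land.nc")

# Reduce to the nearest grid point and export as CSV
ds = xr.open_dataset("era5_land.nc")
point = ds.sel(latitude=30.05, longitude=31.25, method="nearest")
point.to_dataframe().to_csv("era5_land_point.csv")

Relative humidity is not an ERA5-Land output, but it can be derived from the 2 m temperature and 2 m dewpoint temperature retrieved above (e.g. with the Magnus formula), which avoids mixing products.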
  • asked a question related to Dataset
Question
1 answer
I am working on a situation where I have more than one independent (feature) variable, say four, and one dependent (target) variable. My confusion is whether, when the XGBoost algorithm is applied to such a data set, it considers each feature variable individually during prediction (as linear regression does) or uses them as a group to build the decision trees.
If any reference paper or book related to supervised machine learning algorithms is available, kindly share it.
Relevant answer
Answer
The answer is no. XGBoost assumes the independence of the predictors but uses their possible interactions as a basis to create an additive and surrogate structure made up of weak learners. Therefore, the effect of each predictor will be influenced by the effect of the other predictors and their number, and that is the extra point of boosting compared to conventional regression methods. To understand more deeply how XGBoost works, nothing better than a detailed explanation from its creators: https://doi.org/10.1145/2939672.2939785
Best regards!
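A small synthetic sketch showing that the boosted trees use all the features jointly (interactions included), plus a quick look at their relative importances; the data here is made up:

import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                    # four feature variables
y = X[:, 0] + 2 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)  # includes an interaction

model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.1)
model.fit(X, y)                                  # each tree may split on any of the features

print(model.feature_importances_)                # relative contribution of each feature
print(model.predict(X[:5]))

Because the interaction between the second and third features can be captured by successive splits within a tree, the model does not treat the predictors as isolated terms the way a plain linear regression would.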
  • asked a question related to Dataset
Question
6 answers
I have a dataset of lung cancer images with 163 samples (2D images). I use fine-tuning of deep learning models to classify the samples, but the validation loss does not decrease. I augmented the data and used dropout, but the validation loss still didn't drop. How can I solve this problem?
Relevant answer
Answer
I feel there are a few checks and techniques that could be applied to avoid/mitigate overfitting:
1. Clean your dataset (check and handle null/missing values and decide whether to keep or remove the affected records).
2. Handle the outliers.
3. Cross-validation: split the data into training and validation/test sets to evaluate model performance on unseen data. Use techniques like k-fold cross-validation to get a more robust estimate of model generalization.
4. Feature selection / dimensionality reduction: identify and remove irrelevant, redundant or noisy features that may be causing overfitting; techniques like Principal Component Analysis (PCA) can reduce the dimensionality of the data.
5. Thoroughly evaluate model performance on held-out test data, not just the training data.
6. Apply regularization techniques like L1 (Lasso), L2 (Ridge) or Elastic Net to control model complexity and prevent overfitting.
7. Use simpler models with fewer parameters, such as linear regression or decision trees, instead of more complex models like neural networks.
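As a hedged illustration of a few of these points for a small image dataset like yours, a Keras fine-tuning setup with light augmentation, dropout and early stopping on the validation loss might look like this (the backbone, image size and datasets are placeholders):

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                               input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the backbone when only ~163 images are available

model = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # light augmentation
    tf.keras.layers.RandomRotation(0.05),
    base_model,
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)

With so few samples, a stratified k-fold evaluation (point 3 above) is usually more informative than a single train/validation split.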
  • asked a question related to Dataset
Question
2 answers
What is the popular facial image dataset to detect Austisms, and what are the sources?
Relevant answer
Answer
YTUIA - see the research paper at https://www.mdpi.com/2075-4418/14/6/629
  • asked a question related to Dataset
Question
1 answer
Hi
I have attached two images of the scatter plots of some datasets. As per these plots, it seems that the data is not linearly separable.
Can anyone please confirm my understanding?
Thanks and Regards
Monika
Relevant answer
Answer
Hi, Which dataset do you use ? I am also searching for this answer.
  • asked a question related to Dataset
Question
4 answers
I currently have the correlation/covariance matrix for a set of variables, as well as the output from a regression analysis, but lack access to the underlying raw dataset. Given these constraints, would it be feasible to conduct a comprehensive analysis of endogeneity? If so, I would greatly appreciate guidance on methodologies or statistical techniques that could be employed to investigate the presence of endogeneity under these circumstances.
Thanks,
Harshavardhana
Relevant answer
Answer
Endogeneity analysis typically requires access to raw data or at least detailed information about the variables of interest and their relationships. However, conducting endogeneity analysis without access to raw data can be challenging and may limit the depth of the analysis. Here are some approaches you can consider if you don't have access to raw data:
  1. Literature Review: Start by reviewing existing literature on the topic of interest. Look for studies that have addressed endogeneity issues similar to yours and examine their methodologies, including how they handled endogeneity concerns. This can provide insights into potential strategies or techniques you can apply in your analysis.
  2. Theoretical Considerations: Based on your understanding of the subject matter and theoretical framework, try to identify potential sources of endogeneity in your analysis. Consider factors that could lead to correlation or causality issues between variables and think about how these issues might be addressed or mitigated.
  3. Instrumental Variables: If you have access to instrumental variables that are plausibly exogenous and relevant to your analysis, you can use them to address endogeneity concerns. Instrumental variables estimation requires careful selection and validation of instruments, so make sure to justify their relevance and validity in your context.
  4. Quasi-Experimental Methods: Explore quasi-experimental methods or natural experiments that exploit exogenous variation in the data. These methods can help identify causal effects while controlling for endogeneity. Examples include difference-in-differences, regression discontinuity design, and propensity score matching.
  5. Sensitivity Analysis: Perform sensitivity analysis to assess the robustness of your results to potential sources of endogeneity. This involves testing the sensitivity of your findings to different model specifications, control variables, and assumptions. Sensitivity analysis can provide insights into the reliability and stability of your results.
  6. Expert Consultation: If possible, consult with experts or researchers who have experience with the data or topic area. They may offer valuable insights or suggest alternative approaches to address endogeneity concerns given the limitations of the available data.
While conducting endogeneity analysis without access to raw data presents challenges, it's still possible to employ various strategies and techniques to mitigate endogeneity concerns and produce meaningful results. However, it's essential to acknowledge the limitations of the analysis and interpret the findings cautiously, considering the potential impact of unobserved factors and data limitations on the results.
  • asked a question related to Dataset
Question
1 answer
Hi all,
I am looking for public repositories or services that are willing to host some large scientific data sets in the range of hundreds of GBs. Besides the universities' infrastructures we have found Hugging Face as another option. At the same time, public funding agencies would prefer some public non-commercial platforms. Looking for ideas to complete the list below:
For small data:
  • University servers
  • Zenodo (<50GB per default)
  • GitHub (<2GB free version)
For large data:
  • Hugging Face (<5GB per file, multiple files possible)
  • ? any ideas?
Thanks!
Relevant answer
Answer
Hi,
Perhaps you can consider building a personal website if the budget allows.
Furthermore, your wave forecasting based on U-Net is very beautiful. Please allow me to express my gratitude for your work. It has been very inspiring to me.
  • asked a question related to Dataset
Question
3 answers
For example, the dependent variable is daily stock returns and the independent variables are company characteristics such as firm size, leverage, etc.
Relevant answer
Answer
I think the data could be used but the firm size needs to be the most recent and temporally close to your daily stock returns. Obviously, when you evaluate your models you should acknowledge these limitations.
  • asked a question related to Dataset
Question
17 answers
I urgently need the DEAP dataset. I didn't receive any username and password from the officials; can someone please help me with the dataset or the credentials if you have them?
Relevant answer
Answer
No, please ask your professor to request access to Deap dataset.
  • asked a question related to Dataset
Question
1 answer
Are synthetic thermal images useful in bio-medical image processing for diagnostic purposes? Please share some resources of such data sets.
Relevant answer
Answer
Well, we normally don't use thermal images to capture internal information and distributions (we would need to know the illness ROI), because they put more focus on structural information. That means this kind of image is more suitable for structures and for applications that need only boundary and structural information.
  • asked a question related to Dataset
Question
3 answers
1. I'm seeking soil property raster datasets with resolutions matching those of Sentinel or Landsat imagery, as SoilGrids data are currently available only at a 250m resolution. Therefore, I'm interested in finding options with finer resolutions of 10/30m. What are the best available alternatives with finer resolutions?
2. Is there a dataset specific to the Indian context that provides more accurate and locally optimized soil data?
Relevant answer
Answer
Subhadeep Mandal Detailed data on the spatial distribution of soils are usually a national resource; such data can be found in national repositories, but they are rarely open access.
  • asked a question related to Dataset
Question
2 answers
Anyone please explain this
Relevant answer
Shantanu Shukla You would choose a dataset based on the characteristics of the tests you want to perform. In a very general case, you can utilize the TPC-H (https://www.tpc.org/tpch/) benchmark datasets to begin your tests.
  • asked a question related to Dataset
Question
1 answer
In detail, when utilizing the data from 1998 to 2014 as the training dataset, the Ljung-Box (Q) statistic, particularly Ljung-Box (18), is not generated in SPSS. However, if the analysis incorporates the entire dataset spanning from 1998 to 2021, the statistics are produced without issue.
Relevant answer
Answer
I think the 0 degrees of freedom explains the problem; as you add more years, the degrees of freedom will increase, enabling the estimates to be calculated.
  • asked a question related to Dataset
Question
1 answer
I am currently researching phishing detection using URLs with a logistic regression model. I have a dataset with a 1:10 ratio: 20k legitimate URLs and 2k phishing URLs.
Relevant answer
Answer
You can rely on classical prediction metrics such as:
  • Precision, also known as 'positive predictive value', measures the proportion of correctly predicted positive instances out of all instances predicted as positive. It is calculated as the number of true positive results divided by the number of all samples predicted to be positive. You want high precision in scenarios where minimizing false positives is essential.
  • Recall, also known as 'sensitivity', quantifies the proportion of correctly predicted positive instances out of all actual positive instances. It is calculated as the number of true positive results divided by the number of all samples that should have been identified. You want high recall in scenarios where missing actual positives (false negatives) is costly.
A suitable combination of the above metrics for your purposes is the F2-score, which is a variant of the F1 score that puts a stronger emphasis on recall compared to the standard F1 score. Placing a stronger emphasis on recall rather than precision makes it suitable for tasks where capturing all positive instances is crucial.
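A short scikit-learn sketch of these metrics on an imbalanced split like yours (the labels below are made up, with 1 = phishing as the positive class):

from sklearn.metrics import precision_score, recall_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # placeholder ground truth
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # placeholder model predictions

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F2:       ", fbeta_score(y_true, y_pred, beta=2))  # weights recall more than precision

Given the 1:10 class ratio, it is also worth setting class_weight='balanced' in scikit-learn's LogisticRegression and inspecting the precision-recall curve rather than accuracy alone.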
Hope you found my answer useful.
Bests
  • asked a question related to Dataset
Question
10 answers
Greetings all, Our team, '#THE GLOBEST TEAM,' presents a #great opportunity to participate in several #American dataset clinical studies related to the #internal medicine field. Your mission is to #write a specific section of the article and we will review your work. Once we have reviewed the manuscript, you will make any necessary edits related to your mission. If you have previously participated in #10 original research studies or reviews, and you are a #well-scientific writer, please leave a comment with your #name, #Google Scholar account, and #email address. #Research #opportunity #clinicalresearch #USAresearch #National Center for Health Statistics #THE GLOBEST TEAM
Relevant answer
Answer
Interested in participating.
Name: Anooja Rani
  • asked a question related to Dataset
Question
2 answers
A dataset urgently needed for EEG signals in children with autism
Relevant answer
Answer
Look at this work: Sun S, Cao R, Rutishauser U, Yu R, Wang S. A uniform human multimodal dataset for emotion perception and judgment. Sci Data. 2023 Nov 7;10(1):773. doi: 10.1038/s41597-023-02693-z. PMID: 37935738; PMCID: PMC10630434.
You may find some useful information.
  • asked a question related to Dataset
Question
2 answers
I have read a paper entitled "Transcriptome analysis of the response to chronic constant hypoxia in zebrafish hearts". I want to know how the fold change values reported in this paper were calculated. Is there a specific formula, or was it determined entirely by software? I have also accessed the relevant GEO accessions (comparing hypoxia and normoxia samples as diseased and control samples, respectively) available in the NCBI GEO datasets for the same paper. However, the values given there are negative log FC for upregulated transcripts (based on individual profile transcript IDs), whereas the values given for the same upregulated transcripts in the paper are positive. I do not understand this discrepancy. I have also tried to calculate the FC values from the individual expression values for each transcript and each unique accession for the normoxia and hypoxia samples, but the values are completely different from the paper's data. I want to use the microarray data available from GEO for my work. Is there a specific method for processing data from such microarray expression sets? Please share it with me.
Relevant answer
Answer
Thank You
  • asked a question related to Dataset
Question
4 answers
I learned that multiple researchers were successful obtain MOOCs datasets from Stanford via the CAROL website: https://datastage.stanford.edu/. The data request form was placed at http://carol.stanford.edu/research. However, recently the domain name carol.stanford.edu as well as the Center for Advanced Research through Online Learning (CAROL) disappeared on the Internet. Consequently, I have no way to request for my needed datasets.
Do you know another URL to submit the data request form, or any alternative solution/repository to obtain some MOOC learners' interaction data from well known course on Coursera or edX?
Thanks in advance
Relevant answer
Answer
Jayashree Ganeshkumar sad, okay, thank you very much for the resources
  • asked a question related to Dataset
Question
3 answers
Can artificial intelligence help improve sentiment analysis of changes in Internet user awareness conducted using Big Data Analytics as relevant additional market research conducted on large amounts of data and information extracted from the pages of many online social media users?
In recent years, more and more companies and enterprises, before launching new product and service offerings as part of their market research, commission sentiment analysis of changes in public sentiment, changes in awareness of the company's brand, recognition of the company's mission and awareness of its offerings to specialized marketing research firms. This kind of sentiment analysis is carried out on computerized Big Data Analytics platforms, where a multi-criteria analytical process is carried out on a large set of data and information taken from multiple websites.
In terms of source websites from which data is taken, information is dominated by news portals that publish news and journalistic articles on a specific issue, including the company, enterprise or institution commissioning this type of study. In addition to this, the key sources of online data include the pages of online forums and social media, where Internet users conduct discussions on various topics, including product and service offers of various companies, enterprises, financial or public institutions.
In connection with the growing scale of e-commerce, including the sale of various types of products and services on the websites of online stores, online shopping portals, etc., as well as the growing importance of online advertising campaigns and promotional actions carried out on the Internet, the importance of the aforementioned analyses of Internet users' sentiment on specific topics is also growing, playing a complementary role to other, more traditionally conducted market research.
A key problem for this type of sentiment analysis is becoming the rapidly growing volume of data and information contained in posts, comments, banners and advertising spots posted on social media, as well as the constantly emerging new social media. This problem is partly solved by increasing computing power and multi-criteria processing of large amounts of data thanks to the use of increasingly improved microprocessors and Big Data Analytics platforms. In addition, in recent times, the possibilities of advanced multi-criteria processing of large sets of data and information in increasingly shorter timeframes may significantly increase when generative artificial intelligence technology is involved in the aforementioned data processing.
The key issues of opportunities and threats to the development of artificial intelligence technology are described in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
I described the applications of Big Data technologies in sentiment analysis, business analytics and risk management in my co-authored article:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
The use of Big Data Analytics platforms of ICT information technologies in sentiment analysis for selected issues related to Industry 4.0
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Can artificial intelligence help improve sentiment analysis of changes in Internet users' awareness conducted using Big Data Analytics as relevant additional market research conducted on a large amount of data and information extracted from the pages of many online social media users?
Can artificial intelligence help improve sentiment analysis conducted on large data sets and information on Big Data Analytics platforms?
What do you think about this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
In my opinion, yes, artificial intelligence (AI) can indeed play a crucial role in improving sentiment analysis for changes in internet user awareness, especially when combined with big data analytics. Here's how:
  1. Natural Language Processing (NLP): AI techniques can be used to process and understand the natural language used in social media posts, comments, reviews, etc. This involves tasks such as text tokenization, part-of-speech tagging, named entity recognition, and more.
  2. Sentiment Analysis: AI algorithms can be trained to recognize and analyze the sentiment expressed in text data. This can help identify whether users are expressing positive, negative, or neutral opinions about specific topics, products, events, etc.
  3. Machine Learning Models: AI-powered machine learning models can be trained on large datasets of labelled social media data to predict sentiment accurately. These models can continuously learn and improve over time as they are exposed to more data.
  4. Deep Learning: Deep learning techniques, such as recurrent neural networks (RNNs) and transformers, can capture complex patterns in text data and improve sentiment analysis accuracy.
Thank You
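As a minimal illustration of points 1-2 above (not a production-scale Big Data pipeline), a pretrained transformer can score the sentiment of a batch of social-media posts; the Hugging Face pipeline API and its default English sentiment model are assumed here:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # swap in a multilingual model for non-English posts

posts = [
    "The new banking app is fantastic, transfers are instant.",
    "Customer support ignored my complaint for two weeks.",
]
for post, result in zip(posts, classifier(posts)):
    print(result["label"], round(result["score"], 3), "-", post)

At Big Data scale the same idea is typically wrapped in a distributed framework (e.g. Spark), and the per-post scores are then aggregated over time to track shifts in sentiment and awareness.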
  • asked a question related to Dataset
Question
5 answers
I'm doing some research to explore the application of eXplainable Artificial Intelligence (XAI) in the context of brain tumor detection. Specifically, I aim to develop a model that not only accurately detects the presence of brain tumors but also provides clear explanations for its decisions regarding positive or negative results. My main concerns are making sure that the model's decision-making process is transparent and comprehending the underlying reasoning behind its choices. I would be grateful for any thoughts, suggestions, or links to papers or web articles that address the practical application of XAI in this field (including the dataset types or anything that is related with XAi).
Thank you.
Relevant answer
Answer
I believe it is very important to begin your investigation by ensuring that the data collected is of high quality and undergoes standardized preprocessing if you want to effectively integrate XAI techniques into brain tumor detection. Validation through evaluation metrics and user feedback increases reliability, while iterative improvement based on user input enhances both accuracy and interpretability over time; this approach fosters trust and transparency in brain tumor detection systems, ultimately benefiting clinicians and patients. You can utilize specialized XAI tools like NeuroXAI, incorporate attention maps for insight into DL model decision-making, adopt open architecture frameworks for scalability to new XAI methods, and enhance model interpretability through information-flow visualization.
  • asked a question related to Dataset
Question
1 answer
Hello ResearchGate community! I'm working on a Fabric Defect Detection System project and need a diverse fabric defect dataset. Any recommendations or shared datasets would greatly benefit my research. Thank you for your support!
Relevant answer
  • asked a question related to Dataset
Question
4 answers
Hello everyone,
I am currently working on a Thesis about the impact of AI on Consulting firms.
I am looking for datasets surrounding this subject. If you have any data or anything else that could help me, I would be very happy to receive your help.
Thank you very much,
Thibaud
Relevant answer
Answer
Ali Abedi Madiseh Thank you very much for the insights, I will look into it !
  • asked a question related to Dataset
Question
3 answers
Hello ResearchGate community,
I'm currently analyzing a dataset derived from a survey of ~200 paired responses across two time points. The survey is of teachers and students, using 60 Likert-like items to assess beliefs about education. After factor analysis, I derived three core factors.
I'm now trying to assess the relative magnitude of change in factor scores over two time points. I say relative magnitude because, for all factors, the scores decreased. So I need to see 1) whether changes were significant and 2) the size of those changes.
Preliminary tests, including Shapiro-Wilk, Q-Q plots, and outlier detection, indicated non-normality, guiding me to utilize a Wilcoxon signed-rank test.
However, I'm at a crossroads regarding the appropriate effect size measure. Traditional non-parametric effect size measures, like rank biserial correlation, seem to fall short for my purpose, as they primarily address the probability of difference -- rather than the magnitude of change I'm interested in capturing. I've established that two factors saw a statistically significant change using the Wilcoxon signed-rank test. But I need to understand how big these decreases were and hopefully compare the two.
I'm contemplating justifying the use of Cohen's d or exploring median-based measures for a more accurate reflection of the change magnitude. But I'm struggling to find relevant info online. I've seen references to things like Hodges Lehmann, using simple median change, etc. But nothing solid.
Does anyone have insights or references on how to effectively apply a median measure in this context or justify using Cohen's d with the Wilcoxon signed-rank test for ordinal Likert data?
I appreciate any guidance or shared experiences on this matter.
Relevant answer
Answer
You can calculate a statistic like Cohen's d. It's just math. The issue is that, if your data are quite non-normal, is dividing the difference in means by the standard deviation a useful metric?
There is a nonparametric analogue using the difference in medians and the median absolute deviation. It's a simple idea, but I couldn't find any reference to it, so I named it after me †. If you use the median absolute deviation that has a constant of about 1.48 ‡, it will be close to Cohen's d for normal samples.
Or you could just use an unstandardized effect size statistic, like the difference in medians.
It may be helpful to also report the matched-pairs rank biserial correlation coefficient, because it's an r-like statistic that may be familiar to your audience. You could also use Pearson's r, where one variable is the dichotomous variable, if that fits your purpose better.
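A minimal Python sketch of the unstandardized and MAD-standardized median effect sizes described above, assuming paired scores in two arrays; the 1.4826 scaling (which makes the MAD comparable to an SD under normality) and the interpretation are assumptions to check against your own references, and the data here are simulated placeholders:
```python
# Minimal sketch: median-based effect sizes for paired data (time1 vs time2).
import numpy as np

rng = np.random.default_rng(0)
time1 = rng.normal(3.5, 0.6, 200)            # hypothetical factor scores at T1
time2 = time1 - rng.normal(0.3, 0.5, 200)    # hypothetical (lower) scores at T2

diff = time2 - time1
median_diff = np.median(diff)                             # unstandardized effect size
mad = 1.4826 * np.median(np.abs(diff - np.median(diff)))  # MAD scaled to approximate an SD
standardized = median_diff / mad                          # rough nonparametric analogue of Cohen's d

print(median_diff, standardized)
```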
  • asked a question related to Dataset
Question
1 answer
I have a data set from HPLC analysis. I would like to open the HPLC data file, but the Masslynx Software version (4.1) I have is not compatible with the file settings. Can anyone suggest an alternative software that I can use to open this data set?
The file was created to open in Masslynx Version 4.2, but I do not have access to this version.
Thank you.
Relevant answer
Answer
If it's raw data, follow these steps:
1. Say "Besma Allah -Al-Rahman -Al- Rahim."
2. Download Python and/or an editor such as Visual Studio Code, because Python supports most file and data formats, such as .csv, .tsv, .raw, .cell, .fasta, .fastq, etc.
3. Search for Python libraries that can open your data file format.
4. After reading your file, you can analyze it using NumPy, pandas, math and other libraries, and visualize your analysis results using Matplotlib, seaborn, Bokeh and other libraries.
"Good Luck"
  • asked a question related to Dataset
Question
2 answers
I found that the structure for TiFeSi is given differently in ICSD and Pearson crystal database as follows:
The Wyckoff positions are given in the Pearson database (data set no. 1822291):
Ti1 Ti 4 b 0.25 0.2207 0.0206
Ti2 Ti 4 b 0.25 0.4979 0.1677
Ti3 Ti 4 b 0.25 0.7996 0.0463
Fe1 Fe 8 c 0.5295 0.1236 0.3699
Fe2 Fe 4 a 0 0 0.0
Si1 Si 8 c 0.506 0.3325 0.2452
Si2 Si 4 b 0.25 0.0253 0.2554
The Wyckoff positions are given in the ICSD database (database code 41157):
Ti1 Ti0+ 4 b 0.25 0.2004(7) 0.2964(14)
Ti2 Ti0+ 4 b 0.25 0.7793(6) 0.2707(14)
Ti3 Ti0+ 4 b 0.25 0.9979(6) 0.9178(15)
Fe1 Fe0+ 8 c 0.0295(7) 0.3764(4) 0.12
Fe2 Fe0+ 4 a 0 0 0.2501(12)
Si1 Si0+ 8 c 0.0060(13) 0.1675(9) 0.9953(18)
Si2 Si0+ 4 b 0.25 0.9747(11) 0.5055(23)
Although the lattice parameters in both databases are almost the same.
Which one should I take for ab initio calculations or XRD Rietveld refinement?
Relevant answer
Answer
Thanks Martin Breza
  • asked a question related to Dataset
Question
1 answer
RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
res = hypotest_fun_out(*samples, **kwds)
The above warning occurred in Python. First, the dataset was normalised, and then this warning appeared while performing the t-test, although the output was still displayed. Kindly suggest some methods to avoid this warning.
Relevant answer
Answer
Why do you normalize before testing? If you are doing a paired t-test and the differences are small, this only makes the differences smaller. https://www.stat.umn.edu/geyer/3701/notes/arithmetic.html
  • asked a question related to Dataset
Question
4 answers
NA
Relevant answer
Answer
Hello!
Of course, there are places (i.e. national geoportals in Poland, Slovakia, Slovenia, etc.) where you can find point cloud data. See also this place: https://portal.opentopography.org/dataCatalog
All the best!
Bartek
  • asked a question related to Dataset
Question
2 answers
I am conducting a study on the "Impact of Land Use and Land Cover (LULC) Changes on Land Surface Temperature (LST)" and plan to use Google Earth Engine (GEE) for my analysis. I am at a crossroads in deciding between the "USGS Landsat 8 Level 2, Collection 2, Tier 1" dataset and the "USGS Landsat 8 Collection 2 Tier 1 TOA Reflectance" dataset for LULC classification.
Could the community provide insights on:
  1. Which dataset would be more suitable for LULC classification, especially in the context of analyzing its impact on LST?
  2. What specific pre-processing steps would be recommended for the preferred dataset within the GEE environment to ensure data integrity and robustness of the classification?
Any shared experiences, particularly those related to the use of these datasets in GEE for LULC and LST studies, would be incredibly valuable.
Thank you for your contributions!
Relevant answer
Answer
Based on my experience, I used Landsat 8 OLI/TIRS Collection 2 atmospherically corrected surface reflectance data for my purpose of extracting NDVI, LST, NDBSI and Wetness. You can see more details about the Landsat products on this page: https://developers.google.com/earth-engine/datasets/catalog/landsat-8
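As a rough, hedged sketch of the preprocessing this implies (Collection 2 Level 2 surface reflectance with scale factors and a QA_PIXEL cloud mask) using the Earth Engine Python API: the scale factors and QA bit positions follow the Landsat Collection 2 documentation as I recall them, and the region and dates are hypothetical, so verify everything against the catalog page linked above:
```python
# Rough sketch: load Landsat 8 Collection 2 Level 2, apply scale factors and a QA_PIXEL cloud/shadow mask.
import ee
ee.Initialize()  # assumes you have already authenticated with the Earth Engine CLI

def scale_and_mask(img):
    optical = img.select('SR_B.').multiply(0.0000275).add(-0.2)     # SR scale factors (per USGS docs)
    thermal = img.select('ST_B.*').multiply(0.00341802).add(149.0)  # ST scale factors (per USGS docs)
    qa = img.select('QA_PIXEL')
    # Cloud and cloud-shadow bits assumed to be 3 and 4 in the Collection 2 QA_PIXEL band.
    mask = qa.bitwiseAnd(1 << 3).eq(0).And(qa.bitwiseAnd(1 << 4).eq(0))
    return (img.addBands(optical, None, True)
               .addBands(thermal, None, True)
               .updateMask(mask))

roi = ee.Geometry.Point([90.4, 23.8]).buffer(20000)  # hypothetical study area
collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
              .filterBounds(roi)
              .filterDate('2021-01-01', '2021-12-31')
              .map(scale_and_mask))

composite = collection.median().clip(roi)  # cloud-reduced composite for LULC classification and LST work
```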
  • asked a question related to Dataset
Question
2 answers
Hello everyone,
I am currently working on a Thesis about the impact of AI on Consulting firms.
I am looking for datasets surrounding this subject. If you have any data or anything else that could help me, I would be very happy to receive your help.
Thank you very much,
Thibaud
Relevant answer
Answer
In order to analyze a dataset, first we work with you to formulate a dataset that is ripe for machine learning. If you're still reading, chances are, you have a hypothesis about some insights that may be gained from your data.
Regards,
Shafagat
  • asked a question related to Dataset
Question
2 answers
I'm recently trying to perform an RNA seq data analysis and in 1st step, I faced a few questions in my mind, which I would like to understand. Please help to understand these questions.
1) In the 1st image, the raw data from NCBI-SRA have 1 and 2 marked at the ends of the reads. What does this mean? Are those the forward and reverse reads?
2) In the second image I was trying to run Trimmomatic on this dataset. I chose "paired-end as a collection", but it does not take any input even though my data are there in "fastqsanger.gz" format. Why is that? Should I treat this paired-end data as single-end data while running Trimmomatic?
3) In the 3rd and 4th images, I collected the same data from ENA, where they provide two separate files for the 1- and 2-marked data in SRA. I then tried to process them in Trimmomatic using "Paired-end as individual dataset" and ran it. Trimmomatic gives me 4 files. Why is that, and which one will be useful for alignment?
A big thank you in advance :)
Relevant answer
Answer
For NGS sequencers that have paired-end capability, those 1's and 2's refer to which reads they originate from, Read 1 & Read 2 or Forward & Reverse Reads (https://www.illumina.com/science/technology/next-generation-sequencing/plan-experiments/paired-end-vs-single-read.html). That makes it convenient to have it attached at the end, so you can see what kind of reads you are dealing with before processing them. In a similar fashion, within the FASTQ format specification, you also specify whether that particular read belongs to Read 1 or Read 2 (see Illumina sequence identifiers: https://en.wikipedia.org/wiki/FASTQ_format). Regarding your "fastqsanger.gz" format data, is this Sanger sequencing related data? These tools are developed for NGS applications. Regarding the output files, check https://www.biostars.org/p/199938/ and search for "Trimmomatic output files" on Google.
  • asked a question related to Dataset
Question
3 answers
I'm performing RNA-seq data analysis. I want to do healthy vs disease_stage_1, Healthy vs disease_stage_2, and Healthy vs disease_stage_3. In the case of healthy, disease_stage_1, disease_stage_2, and disease_stage_3 data sets, I have 19, 7, 8, and 15 biological replicates respectively.
Does this uneven number of replicates affect the data analysis?
Should I use an equal number of replicates for every dataset, e.g., 7 biological replicates (as the lowest number of replicates here is 7)?
Relevant answer
Answer
I agree with Alwala in general, less than 3 samples per group is not even worth considering. There is an app online that will help you come up with sample size and/or power given certain parameters (https://cqs-vumc.shinyapps.io/rnaseqsamplesizeweb/) as a useful estimate tool. Regarding your uneven biological replicates, check to see if the statistical method used for differential expression and library normalization can tolerate uneven sample sizes. In general IMO, 8-10 minimum is a pretty good starting point.
  • asked a question related to Dataset
Question
3 answers
Hi, where can I find a benchmark solar irradiance dataset with hourly resolution? And which criteria are essential for it?
Thanks in advance
Relevant answer
Answer
You're welcome. If you search with keywords such as "solar irradiance/irradiation dataset" or "solar irradiance/irradiation benchmark", you will find many datasets that contain hourly irradiation. Good luck to you.
Best regards,
Hossein
  • asked a question related to Dataset
Question
2 answers
I would like to explain my question: during a discussion with my colleague (Prof. Dr. Amer), he told me that datasets used in other fields, such as communication engineering, can also be used in civil engineering applications. He has used this in structural analysis, for example in slab shear and other parts. Is it as easy to use in geotechnical engineering (soil improvement, pile groups, etc.)?
Relevant answer
Answer
I agree too; we can use such datasets because they come with tolerances, and we know that soil analysis is only a technical approximation that is validated once laboratory testing is conducted.
  • asked a question related to Dataset
Question
2 answers
A large dataset of more than 350,000 instances has been imported into an Orange3 file. When running predictions, some of the resulting data are still incomplete, which makes them difficult to analyse. What is a technical solution for this?
Relevant answer
Answer
It is true that the missing values are few, but it is hard to trace manually where they are, because 350,000 instances are being processed.
I still have not found a practical solution in any of the widgets in the Orange3 menu that you suggested.
  • asked a question related to Dataset
Question
2 answers
Hi Everyone,
I am looking for a dataset to work on the customer churn prediction. I have found data regarding frontier airlines in statista.com but it was too expensive to buy. Are there any datasets that are related to the apparel industry along with customer feedback?
Relevant answer
Answer
Thank you for the interesting question, I’m doing research and will post it in the future
  • asked a question related to Dataset
Question
2 answers
Our work with the outstanding Canadian Prairie data is summarized in this BPI Book
Emerging Issues in Environment, Geography and Earth Science Vol. 4
ISBN 978-81-967981-3-0 (Print) ISBN 978-81-967981-4-7 (eBook) DOI: 10.9734/bpi/eieges/v4
We strongly recommend this overview. This Prairie data set is hourly for over fifty years and is calibrated back to standards. There is no comparable dataset for analyzing land-surface processes and the surface energy balance both diurnally and seasonally across time and climate change.
There are five chapters in the book as well as a brief overview of a further six papers of the Prairie data analysis, ECMWF model comparisons and model development.
Relevant answer
Answer
Great!
  • asked a question related to Dataset
Question
2 answers
Hello everyone.
I'm very new to ML and DL models, and I want to use a CNN and train and test it on my dataset. I have a very large dataset, and it is already split into 80% train and 20% test.
I wrote code in Python and used TensorFlow to train the CNN, but I'm not sure if it's correct, and now I am struggling with testing the CNN.
Can anyone help me make sure I trained the CNN correctly and tell me how to test it, either by giving advice or by pointing me to useful resources?
I will be truly grateful.
#CNN #ML #MachineLearning #DeepLearning #ConvolutionalNeuralNetwork
  • asked a question related to Dataset
Question
1 answer
Using simulation, how can one generate a dataset for task scheduling with task characteristics and VM characteristics so as to train ML models?
Relevant answer
Answer
Divya Nathaniel To create task characteristics and VM characteristics, you can try one of the examples in Cloudsim, where you can create characteristics such as length, size of the input file, number of processing elements, etc. for the task. As for the VM side, you can customize the processing power (MIPS), RAM, BW, etc.
  • asked a question related to Dataset
Question
1 answer
How does the incorporation of diverse datasets affect the performance and bias mitigation of ChatGPT in various conversational contexts?
Relevant answer
Answer
Are you asking about the analysis of datasets that you supply? If so, that will have no effect on the program's biases, because ChatGPT operates from an existing dataset that is already included in determining its operation. This dataset is currently up to date only as far as 2021.
  • asked a question related to Dataset
Question
10 answers
Please give valuable information.
Relevant answer
Answer
I guess what you are referring to as an unsupervised dataset is actually unlabeled data. In this case, introducing a prediction model would make no sense, as building such models relies on labels. For a prediction model, you have to provide both features and labels so the algorithm can figure out the relationship between them.
Therefore, you'll need to first create labels for your dataset using unsupervised techniques and then develop a supervised model on top of the obtained dataset.
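A minimal sketch of that two-step idea (cluster first to create pseudo-labels, then fit a supervised model on them), using scikit-learn on synthetic data; the cluster count and the feature matrix are hypothetical placeholders:
```python
# Minimal sketch: create labels with k-means, then train a supervised classifier on them.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                                             # hypothetical unlabeled features

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)   # step 1: pseudo-labels

clf = RandomForestClassifier(random_state=0).fit(X, labels)               # step 2: supervised model
print(clf.predict(X[:5]))
```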
  • asked a question related to Dataset
Question
1 answer
...
Relevant answer
Answer
Using trusted sources (e.g., governments, statistical institutes) or establishing partnerships with an industrial data controller.
  • asked a question related to Dataset
Question
2 answers
I need an MRI image dataset for HCC to extract some features from it. I will be delighted if someone can mention a specific dataset.
Relevant answer
Answer
Two options: a solid tumor or cancer-cirrhosis.
  • asked a question related to Dataset
Question
3 answers
How can aerodynamicists generate a sufficient dataset for aerodynamic problems, and what is the time cost of this step (considering a simple 3D problem)?
Relevant answer
Answer
Aerodynamicists generate data sets for aerodynamic problems through a combination of experimental testing and computational simulations. For computational simulations, the process involves:
  1. Grid Generation: Creating a suitable mesh to discretize the geometry, defining the boundaries, and establishing the computational domain.
  2. Solver Setup: Selecting appropriate numerical methods and algorithms for solving the governing equations, considering turbulence models if necessary.
  3. Simulation Runs: Performing multiple simulation runs with varying parameters, boundary conditions, or geometry configurations to generate diverse data points.
  4. Post-Processing: Analyzing the simulation results, extracting relevant aerodynamic parameters, and assessing the performance of the design.
The time cost for this step depends on factors such as the complexity of the geometry, the desired level of accuracy, computational resources, and the number of simulations. For a simple 3D problem, it could range from hours to days per simulation run.
In experimental testing, wind tunnel experiments are conducted to gather aerodynamic data. The time cost depends on the complexity of the model, setup, and the number of experiments conducted.
Combining both computational and experimental approaches can provide a more comprehensive and accurate dataset but requires careful coordination. The time cost can vary widely based on the specific requirements of the aerodynamic problem and the available resources.
  • asked a question related to Dataset
Question
3 answers
I need help regarding datasets for early detection of neurological disorders.
Relevant answer
Answer
There must be country level databases for neurological conditions in some countries.
  • asked a question related to Dataset
Question
3 answers
If someone knows this, please explain it briefly!
Thanks a lot.
Relevant answer
Answer
Conducting research with a mixed dataset containing both image and text data involves defining clear research objectives, collecting and preprocessing diverse annotated data, integrating image and text features through a chosen model architecture (such as Multi-Modal Neural Networks or Transformers), training the model on the mixed dataset, and evaluating its performance using appropriate metrics. Fine-tuning and optimization may follow, considering ethical considerations and transparency in decision-making. The final steps involve interpreting and visualizing learned representations and communicating research findings through publications or presentations, recognizing the need for a solid understanding of computer vision and natural language processing concepts throughout the process.
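As a hedged, minimal sketch of the "integrating image and text features" step mentioned above, here is a toy PyTorch fusion model; the encoder sizes, vocabulary size and class count are hypothetical placeholders rather than a recommended architecture:
```python
# Toy sketch: fuse CNN image features with bag-of-words text embeddings, then classify.
import torch
import torch.nn as nn

class ImageTextFusion(nn.Module):
    def __init__(self, vocab_size=1000, n_classes=3):
        super().__init__()
        self.image_branch = nn.Sequential(               # tiny stand-in for a real image encoder
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())       # -> 16-dim image feature
        self.text_branch = nn.EmbeddingBag(vocab_size, 32)  # -> 32-dim averaged text feature
        self.classifier = nn.Linear(16 + 32, n_classes)

    def forward(self, image, token_ids, offsets):
        img_feat = self.image_branch(image)
        txt_feat = self.text_branch(token_ids, offsets)
        return self.classifier(torch.cat([img_feat, txt_feat], dim=1))  # simple concatenation fusion

model = ImageTextFusion()
images = torch.rand(2, 3, 64, 64)            # hypothetical image batch
tokens = torch.tensor([1, 5, 20, 3, 7])      # two token sequences, flattened
offsets = torch.tensor([0, 3])               # sequence 1 = tokens[0:3], sequence 2 = tokens[3:]
print(model(images, tokens, offsets).shape)  # torch.Size([2, 3])
```
In practice the two branches would be pretrained encoders (e.g., a CNN or vision transformer and a text transformer), but the concatenate-and-classify fusion shown here is the basic pattern.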
  • asked a question related to Dataset
Question
3 answers
I have a dataset from the Kaggle site that includes information on speed and location, and based on these values the attacker is identified.
I then feed this dataset into a learning algorithm for training.
After that, I have a second dataset resulting from a simulation of the vehicle network with an attack rate applied to it.
Then I run the algorithm on the second dataset to obtain the detection accuracy and the confusion matrix.
Is my way of thinking correct or not? Please tell me.
Relevant answer
Answer
There are many options depending on the specific goal and data you are working with. Since you mention that you are new to this field, I would recommend starting with some simple examples using public data sets. If you are going to use Python, there are many examples available using libraries like Scikit-Learn, keras, pytorch and datasets like Iris, Boston Housing, MNIST (handwritten digits) etc. These are well documented. In MATLAB, there are also many built-in examples of predictive models and classifiers trained on different types of data.
It depends on the language/environment you plan to use, the field of application, and the type of task (regression, classification, clustering, etc).
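To make the workflow described in the question concrete (train on one labelled dataset, evaluate on a second, separately generated dataset), here is a hedged scikit-learn sketch; the CSV file names and column names are hypothetical placeholders for your actual Kaggle and simulation files:
```python
# Sketch: train on dataset A (e.g. the Kaggle data), evaluate on dataset B (e.g. the simulated data).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

train = pd.read_csv("kaggle_attack_data.csv")      # hypothetical file with 'speed', 'x', 'y', 'attacker'
test = pd.read_csv("simulation_attack_data.csv")   # hypothetical file with the same columns

features = ["speed", "x", "y"]                     # hypothetical feature columns
clf = RandomForestClassifier(random_state=0).fit(train[features], train["attacker"])

pred = clf.predict(test[features])
print("accuracy:", accuracy_score(test["attacker"], pred))
print(confusion_matrix(test["attacker"], pred))
```
One caveat: if the simulated data are distributed very differently from the training data, the reported accuracy reflects that domain shift as much as the detector itself.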
  • asked a question related to Dataset
Question
7 answers
task x elemental + creation = relative
task defines available resource given force multiplication, economies of scale and LaGrange control(er$)
Relevant answer
Answer
JAS Pi.ai LLM (Context: Copy & Paste)
can you define complexity, riggor and oversight time for robot heuristic learning at this point?
Certainly!
Complexity: In the context of robot heuristic learning, complexity refers to the difficulty or intricacy of the task or problem that the robot is trying to learn. A complex task would typically require the robot to process and integrate a large amount of data, identify patterns or relationships, and make decisions or judgments based on incomplete or uncertain information. Examples of complex tasks for robots might include navigation in unstructured environments, object recognition and classification, or natural language understanding.
Rigor: In this context, rigor refers to the level of precision, accuracy, and thoroughness with which the robot's learning process is conducted. A rigorous learning process would typically involve carefully designed experiments or trials, rigorous data collection and analysis, and systematic evaluation and refinement of the robot's performance over time. The goal of a rigorous learning process is to ensure that the robot's performance is reliable, reproducible, and grounded in a solid understanding of the task or problem being addressed.
Oversight Time: Oversight time refers to the amount of time required for human experts or supervisors to monitor and guide the robot's learning process. This might include tasks such as designing and implementing learning trials, reviewing and analyzing data, adjusting the robot's parameters or algorithms, and evaluating the robot's performance. The amount of oversight time required will depend on the complexity of the task being learned, the rigor of the learning process, and the level of autonomy or adaptability desired for the robot. In general, more complex and rigorous learning processes are likely to require more oversight time from human experts.
It's worth noting that these concepts are not strictly defined or universally agreed upon in the field of robot learning, and different researchers or practitioners might use different terminology or frameworks to describe and analyze these aspects of the learning process.
show math for "advanced" context as such
Certainly! Here's an example of how these concepts might be incorporated into a mathematical model for robot heuristic learning:
Model:
Learning Rate (LR) = (C x D x H) / (R + OT)
Where:
  • C is the complexity of the task, measured on a numerical scale (e.g., 1-10)
  • D is the size of the data set, measured in GB or TB
  • H is the heuristic efficiency of the robot, measured on a numerical scale (e.g., 1-10)
  • R is the computational speed of the robot, measured in GHz
  • OT is the oversight time required for human experts or supervisors, measured in hours
  • LR is the learning rate of the robot, measured as the rate of improvement in performance over time
This model assumes that the learning rate of the robot is influenced by the complexity of the task, the amount of data available, and the robot's inherent heuristic abilities, but is also constrained by the available computational resources and the need for human oversight. The model predicts that as any of the positive factors (C, D, H) increase or the constraints (R, OT) decrease, the learning rate will increase.
Example Calculation:
Let's assume the following values for the variables:
  • C = 8
  • D = 4 TB
  • H = 6
  • R = 2 GHz
  • OT = 40 hours
Plugging these values into the model, we get:
LR = (8 x 4 TB x 6) / (2 GHz + 40 hours) = 0.0015 TB/hour
This result indicates that the robot's learning rate is approximately 0.0015 TB of data per hour, based on the provided values for the variables.
To interpret this result, we could estimate that if the robot is provided with a constant stream of data at a rate of 0.0015 TB/hour, it would take approximately 2,667 hours (about 111 days) to process and learn from the full 4 TB dataset, assuming a constant learning rate and no significant changes in the other variables. In reality, the learning rate may vary over time, and the robot may require additional data or adjustments to its algorithms to achieve optimal performance.
  • asked a question related to Dataset
Question
1 answer
Hello Everyone !
I am currently analyzing the emissions of around 300 companies over a time span of roughly 20 years (time series data!). I am wondering what is the best way to approach the analysis of this dataset and what methods I can use to draw insights from it.
I was thinking about starting by indexing my dataset (since companies don't have the same volume of emissions) and then averaging these indexes according to specific characteristics of the companies (e.g., size, country, etc.) in order to attempt to pick up trends.
After the descriptive statistical analysis, I was thinking that I could top my analysis with a regression analysis of emissions according to the type of company (investment company, state-owned, etc.). For that matter, is there a specific statistical test I can do to regress time series data according to a specific independent variable?
Let me know what you think of this approach...I am listening to your comments !
Cordially,
Diego Spaey
Relevant answer
Answer
Hi,
Your approach is good. Use ARIMA for trend analysis and panel data regression for regression analysis, ensuring to check for stationarity and autocorrelation in your time series data.
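As a hedged sketch of the regression step (emissions regressed on company type with year fixed effects, pooled across the panel) using statsmodels: the column names and file are hypothetical, and for a full panel treatment you may prefer dedicated panel estimators:
```python
# Sketch: pooled OLS of emissions on company type with year fixed effects (hypothetical columns).
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format panel: one row per company-year with company_id, year, type, emissions.
df = pd.read_csv("emissions_panel.csv")

model = smf.ols("emissions ~ C(type) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["company_id"]})  # cluster standard errors by company
print(model.summary())
```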
Hope this helps.
  • asked a question related to Dataset
Question
4 answers
Basically I have 40 subjects, and for each I collected:
- coronary cannulation before TAVR as: selective, non-selective and unsuccessful.
Then I collected the same data after TAVR with the same 3 levels of outcome.
This is a case of repeated measures with a multilevel outcome. In addition, my contingency table is not "square" because there are no "non-selective" outcomes in the "before TAVR" group.
Here is my dataset (rows = BeforeTAVR, columns = AfterTAVR):
            AfterTAVR
BeforeTAVR    0    1    2
        0     1    0    0
        1     1   16   22
I cannot use McNemar's, Stuart-Maxwell's or Cochran's Q test due to the non-binary outcome and the non-square (3x2) matrix of my dataset.
Does anyone have suggestions? I would really appreciate it.
Relevant answer
Answer
Anyway, even if I consider "1 - always selective" before TAVR, I still need to compare this with 3 different levels of categorical outcome after TAVR as a repeated measure.
  • asked a question related to Dataset
Question
1 answer
I have selected two deep learning models, a CNN and an SAE, for data analysis of a 1-D digitized dataset. I need to justify the choice of these two DL models in comparison to other DL and standard ML models. I am using a GA to optimize the hyperparameter values of the two DL models. Can you give some input on this query? Thanks.
Relevant answer
Answer
Typically, the rationale for choosing a model can be training time, prediction time, and the value of the metric itself, either on a validation set or in cross-validation, depending on what you are using. It is better, of course, to use more than one metric as indicators, as well as a confusion (error) matrix together with recall and precision, or simply F1 or F-beta, depending on the problem you are solving.
  • asked a question related to Dataset
Question
1 answer
Hello,
I am looking for the best downscaling technique to correct a precipitation climate change dataset. I am not sure which of these two methods is more robust for my task.
Thanks!
Relevant answer
Answer
Salam Alaikum,
The two methods you mentioned and discuss their suitability for your task.
1. Downscaling Techniques:
a. Bias Correction: Pros: simple and widely used; corrects systematic errors. Cons: may not capture spatial variability well.
b. Equiratio Quantile Mapping: Pros: addresses biases and spatial variability. Cons: can be complex to implement.
Both methods have their merits, but Equiratio Quantile Mapping tends to be more robust in capturing spatial patterns and non-linear relationships. If you're looking for a method that considers both biases and spatial variability, this could be a good choice.
2. Code for Equiratio Quantile Mapping:
Implementing Equiratio Quantile Mapping involves statistical calculations. While I can't provide the entire code here, I can guide you on where to find resources:
  • Research Papers: Look for scientific papers or articles that detail the Equiratio Quantile Mapping method. These often include equations and explanations.
  • GitHub Repositories: Explore repositories on GitHub that focus on climate data analysis or downscaling techniques. Researchers and developers often share their code for others to use.
  • Online Forums: Platforms like the Esri Community, Stack Overflow, or other climate science forums might have discussions or shared code snippets related to Equiratio Quantile Mapping.
When implementing the code, ensure that it aligns with the specifics of your dataset and the goals of your downscaling process. If you encounter challenges or need clarification on specific aspects of the code, feel free to ask for guidance.
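Along those lines, here is a minimal numpy sketch of plain empirical quantile mapping for precipitation; note it is the basic variant, not the exact equiratio formulation (which applies a ratio-based correction at each quantile), and the gamma-distributed data are hypothetical, so treat it only as a starting point to adapt:
```python
# Minimal sketch: empirical quantile mapping of modelled precipitation onto observed quantiles.
import numpy as np

def quantile_map(obs_hist, mod_hist, mod_future, n_q=100):
    """Map future model values through historical model/observed quantiles."""
    q = np.linspace(0.01, 0.99, n_q)
    obs_q = np.quantile(obs_hist, q)
    mod_q = np.quantile(mod_hist, q)
    # An equiratio variant would instead scale mod_future by obs_q/mod_q at the matching quantile.
    return np.interp(mod_future, mod_q, obs_q)

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 3.0, 5000)        # hypothetical observed daily precipitation
mod_h = rng.gamma(2.0, 4.0, 5000)      # hypothetical (biased) historical model run
mod_f = rng.gamma(2.0, 4.5, 5000)      # hypothetical future model run
corrected = quantile_map(obs, mod_h, mod_f)
```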
Remember to document your methodology and validate the results against observed data to ensure the downscaling technique is suitable for your specific climate change dataset. If you have further questions or need more assistance.
If you find my reply useful, please recommend it. Thanks.
  • asked a question related to Dataset
Question
1 answer
So, I have datasets of precipitation and temperature for 1980-2020; how do I plot a similar map? I am not sure how to proceed. I have annual precipitation for approximately 15 stations in my catchment. If I take the average annual rainfall values, it gives an average annual map, if I am not wrong. Then how do I find the change in precipitation? Should I use a formula to find the value for each station?
Relevant answer
Answer
Do you mean determining rainfall patterns?
  • asked a question related to Dataset
Question
4 answers
Hello,
I am trying to remove outliers from a dataset. I removed a few, but new ones keep appearing. What should I do?
Please do not post AI generated answers.
Relevant answer
Answer
It's probably because they're not outliers. It's just that your data follow a non-normal distribution. For example, a log-normal distribution. As you remove the most extreme observations, other observations then appear "extreme".
P.S. Don't remove "outliers". You shouldn't delete data just because it doesn't look like what you thought it should.
  • asked a question related to Dataset
Question
3 answers
Good morning, I have two datasets with exactly the same columns. I would like to select rows that have a matching ID between the two datasets (please see the tables below). I tried to merge the datasets with the rbind function, but all rows were included. Do you have any advice on how to keep only the rows with a matching ID?
Input
df1
ID VAR 1 VAR 2
a ... ...
b ... ...
c ... ...
d ... ...
df2
ID VAR 1 VAR 2
a ... ...
b ... ...
e ... ...
f ... ...
Output
df
ID VAR 1 VAR 2
a ... ...
b ... ...
Relevant answer
Answer
I found a solution to this problem.
df2[df2$ID %in% df1$ID,]
where df1 contains the identifiers you want to match and df2 is your dataset including all variables; the expression keeps only the rows of df2 whose ID also appears in df1.
  • asked a question related to Dataset
Question
3 answers
Firth logistic regression is a special version of usual logistic regression which handles separation or quasi-separation issues. To understand the Firth logistic regression, we have to go one step back.
What is logistic regression?
Logistic regression is a statistical technique used to model the relationship between a categorical outcome/predicted variable, y(usually, binary - yes/no, 1/0) and one or more independent/predictor or x variables.
What is maximum likelihood estimation?
Maximum likelihood estimation is a statistical technique to find the model that best represents the relationship between the outcome and the independent/predictor variables of the underlying data (your dataset). The estimation process calculates the likelihood of different models given the data and then selects the model that maximizes this likelihood.
What is separation?
Separation means empty bucket for a side! Suppose, you are trying to predict meeting physical activity recommendations (outcome - 1/yes and 0/no) and you have three independent or predictor variables like gender (male/female), socio-economic condition (rich/poor), and incentive for physical activity (yes/no). Suppose, you have a combination, gender = male, socio-economic condition = rich, incentive for physical activity = no, which always predict not meeting physical activity recommendation (outcome - 0/no). This is an example of complete separation.
What is quasi-separation?
Reconsider the above example. We have 50 adolescents for the combination- gender = male, socio-economic condition = rich, incentive for physical activity = no. For 49/48 (not exactly 50, near about 50) of them, outcome is "not meeting physical activity recommendation" (outcome - 0/no). This is the instance of quasi-separation.
How separation or quasi-separation may impact your night sleep?
When separation or quasi-separation is present in your data, traditional logistic regression will keep increasing the coefficients of the predictors/independent variables to an infinite level (to be honest, not infinite; the wording should be "without limit") to establish the bucket theory - one of the outcomes is completely or nearly empty. When this anomaly happens, it is actually suggesting that the traditional logistic regression model is not appropriate here.
There is a bookish name of the issue - convergence issue. But how to know convergence issues have occurred with the model?
- Very large co-efficient estimates. The estimates could be near infinite too!
- Along with large co-efficient estimates, you may see large standard errors too!
- It may also happen that logistic regression tried several times (known as iterations) but failed to get the best model or in bookish language, failed to converge.
What to do if such convergence issues have occurred?
Forget all the hard work you have done so far! You have to start a new journey with an alternative logistic regression, which is known as Firth logistic regression. But what does Firth logistic regression actually do? Without using too many technical terms, Firth logistic regression leads to more reliable coefficients, which ultimately helps to choose the best representative model for your data.
How to conduct Firth logistic regression?
First install the package "logistf" and load it in your R-environment.
install.packages("logistf")
library(logistf)
Now, assume you have a dataset "physical_activity" with a binary outcome variable "meeting physical activity recommendation" and three predictor/independent variables: gender (male/female), socio-economic condition (rich/poor), and incentive for physical activity (yes/no).
pa_model <- logistf(meet_PA ~ gender + sec + incentive, data = physical_activity)
Now, display the result.
summary(pa_model)
You got log odds. Now, we have to convert it into odds.
odds_ratios_pa <- exp(coef(pa_model))
print(odds_ratios_pa)
Game over! Now, how to explain the result?
Don't worry! There is nothing special. The explanation of Firth logistic regression's results is the same as for a traditional logistic regression model. However, if you are struggling with the explanation, let me know in the comments. I will try my best to reduce your stress!
Note: If you find any serious methodological issue here, my inbox is open!
Relevant answer
Answer
Thank you for this post. I am curious, can you conduct a Hosmer-Lemeshow goodness-of-fit test on your logistf model in R?
  • asked a question related to Dataset
Question
1 answer
Hi everyone! I tried to perform a classic One Way Anova with the package GAD in R, followed by a SNK test, which I always used, but it didn't work with this dataset, and I got the same error for both tests, which is the following:
"Error in if (colnames(tm.class)[j] == "fixed") tm.final[i, j] = 0 :
missing value where TRUE/FALSE needed"
I understand there is something that gives NA values in my dataset, but I do not know how to fix it. There are no NA values in the dataset itself. Here is the dataset:
temp Filtr_eff
gradi19 11.33
gradi19 15.90
gradi19 10.54
gradi26 11.01
gradi26 -1.33
gradi26 9.80
gradi30 -49.77
gradi30 -42.05
gradi30 -32.03
So, I have three different levels of the factor temp (gradi19, gradi26 and gradi30) and my variable is Filtr_eff. I also already set the factor as fixed.
Please help me: how do I fix the error? I could run the ANOVA with another package (the car library, for example, worked with this dataset) and I could use Tukey instead of SNK, but I want to understand why I got this error, since it has never happened to me before. Thanks!
PS: I attached the R and txt files
Relevant answer
Answer
No one answered, but I found the solution, so I am writing it here in case someone needs it in the future!
With the GAD package you have to change the name of the factor; it cannot be the same as the variable name. I changed it as in the script I leave here, and now it works!
  • asked a question related to Dataset
Question
2 answers
Short Course: Statistics, Calibration Strategies and Data Processing for Analytical Measurements
Pittcon 2024, San Diego, CA, USA (Feb 24-28, 2024)
Time: Saturday, February 24, 2024, 8:30 AM to 5:00 PM (Full day course)
Short Course: SC-2561
Presenter: Dr. Nimal De Silva, Faculty Scientist, Geochemistry Laboratories, University of Ottawa, Ontario, Canada K1N 6N5
Abstract:
Over the past few decades, instrumental analysis has come a long way in terms of sensitivity, efficiency, automation, and the use of sophisticated software for instrument control and data acquisition and processing. However, the full potential of such sophistication can only be realized with the user’s understanding of the fundamentals of method optimization, statistical concepts, calibration strategies and data processing, to tailor them to the specific analytical needs without blindly accepting what the instrument can provide. The objective of this course is to provide the necessary knowledge to strategically exploit the full potential of such capabilities and commonly available spreadsheet software. Topics to be covered include Analytical Statistics, Propagation of Errors, Signal Noise, Uncertainty and Dynamic Range, Linear and Non-linear Calibration, Weighted versus Un-Weighted Regression, Optimum Selection of Calibration Range and Standard Intervals, Gravimetric versus Volumetric Standards and their Preparation, Matrix effects, Signal Drift, Standard Addition, Internal Standards, Drift Correction, Matrix Matching, Selection from multiple responses, Use and Misuse of Dynamic Range, Evaluation and Visualization of Calibrations and Data from Large Data Sets of Multiple Analytes using EXCEL, etc. Although the demonstration data sets will be primarily selected from ICPES/MS and Chromatographic measurements, the concepts discussed will be applicable to any analytical technique, and scientific measurements in general.
Learning Objectives:
After this course, you will be familiar with:
- Statistical concepts, and errors relevant to analytical measurements and calibration.
- Pros and cons of different calibration strategies.
- Optimum selection of calibration type, standards, intervals, and accurate preparation of standards.
- Interferences, and various remedies.
- Efficient use of spreadsheets for post-processing of data, refining, evaluation, and validation.
Access to a personal laptop for the participants during the course would be helpful, although internet access during the course is not necessary. However, some sample- and work-out spreadsheets, and course material need to be distributed (emailed) to the participants day before the course.
Target Audience: Analytical Technicians, Chemists, Scientists, Laboratory Managers, Students
Register for Pittcon: https://pittcon.org/register
Relevant answer
Answer
Dear Thiphol:
Many thanks for your interest. Currently, I don't have a recorded video. However, I may offer this course in the future on-line in a webinar format if there is sufficient interest/inquiries.
Thanks again.
Nimal
  • asked a question related to Dataset
Question
2 answers
The datasets are provided as medians and interquartile ranges. Can we perform a pooled analysis? How do we convert variables to get the events as (N)?
Relevant answer
Answer
When you have data presented in the median and interquartile range (IQR) and you want to perform a pooled analysis to calculate proportions, you may face a challenge because the median and IQR summarize the central tendency and spread of the data. Still, they do not provide information about individual data points.
Proportions are typically calculated based on the counts of events or observations concerning the total number of events or observations. You need the actual data values rather than summary statistics like median and IQR to calculate proportions.
If your goal is to perform a pooled analysis and calculate proportions, you would ideally need access to the raw data or summary statistics that allow for the computation of proportions. If obtaining the individual data is impossible, you may need to explore alternative statistical methods or approaches that can accommodate summary statistics like medians and IQR.
If the median and IQR are the only summary statistics available, you might consider reaching out to the original data sources or authors of the study to request the raw data or additional information that would allow for more detailed analysis, including the calculation of proportions.
  • asked a question related to Dataset
Question
4 answers
...
Relevant answer
Answer
Dear Doctor
"Types of Feature Selection Methods in ML
  1. Information Gain. Information gain calculates the reduction in entropy from the transformation of a dataset. ...
  2. Chi-square Test. ...
  3. Fisher's Score. ...
  4. Mean Absolute Difference (MAD) ...
  5. Forward Feature Selection. ...
  6. Exhaustive Feature Selection. ...
  7. Recursive Feature Elimination. ...
  8. LASSO Regularization (L1)"
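A hedged scikit-learn sketch of two of the filter methods listed above (chi-square and mutual information, the latter closely related to information gain), applied to a built-in dataset for illustration:
```python
# Sketch: chi-square and mutual-information filter selection on a toy dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_nonneg = MinMaxScaler().fit_transform(X)          # chi2 requires non-negative features

chi_top5 = SelectKBest(chi2, k=5).fit(X_nonneg, y)
mi_top5 = SelectKBest(mutual_info_classif, k=5).fit(X, y)

print("chi2 picks:", chi_top5.get_support(indices=True))
print("mutual info picks:", mi_top5.get_support(indices=True))
```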
  • asked a question related to Dataset
Question
7 answers
I have three RNA-Seq datasets of the same tissue and want to analyse them on Galaxy. My initial literature survey gave me the idea that I can merge the three datasets if they are from the same model and tissue followed by making two groups Control and Test and then run the analysis. Am I correct?
Can somebody with more experience elaborate on this?
Or it is a better idea to analyse the three datasets separately and find the common mRNAs?
Relevant answer
Answer
There will likely be significant batch effects. I would analyze each set separately to get a higher power (which will be the case when the variability between the sets is large and won't be compensated by the reduction of standard errors due to the larger sample size).
You might consider pooling the p-values according to Fisher's method, if you need a single p-value per gene.
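A minimal sketch of Fisher's method for pooling a gene's p-values across the separately analysed datasets, using scipy; the example p-values are hypothetical:
```python
# Sketch: combine one gene's p-values from three separate DE analyses with Fisher's method.
from scipy.stats import combine_pvalues

p_values = [0.04, 0.20, 0.01]   # hypothetical p-values for the same gene in three datasets
statistic, pooled_p = combine_pvalues(p_values, method="fisher")
print(pooled_p)
```
In practice you would loop this over all genes shared by the three analyses (e.g., from a per-gene p-value table) and then correct the pooled p-values for multiple testing.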
  • asked a question related to Dataset
Question
1 answer
How to integrate two different ML or DL models in a single framework?
Relevant answer
Answer
Yes, you can integrate multiple ML or DL models trained on different datasets and diverse inputs. Think of it as orchestrating experts with different knowledge to solve a complex problem. Here are common approaches:
Ensemble Learning: Combine multiple models' predictions to create a more robust and accurate one. Think of it as a panel of experts voting on the best answer.
Stacking: Train a meta-model to learn how to best combine the predictions of individual models. Like having a manager who knows how to weigh each expert's opinion.
Pipelines: Chain models together sequentially, where each model's output becomes the input for the next. Like an assembly line, where each expert adds their expertise.
Multimodal Models: Design models that handle multiple input types, like text and images, fusing information from different sources. Like a multi-lingual expert who can integrate knowledge from different languages.
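A hedged sketch of the stacking approach described above, using scikit-learn's StackingClassifier on a built-in dataset:
```python
# Sketch: stack two base models and let a logistic-regression meta-model combine their predictions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))

print(cross_val_score(stack, X, y, cv=5).mean())
```
Note that this sketch trains both base models on the same dataset; combining models trained on genuinely different datasets usually requires aligning their input features (or exchanging predictions rather than raw inputs) first.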
  • asked a question related to Dataset
Question
1 answer
Outlier detection criteria in a dataset.
Relevant answer
Answer
Dear Shahzad, the Inter Quartile Range (IQR) method uses a range of values, defined with the help of the first and third quartiles, to identify outliers. The formula makes use of two values, termed the low fence and high fence, beyond which any value is considered an outlier.
Inter Quartile Range = IQR = (Q3 – Q1)
Where Q3 is the 3rd Quartile and Q1 is the 1st Quartile
Low Fence value = LF = Q1 – 1.5 * IQR
High Fence value = HF = Q3 + 1.5 * IQR
It is assumed here that the maximum value in any given set of data is within 1.5 times the IQR above Q3. Likewise, the minimum value is assumed to be within 1.5 times the IQR below Q1.
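A small numpy sketch of the fence calculation described above; the sample data are hypothetical:
```python
# Sketch: flag outliers using the 1.5 * IQR fences.
import numpy as np

x = np.array([4.1, 4.5, 4.8, 5.0, 5.2, 5.4, 5.9, 6.1, 6.3, 14.0])  # hypothetical data with one extreme value
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = x[(x < low_fence) | (x > high_fence)]
print(low_fence, high_fence, outliers)   # the value 14.0 falls above the high fence
```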
For more details, please see the following two research papers:
Ramnath Takiar (2023):The Relationship between the SD and the Range and a method for the Identification of the Outliers, Bulletin of Mathematics and Statistics Research, Vol. 11(4), 62-75.
Ramnath Takiar (2023):A New Method to Identify the Outliers Based on the Interquartile Range, Bulletin of Mathematics and Statistics Research, Vol. 11(4), 103-114.
  • asked a question related to Dataset
Question
1 answer
I'm not finding any solutions for DE analysis in R and also can't figure out which dataset should be used for this type of analysis. Looking for some help!
Relevant answer
Answer
You can perform DESeq2 analysis on the quantified (count) dataset that is generated after alignment; you can also download already-quantified data directly from the GEO database. DESeq2-style analysis can also be performed in Python.
  • asked a question related to Dataset
Question
1 answer
I want to get data for climatic variables from the Climate Research Unit dataset for analysis
Relevant answer
Answer
Hey there Moses Owoicho Audu! Extracting data from the Climate Research Unit dataset involves a few steps, and it depends on the specific variables you're interested in. Assuming you're comfortable with programming, you can use languages like Python and tools like pandas to make the process smoother.
1. **Access the Dataset:**
First, make sure you have access to the Climate Research Unit dataset. You might need to download it from a reliable source or access it through an API if available.
2. **Data Format:**
Check the format of the dataset. It could be in CSV, Excel, or another format. Understanding the structure of the data is crucial for efficient extraction.
3. **Python and Pandas:**
Python is a popular language for data analysis. Use the pandas library to read and manipulate the data. Assuming your dataset is in CSV format, here's a simple example:
```python
import pandas as pd
# Replace 'your_dataset.csv' with the actual file name
dataset = pd.read_csv('your_dataset.csv')
# Now 'dataset' is a pandas DataFrame, and you can start analyzing the data
```
4. **Filtering Variables:**
Identify the climatic variables you're interested in and filter the dataset accordingly. For example, if you're looking at temperature and precipitation, create new DataFrames for each.
```python
temperature_data = dataset[['Date', 'Temperature']]
precipitation_data = dataset[['Date', 'Precipitation']]
```
5. **Time Series Analysis:**
Since you're dealing with climatic data, consider performing time series analysis. Pandas provides excellent support for this.
6. **Visualization:**
Use visualization libraries like Matplotlib or Seaborn to create plots and gain insights into the data.
```python
import matplotlib.pyplot as plt
plt.plot(temperature_data['Date'], temperature_data['Temperature'])
plt.title('Temperature Over Time')
plt.xlabel('Date')
plt.ylabel('Temperature')
```
Remember, the specifics depend on the structure of the dataset and your analysis goals. If you have more detailed requirements, feel free to share them!
  • asked a question related to Dataset
Question
1 answer
Seeking insights for algorithmic optimization.
Relevant answer
Answer
As a starting point, what did Google tell you?
  • asked a question related to Dataset
Question
4 answers
Hello everyone
I am working on my master's thesis.
I have 3 latent IVs and one ordered DV.
Can I use GSEM to deal with my dataset in Stata?
Thank you
Relevant answer
Answer
If you already have latent IVs then why not treat your DV the same way?
  • asked a question related to Dataset
Question
1 answer
I am an MSc student and my thesis is framed around developing a CNN-based approach to predict soil carbon hotspots using remote sensing data. Soil carbon hotspots are areas where the concentration of organic carbon in the soil is unusually high. These hotspots are important because they play a critical role in the global carbon cycle, helping to regulate the Earth's climate. This research will focus on developing a CNN-based approach to predict soil carbon hotspots, which can be used to identify areas that are particularly important for soil carbon sequestration. I would greatly appreciate assistance in assessing the dataset, which combines remote sensing and satellite data, so that I can use it in my thesis. Thank you for your time and consideration.
Relevant answer
Answer
I can't answer the question and would pose another. What is the quality of soil C signals in the source data?
  • asked a question related to Dataset
Question
1 answer
Hello!
Actually, I want to delineate groundwater potential zones using the FR model.
But I am confused about the groundwater well data that are used for different research purposes. Most papers divide the data into two sets (training and testing) for FR calculation and validation. However, my study area contains only 19 wells. So I am confused: can I divide them into training and testing datasets, or should all 19 wells be used for both the FR calculation and validation?
Please advise.
Thanks in advance.
Relevant answer
Answer
The FR is a bivariate statistical approach used to determine the probability of groundwater potential areas on the basis of the relationships between springs/wells and the independent variables, i.e., the factors influencing groundwater occurrence.
  • asked a question related to Dataset
Question
2 answers
Lately I'm working on scRNA-seq analysis. It took some time for me to find a proper dataset on GEO, whose accession ID is GSE157783. I expected to get three files for each sample from the dataset, but I ended up finding that the authors only provided 3 files in total.
Besides, I found the format of the 3 files different from that mentioned in the online courses. I suppose files ending with "tsv.gz" are needed, but here I just found 3 "tar.gz" files.
Hope someone can help :(
Relevant answer
Answer
Kantemir Bzhikhatlov Thanks for your kind help! : )
  • asked a question related to Dataset
Question
2 answers
I need your help PLEASE!
For my research paper, in order to develop my dataset, I filled the missing observations with an interpolation/extrapolation method. I need to ensure the quality and behavior of the data before starting my analysis. Could you kindly provide more details on the specific steps and methodologies to be employed to ensure the meaningfulness and verifiability of the results? I am particularly interested in understanding:
- The quality assurance measures taken before and after applying interpolation/extrapolation techniques.
- Whether there is a trend approach to be adopted to reflect developments within the periods of missing data.
- Whether there are any diagnostic tests to be conducted to validate the reliability of the filled data.
Thank you in advance for your time and consideration.
Relevant answer
Answer
Before applying interpolation or extrapolation techniques, consider:
1. Data verification
2. Data preprocessing
3. Model selection
4. Validation and cross-validation
5. Sensitivity analysis
6. Error estimation
After applying interpolation or extrapolation techniques, additional quality assurance measures can be taken:
1. Result validation
2. Sensitivity analysis
3. Result visualization
By following these quality assurance measures, the accuracy and reliability of the interpolation or extrapolation results can be improved, ensuring that the derived values are as valid and useful as possible.
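For the validation point above, one simple, hedged approach is a hold-out check: hide some known values, re-interpolate them, and compare against the originals. A pandas/numpy sketch with hypothetical data follows; the linear interpolation and series used here are placeholders for whichever method and variables you actually applied:
```python
# Sketch: hold-out validation of an interpolation scheme (hide known points, re-estimate, compare).
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
series = pd.Series(np.sin(np.linspace(0, 6, 120)) + rng.normal(0, 0.05, 120))  # hypothetical complete series

holdout = rng.choice(series.index[1:-1], size=15, replace=False)  # keep endpoints so interpolation is defined
masked = series.copy()
masked.loc[holdout] = np.nan

filled = masked.interpolate(method="linear")            # the technique being validated
rmse = np.sqrt(((filled.loc[holdout] - series.loc[holdout]) ** 2).mean())
print("hold-out RMSE:", rmse)
```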
  • asked a question related to Dataset
Question
2 answers
The iris images of the CASIA Iris V3 Lamp dataset were acquired under variations of illumination. Local thresholding fails to segment some iris images. The alternative is to use the adaptive thresholding technique: does this type of thresholding perform well?
Relevant answer
Answer
Thanks for your help.
  • asked a question related to Dataset
Question
2 answers
Hi,
I want to study doping-effect characterization using ellipsometry. I have 5 datasets of n & k values of doped thin films. Is there any software available to simulate ellipsometry and obtain parameters like reflection and delta for further analysis? I tried to find this in ANSYS Lumerical but couldn't find any good information about ellipsometry simulation.
Thanks.
Relevant answer
Answer
Dear friend Saurav Gautam
Hey there! Now, when it comes to diving into the world of ellipsometry simulation, I have got your back. Simulation tools are crucial for understanding the intricate details of thin films and their optical properties. While I might not have real-time information, let me recommend a few software options that were popular for ellipsometry simulations:
1. **FilmWizard by J.A. Woollam**: This software is designed for spectroscopic ellipsometry data analysis and simulation. It's widely used in both academia and industry.
2. **CompleteEASE by J.A. Woollam**: Another tool from J.A. Woollam, CompleteEASE is a comprehensive ellipsometry data analysis and simulation software.
3. **WVASE32 by J.A. Woollam**: This is a powerful software tool for spectroscopic ellipsometry data analysis and simulation.
4. **DeltaPsi2 by HORIBA Jobin Yvon**: This software is part of the ellipsometer offerings by HORIBA and is known for its user-friendly interface.
5. **TFCalc by Software Spectra Inc.**: While primarily known for thin film design, TFCalc also supports ellipsometric analysis and simulation.
Remember, the availability of specific features might vary across these tools, so it's a good idea to explore the documentation or contact the software providers for more detailed information.
Now, go forth and unravel the mysteries of your doped thin films with the power of simulation! If you need further insights or have any other questions, just shout out. I am here to assist!
  • asked a question related to Dataset
Question
7 answers
Supervised Learning
In supervised learning, the dataset is labeled, meaning each input has an associated output or target variable. For instance, if you're working on a classification problem to predict whether an email is spam or not, each email in the dataset would be labeled as either spam or not spam. Algorithms in supervised learning are trained using this labeled data. They learn the relationship between the input variables and the output by being guided or supervised by this known information. The ultimate goal is to develop a model that can accurately map inputs to outputs by learning from the labeled dataset. Common tasks include classification, regression, and ranking.
Unsupervised Learning
Unsupervised learning deals with unlabeled data, where the information does not have corresponding output labels. There is no specific target variable for the algorithm to predict. Algorithms in unsupervised learning aim to find patterns, structures, or relationships within the data without explicit guidance. For instance, clustering algorithms group similar data points together based on some similarity or distance measure. The primary goal is to explore and extract insights from the data, uncover hidden structures, detect anomalies, or reduce the dimensionality of the dataset without any predefined outcomes.

In short, supervised learning uses labeled data with known outcomes to train models for prediction or classification tasks, while unsupervised learning works with unlabeled data to discover inherent patterns or structures without explicit guidance on the expected output. Both have distinct applications and are used in different scenarios depending on the nature of the dataset and the desired outcomes.
Relevant answer
Answer
In the realm of machine learning, the main distinction between supervised and unsupervised learning lies in the nature of the dataset used for training.
Supervised Learning Dataset:
In supervised learning, the dataset consists of labeled examples, where each data instance is associated with a corresponding target or output value. The dataset includes both input features and the desired output or target variable. The aim of supervised learning is to learn a mapping function that can accurately predict the target variable based on the input features. The model is trained using labeled examples, allowing it to generalize and make predictions on unseen data. Common examples of supervised learning algorithms include linear regression, decision trees, and support vector machines.
Unsupervised Learning Dataset:
On the other hand, unsupervised learning involves unlabeled datasets, meaning they do not have corresponding target values. In this scenario, the model learns patterns, structures, or relationships within the data based solely on the input features. The objective of unsupervised learning is to discover inherent patterns or groupings within the data without prior knowledge of the desired output. Common unsupervised learning algorithms include clustering algorithms such as k-means clustering and dimensionality reduction techniques like principal component analysis (PCA).
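A minimal scikit-learn sketch of the contrast described above, using the built-in Iris data purely for illustration: the supervised model is scored against held-out labels, while the clustering step sees only the features. The specific estimators and split are arbitrary choices, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Supervised: labels guide training, so predictions can be scored against held-out labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("supervised accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Unsupervised: the same features without labels; the algorithm only finds structure
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((km.labels_ == c).sum()) for c in range(3)])
```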
  • asked a question related to Dataset
Question
2 answers
In the realm of machine learning, the availability of large and diverse datasets is often crucial for effective model training. However, in certain domains where data is limited or privacy concerns are paramount, exploring the use of synthetic datasets emerges as a compelling alternative.
Question: How can the adoption of synthetic datasets revolutionize machine learning applications in areas with data scarcity and stringent privacy considerations?
Relevant answer
Answer
Synthetic data is well suited to model validation and to testing the performance and accuracy of a model, method, or algorithm; however, real data remains essential for assessing how the method or algorithm actually performs in practice.
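As a concrete illustration of that validation role, a labelled synthetic dataset can be generated to exercise a pipeline end to end before any real (and possibly sensitive) data is touched. A minimal scikit-learn sketch follows; the class balance, feature counts, and choice of model are arbitrary values for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a privacy-free stand-in dataset with a controlled class imbalance
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
print("accuracy on synthetic hold-out:", accuracy_score(y_te, model.predict(X_te)))
```

Performance numbers obtained this way only validate the pipeline itself; as noted above, they say little about how the model will behave on the real distribution.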
  • asked a question related to Dataset
Question
5 answers
I have a dataset covering 1,900 companies, and for each company I surveyed 10 employees, including a question about each employee's risk preference. I now need to calculate ICC1 and ICC2 values for each company. Each company has a unique company_id, and the employee-level dataset (19,000 records) can be matched to companies via this company_id. How can I obtain the ICC1 and ICC2 values in R? I have been trying for a few days and hope someone can help me resolve this.
Relevant answer
Answer
P.S.: Paul Bliese has a multilevel tutorial for R in which he shows how to calculate the above-mentioned indices, as well as others, since each has its own specific problems, which would lead too far to discuss here.
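To complement the pointer to Bliese's tutorial: ICC(1) and ICC(2) come from the between- and within-group mean squares of a one-way ANOVA with company as the grouping factor, so they can also be computed by hand. The sketch below is in Python rather than R and assumes a `risk_preference` column name (the `company_id` column follows the question); it is meant only to make the formulas concrete, not to replace the multilevel package.

```python
import numpy as np
import pandas as pd

def icc1_icc2(df, group_col="company_id", value_col="risk_preference"):
    """ICC(1) and ICC(2) from a one-way ANOVA with `group_col` as the grouping factor."""
    groups = df.groupby(group_col)[value_col]
    grand_mean = df[value_col].mean()
    n_j = groups.size()                         # employees per company (10 here)
    J, N = len(n_j), n_j.sum()                  # number of companies, total employees
    ss_between = (n_j * (groups.mean() - grand_mean) ** 2).sum()
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()
    ms_between = ss_between / (J - 1)
    ms_within = ss_within / (N - J)
    k = n_j.mean()                              # (average) group size
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2

# Hypothetical usage: df has one row per employee with company_id and risk_preference columns
# icc1, icc2 = icc1_icc2(df)
```

Note that ICC(1) and ICC(2) are conventionally reported once for the whole grouping structure (all companies together) rather than separately per company; per-company agreement is usually assessed with within-group indices such as r_wg, which Bliese's tutorial also covers.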