Science topics: Data

Data - Science topic

Explore the latest questions and answers in Data, and find Data experts.
Questions related to Data
  • asked a question related to Data
Question
4 answers
I am trying to estimate an ECM (error correction model) with panel data and have run into the following problem. When I run the ecm command from the ecm package, an error occurs saying "non-numeric matrix extent". I found out that the cause lies in how the panel dataset is created. I tried two different approaches to fix the problem.
At first I created the panel dataset with the pdata.frame command from the plm package.
p.dfA <- pdata.frame(data, index = c("id", "t"))
where "index" indicates the individual and time indexes for the panel dataset. The command converts the id and t variables into factor variables, which later leads to the error in the ecm command.
Secondly, I created the panel dataset with the panel_data command from the panelr package.
p.dfB <- panel_data(data, id = "id", wave = "t")
where "id" is the name of the column (unquoted) that identifies participants/entities and "wave" is the name of the column (unquoted) that identifies waves or periods. In both cases a new column with that name is created, overwriting any column that already has that name.
This panel_data command also converts the id variable into a factor variable. So, the same error occurs in the ecm command.
If I convert the factor variables back into numeric variables, I lose the panel structure of the dataset.
Could someone please explain to me how to run an ECM with panel data in R?
Thank you very much in advance!
Sample R Script attached
Relevant answer
Answer
To estimate an error correction model (ECM) with panel data in R, you can use the plm package (with pdata.frame or panelr for data preparation). It provides fixed-effects and random-effects estimators, and an error correction term can be incorporated manually: first estimate the long-run (cointegrating) relationship in levels, then regress the first differences on the lagged residual from that relationship, which acts as the error-correction term.
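A minimal two-step (Engle-Granger style) sketch of this idea with plm is given below. It is only a sketch under assumptions: a balanced panel with no missing values and placeholder column names y, x, id and t, so adapt it to your own data.
# Two-step panel ECM sketch (assumptions: balanced panel, columns y, x, id, t)
library(plm)
pdat <- pdata.frame(data, index = c("id", "t"))
# Step 1: long-run (cointegrating) relationship in levels
longrun <- plm(y ~ x, data = pdat, model = "within")
# Lagged residuals from the long-run relation act as the error-correction term
pdat$ect <- lag(resid(longrun))
# Step 2: short-run dynamics in first differences plus the lagged error-correction term
ecm_fit <- plm(diff(y) ~ diff(x) + ect, data = pdat, model = "within")
summary(ecm_fit)
A significantly negative coefficient on ect would indicate adjustment back towards the long-run relationship.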
  • asked a question related to Data
Question
2 answers
Hey everyone,
I'm writing my master thesis on the impact of artificial intelligence on business productivity.
This study is mainly aimed at those of you who develop AI or use these technologies in your professional environment.
This questionnaire will take no more than 5 minutes to complete, and your participation is confidential!
Thank you in advance for your time and contribution!
To take part, please click on the link below: https://forms.gle/fzzHq4iNqGUiidTWA
Relevant answer
Answer
AI tools continue to have a positive impact on productivity. Of those surveyed, 64% of managers said AI's output and productivity are equal to the level of experienced and expert managers, and potentially better than any outputs delivered by human managers altogether.
Regards,
Shafagat
  • asked a question related to Data
Question
1 answer
How to build a sustainable data center based on Big Data Analytics, AI, BI and other Industry 4.0/5.0 technologies and powered by renewable and carbon-free energy sources?
If a Big Data Analytics data center is equipped with advanced generative artificial intelligence technology and is powered by renewable and carbon-free energy sources, can it be referred to as sustainable, pro-climate, pro-environment, green, etc.?
Advanced analytical systems, including complex forecasting models that enable multi-criteria, highly sophisticated forecasts of multi-faceted climatic, natural, social, economic and other processes based on the processing of big data and information, increasingly rely on new Industry 4.0/5.0 technologies, including Big Data Analytics, machine learning, deep learning and generative artificial intelligence. Generative artificial intelligence makes it possible to apply complex data-processing algorithms according to precisely defined assumptions and human-defined factors. Computerized, integrated Business Intelligence information systems allow real-time analysis on the basis of continuously updated data and the generation of reports and expert opinions in accordance with defined formats for such studies. Digital twin technology allows computers to build simulations of complex, multi-faceted forecast processes according to defined scenarios of how these processes might unfold in the future. In this regard, it is also important to determine the probability that several different, defined and characterized scenarios of developments, processes, phenomena, etc. will occur. Business Intelligence analytics should therefore make it possible to precisely determine the probability of a given phenomenon, process or effect occurring, including effects classified as opportunities and threats to the future development of the situation. Besides, Business Intelligence analytics should enable precise quantitative estimation of the scale of the positive and negative effects of certain processes, as well as of the factors acting on these processes and the determinants conditioning the realization of particular scenarios. Cloud computing makes it possible, on the one hand, to update the database with new data and information from various institutions, think tanks, research institutes, companies and enterprises operating within a selected sector or industry of the economy and, on the other hand, to allow such a continuously updated database to be used simultaneously by many beneficiaries, many business entities and/or, for example, many Internet users if the database were made available on the Internet. Where Internet of Things technology is applied, it would be possible to access the database from various types of devices equipped with Internet access. Blockchain technology makes it possible to increase the cybersecurity of the data and Big Data information transferred to the database, both when the collected data are updated and when the analytical system built in this way is used by external entities. Machine learning and/or deep learning technologies combined with artificial neural networks make it possible to train an AI-based system to perform multi-criteria analysis, build multi-criteria simulation models, etc. in the way a human would. For such complex analytical systems processing large amounts of data and information to work efficiently, it is a good solution to use state-of-the-art supercomputers or quantum computers characterized by the high computing power needed to process huge amounts of data in a short time.
A center for multi-criteria analysis of large data sets built in this way can occupy a large floor area housing many servers. Because of the necessary cooling and ventilation systems and for security reasons, this kind of server room can be built underground, while, because of the large amounts of electricity such a big data analytics center absorbs, it is a good solution to build a power plant nearby to supply it. If this kind of data analytics center is to be described as sustainable and in line with the trends of sustainable development and the green transformation of the economy, the power plant supplying it should generate electricity from renewable energy sources, e.g. photovoltaic panels, wind turbines and/or other renewable and emission-free sources. In such a situation, i.e. when a data analytics center that processes multi-criteria Big Data and Big Data Analytics information is powered by renewable and emission-free energy sources, it can be described as sustainable, pro-climate, pro-environment, green, etc. Moreover, when the Big Data Analytics center is equipped with advanced generative artificial intelligence technology and is powered by renewable and emission-free energy sources, the AI technology used can also be described as sustainable, pro-climate, pro-environment, green, etc. On the other hand, the Big Data Analytics center can itself be used to conduct multi-criteria analyses and build multi-faceted simulations of complex climatic, natural, economic, social and other processes, for example, in order to develop scenarios of the future course of processes observed so far, to create simulations of the continuation of diagnosed historical trends, to develop alternative scenarios of how the situation may develop depending on the occurrence of certain determinants, to determine the probability of those determinants occurring, and to estimate the scale of influence of external factors, the potential materialization of particular categories of risk, the possibility of certain opportunities and threats arising, and the probability of the various scenario variants in which the potential continuation of the diagnosed trends has been characterized for the processes under study, including processes of sustainable development, the green transformation of the economy and the implementation of the Sustainable Development Goals. Accordingly, a data analytics center built in this way can, on the one hand, be described as sustainable because it is powered by renewable and emission-free energy sources and, on the other hand, be helpful in building simulations of complex multi-criteria processes, including the continuation of trends in the determinants and co-creating factors that influence sustainable processes, e.g. sustainable economic development.
Therefore, a data analytics center built in this way can be helpful, for example, in developing a complex, multi-factor simulation of the progressive global warming process in subsequent years, of the future negative effects of the deepening scale of climate change and of the negative impact of these processes on the economy, but also in forecasting and simulating the future process of the pro-environmental and pro-climate transformation of the classic, growth-oriented, brown, linear economy of excess into a sustainable, green, zero-carbon, zero-growth and closed-loop economy. Such a data analytics center can thus be described as sustainable because it is supplied with renewable and zero-carbon energy, and it will also be helpful in developing simulations of future processes of the green transformation of the economy carried out according to defined assumptions, determinants and estimated probabilities of the occurrence of particular impact factors and conditions, as well as in estimating costs, gains and losses, opportunities and threats, identifying risk factors and particular categories of risk, and assessing the feasibility of the green-transformation scenarios planned for implementation. In this way, a sustainable data analytics center can also be of great help in the smooth and rapid implementation of the green transformation of the economy.
I have described the key issues concerning the green transformation of the economy in the following article:
IMPLEMENTATION OF THE PRINCIPLES OF SUSTAINABLE ECONOMY DEVELOPMENT AS A KEY ELEMENT OF THE PRO-ECOLOGICAL TRANSFORMATION OF THE ECONOMY TOWARDS GREEN ECONOMY AND CIRCULAR ECONOMY
I have described the applications of Big Data technologies in sentiment analysis, business analytics and risk management in my co-authored article:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
I have described the key issues of opportunities and threats to the development of artificial intelligence technology in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
If a Big Data Analytics data center is equipped with advanced generative artificial intelligence technology and is powered by renewable and carbon-free energy sources, can it be described as sustainable, pro-climate, pro-environment, green, etc.?
How to build a sustainable data center based on Big Data Analytics, AI, BI and other Industry 4.0/5.0 technologies and powered by renewable and carbon-free energy sources?
How to build a sustainable data center based on Big Data Analytics, AI, BI and other Industry 4.0/5.0 and RES technologies?
What do you think about this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text, I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
In my opinion, building a sustainable data center needs an environment, sustainability and governance kind of model. Virtualization and consolidation together with green building design come first; efficient cooling systems and staff training come second.
  • asked a question related to Data
Question
10 answers
In the domain of clinical research, where the stakes are as high as the complexities of the data, a new statistical aid emerges: bayer: https://github.com/cccnrc/bayer
This R package is not just an advancement in analytics - it’s a revolution in how researchers can approach data, infer significance, and derive conclusions
What Makes `Bayer` Stand Out?
At its heart, bayer is about making Bayesian analysis robust yet accessible. Born from the powerful synergy with the wonderful brms::brm() function, it simplifies the complex, making the potent Bayesian methods a tool for every researcher’s arsenal.
Streamlined Workflow
bayer offers a seamless experience, from model specification to result interpretation, ensuring that researchers can focus on the science, not the syntax.
Rich Visual Insights
Understanding the impact of variables is no longer a trudge through tables. bayer brings you rich visualizations, like the one above, providing a clear and intuitive understanding of posterior distributions and trace plots.
Big Insights
Clinical trials, especially in rare diseases, often grapple with small sample sizes. `Bayer` rises to the challenge, effectively leveraging prior knowledge to bring out the significance that other methods miss.
Prior Knowledge as a Pillar
Every study builds on the shoulders of giants. `Bayer` respects this, allowing the integration of existing expertise and findings to refine models and enhance the precision of predictions.
From Zero to Bayesian Hero
The bayer package ensures that installation and application are as straightforward as possible. With just a few lines of R code, you’re on your way from data to decision:
# Installation
devtools::install_github("cccnrc/bayer")

# Example Usage: Bayesian Logistic Regression
library(bayer)
model_logistic <- bayer_logistic(
  data = mtcars,
  outcome = 'am',
  covariates = c('mpg', 'cyl', 'vs', 'carb')
)
You then have plenty of functions to further analyze your model; take a look at bayer.
Analytics with An Edge
bayer isn’t just a tool; it’s your research partner. It opens the door to advanced analyses like IPTW, ensuring that the effects you measure are the effects that matter. With bayer, your insights are no longer just a hypothesis — they’re a narrative grounded in data and powered by Bayesian precision.
Join the Brigade
bayer is open-source and community-driven. Whether you’re contributing code, documentation, or discussions, your insights are invaluable. Together, we can push the boundaries of what’s possible in clinical research.
Try bayer Now
Embark on your journey to clearer, more accurate Bayesian analysis. Install `bayer`, explore its capabilities, and join a growing community dedicated to the advancement of clinical research.
bayer is more than a package — it’s a promise that every researcher can harness the full potential of their data.
Explore bayer today and transform your data into decisions that drive the future of clinical research: bayer - https://github.com/cccnrc/bayer
Relevant answer
Answer
Many thanks for your efforts!!! I will try it out as soon as possible and will provide feedback on github!
All the best,
Rainer
  • asked a question related to Data
Question
2 answers
To what extent has the scale of disinformation generated with the use of applications available on the Internet based on generative artificial intelligence technology increased?
To what extent has the scale of disinformation generated in online social media increased using applications based on generative artificial intelligence technology available on the Internet?
Many research institutions have listed, among the main types of threats and risks developing globally in 2023, the increase in the scale of organized disinformation operating in online social media. This diagnosed increase is related to the use of applications available on the Internet based on generative artificial intelligence technology. With the help of such applications, it is possible, without being a computer graphic designer and even without artistic skills, to simply and easily create graphics, drawings, photos, images, videos, animations, etc. that look professionally made and can depict fictional events. Then, with the help of other applications equipped with generative artificial intelligence and advanced language models, i.e. intelligent chatbots, text can be created to describe the specific "fictional events" depicted in the generated images. Accordingly, since the end of 2022, when the first such intelligent chatbot, the first versions of ChatGPT, was made available on the Internet, the number of memes, photos, comments, videos, posts, banners, etc. generated with applications equipped with artificial intelligence tools has been growing rapidly, along with a rapid increase in the scale of disinformation generated in this way. To limit this disinformation developing in online media, technology companies running social media portals and other online information services are, on the one hand, refining tools for identifying posts, comments, banners, photos, videos, animations, etc. that contain specific, usually thematic, types of disinformation. However, these solutions are not perfect, and the scale of disinformation operating in online social media is still high. On the other hand, dedicated institutions for combating disinformation are being established, and NGOs and schools are conducting educational campaigns to make citizens aware of the large scale of disinformation developing on the Internet. In addition, proposed regulations such as the AI Act, a set of rules on the proper use of tools equipped with artificial intelligence technology that is expected to come into force in the European Union within the next two years, may play an important role in reducing the scale of disinformation developing on the Internet.
I have described the key issues of opportunities and threats to the development of artificial intelligence technology in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
To what extent has the scale of disinformation generated in online social media using applications based on generative artificial intelligence technology available on the Internet increased?
To what extent has the scale of disinformation generated using applications based on generative artificial intelligence technology available on the Internet increased?
What do you think about this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text, I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
Its capacity to generate disinformation stems from how generic its training data is; Watson, specialized in medical topics, is reliable to a surprising degree. Now then, as far as I know it has not struck down any patient, but, on the other hand, recovery rates have not soared either.
  • asked a question related to Data
Question
6 answers
Where can I get global data (freely accessible) for small and micro-enterprises across countries and time?
Relevant answer
Answer
You can access global data on small and micro-enterprises across countries and time from sources like the World Bank's Enterprise Surveys, the International Labour Organization (ILO), and the Global Entrepreneurship Monitor (GEM).
  • asked a question related to Data
Question
1 answer
In the rapidly evolving world of SaaS, where convenience often trumps concerns, data privacy remains a pivotal issue. As we increasingly rely on these platforms for both personal and professional tasks, the lines around data ownership and privacy seem to blur. SaaS providers / vendors, with their vast capabilities for data processing and storage, find themselves at the center of this ongoing debate. How do we navigate the fine line between leveraging the undeniable benefits of SaaS and protecting our personal data? In your experience, what measures should both users and providers take to ensure data privacy and security? Do you believe the current legal and ethical frameworks are sufficient to protect user data in the SaaS model?
Relevant answer
Answer
Any thoughts? Feel free to comment
This is our case; take your time to read :)
  • asked a question related to Data
Question
2 answers
Relevant answer
Answer
Eh, I'd rather be mysteriously confusing than rigorously understandable any day. Keeps people on their toes, you know? :P
  • asked a question related to Data
Question
1 answer
How can the growing scale of disinformation, including factoids and deepfakes generated in social media, be curbed through the use of generative artificial intelligence technology?
Generative artificial intelligence (GAI) itself can be used to reduce the growing scale of disinformation, including the fake news and deepfakes generated in social media with the help of GAI-based applications available on the Internet. Intelligent chatbots and other GAI applications, constantly improved and taught to carry out new types of activities, tasks and commands, can be applied to identify instances of disinformation spread primarily in online social media. Such disinformation is particularly dangerous for children and adolescents; it can significantly shape the general public's awareness of certain issues, influence the development trends of certain social processes, affect the results of parliamentary and presidential elections, and alter the sales of certain types of products and services. Where there is no developed institutional system of media oversight (including oversight of the new online media), no system for verifying the objectivity of content directed at citizens in advertising campaigns, no analysis of disinformation by competition and consumer protection institutions, weak or absent institutions protecting democracy, and no institutions reliably safeguarding journalistic ethics and media independence, the scale of disinformation of citizens by various groups of influence, including public institutions and commercially operating business entities, may be high and may generate high social costs. Accordingly, new Industry 4.0/5.0 technologies, including generative artificial intelligence, should be employed to reduce the growing scale of disinformation, including factoids, deepfakes, etc. in social media. GAI technologies can help identify fake-news pseudo-journalistic content, photos containing deepfakes, and factually incorrect content in banners, spots and advertising videos published in various media as part of advertising and promotional campaigns aimed at activating sales of various products and services.
I described the applications of Big Data technologies in sentiment analysis, business analytics and risk management in an article of my co-authorship:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
I described the key issues of opportunities and threats to the development of artificial intelligence technology in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can the growing scale of disinformation, including factoids and deepfakes generated in social media, be curbed through the use of generative artificial intelligence technology?
How to curb disinformation generated in social media using artificial intelligence?
What do you think about this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
Dear Prof. Prokopowicz!
You spotted a real problem to fight with. I found a case study "Elections in 2024" that illustrates blind spots...:
a) CHARLOTTE HU (2024). How AI Bots Could Sabotage 2024 Elections around the World: AI-generated disinformation will target voters on a near-daily basis in more than 50 countries, according to a new analysis, Scientific American 24 February 2024, Quoting: "Currently, AI-generated images or videos are easier to detect than text; with images and videos, Du explains, “you have to get every pixel perfect, so most of these tools are actually very inaccurate in terms of lighting or other effects on images.” Text, however, is the ultimate challenge. “We don’t have tools with any meaningful success rate that can identify LLM-generated texts,” Sanderson says." Available at:
b) Heidi Ledford (2024). Deepfakes, trolls and cybertroopers: how social media could sway elections in 2024: Faced with data restrictions and harassment, researchers are mapping out fresh approaches to studying social media’s political reach. News, Nature 626, 463-464 (2024) Quoting: "Creative workarounds: ...behind the scenes, researchers are exploring different ways of working, says Starbird, such as developing methods to analyse videos shared online and to work around difficulties in accessing data. “We have to learn how to get insights from more limited sets of data,” she says... Some researchers are using qualitative methods such as conducting targeted interviews to study the effects of social media on political behaviour, says Kreiss. Others are asking social media users to voluntarily donate their data, sometimes using browser extensions. Tucker has conducted experiments in which he pays volunteers a small fee to agree to stop using a particular social media platform for a period, then uses surveys to determine how that affected their exposure to misinformation and the ability to tell truth from fiction."
Yours sincerely, Bulcsu Szekely
  • asked a question related to Data
Question
4 answers
Relevant answer
Answer
This statement is underdefined for me for a number of reasons: (i) what types and complexity of work is considered, (ii) what does rigor mean in this specific context, (iii) how to interpret the adverb 'eventually' here, and (iv) who is supposed to reach understanding, what level of it, and based on what body of knowledge.
Kind regards,
I.H.
  • asked a question related to Data
Question
3 answers
How should ChatGPT and other similar intelligent chatbots be improved so that they do not generate plagiarism of other publications that their authors have previously posted online?
This issue is particularly important because the data entered into ChatGPT, including texts submitted for automated rewriting, may remain in the database the chatbot uses when generating answers to the questions of subsequent Internet users. The problem has become serious, as there have already been situations in which sensitive data on specific individuals, institutions and business entities leaked in this way. At the same time, many institutions and companies use ChatGPT to prepare reports and edit documents, and pupils and students use ChatGPT and similar intelligent chatbots to generate texts submitted as credit papers or incorporated into theses. In response, some anti-plagiarism applications have added functions that detect whether ChatGPT was used in the writing of credit papers and theses. The problem is also normative in nature: copyright law needs to be adapted to the dynamic technological advances in generative artificial intelligence so that its provisions are not violated by users of ChatGPT or similar intelligent chatbots. One measure that could significantly reduce the scale of the problem would be a mandatory requirement to label all works, including texts, graphics, photos, videos, etc., that have been created with the help of such intelligent chatbots as having been so created. In addition, the chatbots themselves should be improved by their creators, the technology companies developing these tools, so that ChatGPT cannot "publish" confidential, sensitive information from institutions and companies in response to the questions, commands and text-generation tasks of subsequent Internet users. They should also be improved so that, when automatically generated text draws on other source texts, quotes whole sentences or substantial fragments of them, or reproduces the substantive content of other publications, the sources are fully disclosed, i.e. a full bibliographic description of all the source publications used by the chatbot is provided. As things stand, users of these intelligent chatbots do not know to what extent the text they created with these tools plagiarizes other texts previously entered into them or publications available on the Internet, including company and institutional documents, theses, scientific publications, industry articles, journalistic articles, etc.
I described the key issues of opportunities and threats to the development of artificial intelligence technology in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How should ChatGPT and other similar intelligent chatbots be improved so that they do not generate plagiarism of other publications that their authors have previously posted on the Internet?
How should ChatGPT be improved so that it does not generate plagiarism of other publications that their authors have previously posted on the Internet?
And what is your opinion about it?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
I recommend AnswerThis, an AI research tool that facilitates writing: https://answerthis.io/signup.
  • asked a question related to Data
Question
3 answers
How to reduce the risk of leakage of sensitive data of companies, enterprises and institutions that employees of these entities have previously entered into ChatGPT?
How to reduce the risk of leakage of sensitive data of companies, enterprises and institutions, which employees of these entities have previously entered into ChatGPT or other intelligent chatbots equipped with generative artificial intelligence technology in an attempt to facilitate their work?
Despite training and the updating of internal rules and regulations in many companies and enterprises regarding the proper use of intelligent chatbots, such as the ChatGPT made available online by OpenAI and similar intelligent applications that more and more technology companies are publishing on the Internet, there are still situations in which careless employees enter sensitive data of the companies and enterprises employing them into these online tools. In such a situation there is a high risk that the data and information entered into ChatGPT, Copilot or any other such chatbot will subsequently appear in an answer, report, essay or article generated by the application on the smartphone, laptop or computer of another user. In this way, another Internet user may, accidentally or through a deliberate search for specific data, come into possession of particularly important, key, sensitive data of a business entity, public institution or financial institution, concerning, for example, confidential strategic plans, i.e. information of great value to competitors or to the intelligence services of other countries. Such situations have already occurred in companies with highly recognizable brands in specific markets for products or services. They clearly indicate the need to improve internal procedures for data and information protection, the efficiency of data protection systems, early-warning systems signalling a growing risk of losing key company data, and systems for managing the risk of leakage of sensitive data and of cybercriminal attacks on internal company information systems. In parallel with improving these systems, internal regulations on the correct use by employees of chatbots available on the Internet should be updated on an ongoing basis, in line with the scale of the risk and the implementation of new technologies in the business entity. Training should also be conducted so that employees learn about both the new opportunities and the risks arising from applications and tools based on generative artificial intelligence made available on the Internet. Another solution may be a complete ban on employees using intelligent chatbots available on the Internet. In that case, the company will be forced to create its own internal applications and intelligent chatbots, not connected to the Internet and operating solely as integral modules of the company's internal information systems. Such a solution will probably involve significant financial outlays on building these IT solutions, and for many small companies the financial barrier may be high. However, if internal IT systems equipped with their own intelligent chatbot solutions become an important element of competitive advantage over key direct competitors, these outlays will probably be treated as financial resources allocated to investment and development projects important for the company's future.
The key issues of opportunities and threats to the development of artificial intelligence technology are described in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How to reduce the risk of leakage of sensitive data of companies, enterprises and institutions, which employees of these entities previously input into ChatGPT or other intelligent chatbots equipped with generative artificial intelligence technology in an attempt to facilitate their work?
How do you mitigate the risk of leakage of sensitive data of companies, enterprises and institutions that employees of these entities have previously entered into ChatGPT?
What do you think about this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
What serious things can an LLM like ChatGPT be used for in work? From what I see, it is most often used by people who lack natural intelligence and use this tool to create shallow content. Especially in marketing and advertising. I'm probably not seeing everything, but the responses generated by ChatGPT themselves are so buggy and unreliable that I can't imagine anyone responsible using them for any serious purpose. Yes - the best solution seems to me to avoid using such tools in serious applications where the tool could have access to sensitive data, and to use it only as a successor to the "answering machine", or as a replacement for copywriters.
  • asked a question related to Data
Question
5 answers
We sometimes face difficulties in finding accessible data sources. Here are some data sources that are available freely for research:
1. Demographic and Health Survey (DHS) data (https://dhsprogram.com/data/available-datasets.cfm)
2. Global Tobacco Surveillance System Data (GTSSData) (https://nccd.cdc.gov/GTSSDataSurveyResources/Ancillary/DataReports.aspx?CAID=2)
3. Multiple Indicator Cluster Surveys (MICS) data by Unicef (https://mics.unicef.org/surveys)
Relevant answer
Answer
StatsCan is always a scholarly source. Although it is the Canadian federal government's statistics agency, researchers from anywhere in the world (in English or French) will find the website easy to use, and a good model for public access to research methods and results.
  • asked a question related to Data
Question
3 answers
Can artificial intelligence help improve sentiment analysis of changes in Internet user awareness conducted using Big Data Analytics as relevant additional market research conducted on large amounts of data and information extracted from the pages of many online social media users?
In recent years, more and more companies and enterprises, before launching new product and service offerings, have commissioned specialized marketing research firms, as part of their market research, to carry out sentiment analyses of changes in public sentiment, changes in awareness of the company's brand, and recognition of its mission and offering. This kind of sentiment analysis is carried out on computerized Big Data Analytics platforms, where a multi-criteria analytical process is applied to large sets of data and information taken from many websites. Among the source websites, news portals dominate, publishing news and journalistic articles on a specific issue, including the company, enterprise or institution commissioning the study. In addition, key sources of online data include online forums and social media, where Internet users discuss various topics, including the product and service offerings of various companies, enterprises, and financial or public institutions. With the growing scale of e-commerce, including sales through the websites of online stores and shopping portals, and the growing importance of online advertising campaigns and promotional actions, the importance of such analyses of Internet users' sentiment on specific topics is also growing, as they play a role complementary to other, more traditionally conducted market research. A key problem for this type of sentiment analysis is the rapidly growing volume of data and information contained in posts, comments, banners and advertising spots published on social media, as well as the constantly emerging new social media. This problem is partly solved by increasing computing power and the multi-criteria processing of large amounts of data made possible by ever-better microprocessors and Big Data Analytics platforms. In addition, the possibilities for advanced multi-criteria processing of large data and information sets in ever shorter timeframes may increase significantly when generative artificial intelligence technology is involved in this data processing.
The key issues of opportunities and threats to the development of artificial intelligence technology are described in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
I described the applications of Big Data technologies in sentiment analysis, business analytics and risk management in my co-authored article:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
The use of Big Data Analytics platforms of ICT information technologies in sentiment analysis for selected issues related to Industry 4.0
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Can artificial intelligence help improve sentiment analysis of changes in Internet users' awareness conducted using Big Data Analytics as relevant additional market research conducted on a large amount of data and information extracted from the pages of many online social media users?
Can artificial intelligence help improve sentiment analysis conducted on large data sets and information on Big Data Analytics platforms?
What do you think about this topic?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
In my opinion, yes, artificial intelligence (AI) can indeed play a crucial role in improving sentiment analysis for changes in internet user awareness, especially when combined with big data analytics. Here's how:
  1. Natural Language Processing (NLP): AI techniques can be used to process and understand the natural language used in social media posts, comments, reviews, etc. This involves tasks such as text tokenization, part-of-speech tagging, named entity recognition, and more.
  2. Sentiment Analysis: AI algorithms can be trained to recognize and analyze the sentiment expressed in text data. This can help identify whether users are expressing positive, negative, or neutral opinions about specific topics, products, events, etc.
  3. Machine Learning Models: AI-powered machine learning models can be trained on large datasets of labelled social media data to predict sentiment accurately. These models can continuously learn and improve over time as they are exposed to more data.
  4. Deep Learning: Deep learning techniques, such as recurrent neural networks (RNNs) and transformers, can capture complex patterns in text data and improve sentiment analysis accuracy.
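To make the sentiment-analysis point concrete, here is a deliberately minimal, lexicon-based sketch in base R; the word lists and example posts are purely illustrative assumptions, and a production system would instead use trained models or an established sentiment lexicon:
# Minimal lexicon-based sentiment scoring (illustrative word lists and posts)
positive <- c("good", "great", "excellent", "love", "useful")
negative <- c("bad", "poor", "terrible", "hate", "useless")
score_sentiment <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  sum(words %in% positive) - sum(words %in% negative)
}
posts <- c("Great product, I love it", "Terrible support, useless app")
sapply(posts, score_sentiment)  # positive score for the first post, negative for the second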
Thank You
  • asked a question related to Data
Question
1 answer
How can the application of generative artificial intelligence improve the existing applications of Big Data Analytics and increase the scale of application of these technologies in carrying out analyses of processing large data sets, generating multi-criteria simulation models and carrying out predictive analyses and projections?
The acceleration of the digitization of the economy triggered by the Covid-19 pandemic has resulted in a significant increase in computerization, Internetization and the application of ICT and Industry 4.0 technologies to various economic processes. Applications of specific Industry 4.0 technologies are increasing in many industries and sectors of the economy, including Big Data Analytics, Data Science, cloud computing, machine learning, the personal and industrial Internet of Things, artificial intelligence, Business Intelligence, autonomous robots, horizontal and vertical data system integration, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, cybersecurity instruments, Virtual and Augmented Reality, and other advanced Data Mining technologies. In my opinion, applications of ICT and Industry 4.0/5.0 technologies are growing, among others, in medical therapies, communications, logistics, new online media, life science, ecology, economics and finance, as well as in predictive analytics. Artificial intelligence technologies are developing rapidly as they find applications in various industries and sectors of the economy. It is up to human beings how, and in what capacity, artificial intelligence will be implemented in the manufacturing processes, analytical processes and other settings where large data sets are processed most efficiently. In addition, various opportunities are opening up for applying artificial intelligence in conjunction with the other technologies of the current fourth industrial revolution, referred to as Industry 4.0/5.0. In the years to come, applications of artificial intelligence are expected to keep growing in various areas and fields: in manufacturing processes, advanced data processing, the improvement of production processes, the support of process management, and so on.
I have been studying this issue for years and have presented the results of my research in the article, among others:
APPLICATION OF DATA BASE SYSTEMS BIG DATA AND BUSINESS INTELLIGENCE SOFTWARE IN INTEGRATED RISK MANAGEMENT IN ORGANIZATION
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can the application of generative artificial intelligence improve the existing applications of Big Data Analytics and increase the scale of application of these technologies in carrying out analysis of processing large data sets, generating multi-criteria simulation models and carrying out predictive analysis and projections?
How can the application of generative artificial intelligence improve existing applications of Big Data Analytics?
And what is your opinion about it?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
The above text is entirely my own work written by me on the basis of my research.
In writing this text I did not use other sources or automatic text generation systems.
Copyright by Dariusz Prokopowicz
Relevant answer
Answer
The application of generative AI can significantly improve the existing applications of Big Data Analytics and increase their scale of application in carrying out analysis of processing large data sets, generating multi-criteria simulation models, and conducting predictive analysis and projections. By automating critical and time-consuming steps, such as feature engineering and model selection, generative AI can help non-experts to apply Big Data Analytics in a more efficient and effective manner. Additionally, generative AI can be used to generate synthetic data, multi-criteria simulation models, and probabilistic forecasts, which can provide organizations with a better understanding of complex and uncertain environments.
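As a very small illustration of the synthetic-data idea mentioned above, one could fit a joint distribution to observed columns and simulate new rows. This is only a sketch under assumptions: a simple Gaussian model stands in for a true generative model, and it uses the built-in mtcars dataset and the MASS package.
# Fit a joint Gaussian to two observed columns and draw synthetic rows
set.seed(1)
library(MASS)
real <- mtcars[, c("mpg", "wt")]
mu    <- colMeans(real)
sigma <- cov(real)
synthetic <- as.data.frame(mvrnorm(n = 100, mu = mu, Sigma = sigma))
summary(real)
summary(synthetic)  # similar marginal behaviour, but no row is a real observation
A real generative model would capture richer, non-Gaussian structure, but the workflow of fitting on real data and then sampling new records is the same.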
  • asked a question related to Data
Question
1 answer
Explore the integration of GraphQL subscriptions for efficient real-time communication in web development. Share insights on implementation, benefits, and potential challenges.
Relevant answer
Answer
Here's a comprehensive response to how GraphQL subscriptions enhance real-time data updates in web applications:
Key Concepts:
  • GraphQL Subscriptions: A powerful feature that enables real-time data synchronization between clients and servers. They allow clients to subscribe to specific events or data changes and receive updates as they occur, without constantly polling the server.
Enhancements for Real-Time Updates:
  1. Real-Time Data Propagation: Clients receive immediate updates when relevant data changes, ensuring a consistent and up-to-date user experience. This eliminates the need for manual page refreshes or polling mechanisms.
  2. Efficient Data Transfer: Clients only receive the specific data they've subscribed to, reducing network traffic and improving performance compared to traditional polling approaches.
  3. Unified Data Handling: GraphQL subscriptions integrate seamlessly with queries and mutations within the same API endpoint, simplifying development and maintaining a consistent data flow.
  4. Improved User Experience: Real-time updates create more dynamic and engaging user experiences, such as live chats and messaging, real-time notifications, live dashboards and feeds, collaborative editing, multiplayer games, and live tracking (e.g., delivery status, ride-sharing).
  5. Scalability and Performance: GraphQL subscriptions can be efficiently managed using WebSockets or server-sent events (SSE), enabling effective scaling and handling of large numbers of concurrent connections.
Example Use Cases:
  • Chat Applications: Deliver real-time message delivery and notifications.
  • Live Scores and Updates: Push real-time sports scores, stock prices, or news feeds.
  • Collaborative Editing: Enable multiple users to work on a document simultaneously and see each other's changes instantly.
  • Social Media Feeds: Display real-time updates from friends and followers.
  • Live Product Availability Updates: Notify customers about stock changes and new product arrivals.
Implementation Considerations:
  • GraphQL Server Support: Choose a GraphQL server that supports subscriptions (e.g., Apollo Server, GraphQL Yoga).
  • Transport Protocol: Select a suitable transport protocol, typically WebSockets or SSE.
  • Subscription Handling: Implement subscription resolvers to handle subscription events and push updates to clients.
  • Client-Side Handling: Use a GraphQL client library that supports subscriptions (e.g., Apollo Client, Relay).
Conclusion:
GraphQL subscriptions provide a powerful and efficient mechanism for implementing real-time features in modern web applications. They offer significant benefits in terms of data efficiency, user experience, and development convenience, making them a valuable tool for building dynamic and engaging applications.
  • asked a question related to Data
Question
1 answer
Seeking insights on strategies and technologies to promote ethical handling of data within social platforms, balancing user privacy and societal well-being. Open to perspectives from experts in data ethics and social sciences.
Relevant answer
Answer
stay offline :-)
  • asked a question related to Data
Question
13 answers
The future of blockchain-based internet solutions
Blockchain is defined as a decentralized and distributed database in the open source model in a peer-to-peer internet network without central computers and without a centralized data storage space, used to record individual transactions, payments or journal entries encoded using cryptographic algorithms.
In current applications, blockchain is usually a decentralized and dispersed register of financial transactions. It is also a decentralized transaction platform in a distributed network infrastructure. In this formula, blockchain is currently implemented into financial institutions.
Some banks are already trying to use blockchain in their operations. If they did not, other economic entities implementing blockchain, including fintechs, could become more competitive in this respect. However, cryptocurrencies and a secure record of transactions are not the only blockchain applications; various other potential applications are being considered for the future.
Perhaps these new, different applications already exist in specific companies, corporations, public institutions or research centers in individual countries. In view of the above, the current question is: in what applications, besides cryptocurrency, is blockchain used in your company, organization, country, etc.?
Please reply
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
A. Abusukhon, Z. Mohammad, A. Al-Thaher (2021) An authenticated, secure, and mutable multiple-session-keys protocol based on elliptic curve cryptography and text_to-image encryption algorithm. Concurrency and computation practice and experience. [Science Citation Index].
A. Abusukhon, N. Anwar, M. Mohammad, Z., Alghanam, B. (2019) A hybrid network security algorithm based on Diffie Hellman and Text-to-Image Encryption algorithm. Journal of Discrete Mathematical Sciences and Cryptography, 22(1), pp. 65-81. (SCOPUS). https://www.tandfonline.com/doi/abs/10.1080/09720529.2019.1569821
A. Abusukhon, B. Wawashin, B. (2015) A secure network communication protocol based on text to barcode encryption algorithm. International Journal of Advanced Computer Science and Applications (IJACSA). (ISI indexing). https://thesai.org/Publications/ViewPaper?Volume=6&Issue=12&Code=IJACSA&SerialNo=9
A. Abusukhon, Talib, M., and Almimi, H. (2014) Distributed Text-to-Image Encryption Algorithm. International Journal of Computer Applications (IJCA), 106 (1). [ available online at : https://www.semanticscholar.org/paper/Distributed-Text-to-Image-Encryption-Algorithm-Ahmad-Mohammad/0764b3bd89e820afc6007b048dac159d98ba5326]
A. Abusukhon (2013) Block Cipher Encryption for Text-to-Image Algorithm. International Journal of Computer Engineering and Technology (IJCET), 4(3), 50-59. http://www.zuj.edu.jo/portal/ahmad-abu-alsokhon/wpcontent/uploads/sites/15/BLOCK-CIPHER-ENCRYPTION-FOR-TEXT-TO-IMAGE-ALGORITHM.pdf
A. Abusukhon, Talib, M. and Nabulsi, M. (2012) Analyzing the Efficiency of Text-to-Image Encryption Algorithm. International Journal of Advanced Computer Science and Applications (IJACSA) (ISI indexing), 3(11), 35-38. https://thesai.org/Publications/ViewPaper?Volume=3&Issue=11&Code=IJACSA&SerialNo=6
A. Abusukhon, Talib M., Issa, O. (2012) Secure Network Communication Based on Text to Image Encryption. International Journal of Cyber-Security and Digital Forensics (IJCSDF), 1(4). The Society of Digital Information and Wireless Communications (SDIWC) 2012. https://www.semanticscholar.org/paper/SECURENETWORK-COMMUNICATION-BASED-ON-TEXT-TO-IMAGE-Abusukhon-Talib/1d122f280e0d390263971842cc54f1b044df8161
  • asked a question related to Data
Question
1 answer
Dear all,
I am trying to do some criterion and construct validity exercises with Dutch data.
Do you know any data or studies which use the PVQ-RR or PVQ-40 or other detailed measurements?
Thanks,
Oscar
Relevant answer
Answer
ÖZGÜR, Ergün. 2021. “Individual Values and Acculturation Processes of Immigrant Groups from Turkey: Belgium, Germany, and the Netherlands.” Journal of Identity and Migration Studies 15(1).
  • asked a question related to Data
Question
3 answers
How can a Big Data Analytics system based on artificial intelligence be built that is more refined than ChatGPT and learns only real, verified information and data?
How can one build a Big Data Analytics system that analyses information taken from the Internet, conducts real-time analytics based on artificial intelligence, is integrated with an Internet search engine, and is more refined than ChatGPT, so that, through discussion with Internet users, it improves data verification and learns only real, verified information and data?
Well, ChatGPT is not perfect in terms of self-learning new content and perfecting the answers it gives, because it happens to give confirmation answers when there is information or data that is not factually correct in the question formulated by the Internet user. In this way, ChatGPT can learn new content in the process of learning new but also false information, fictitious data, in the framework of the 'discussions' held. Currently, various technology companies are planning to create, develop and implement computerised analytical systems based on artificial intelligence technology similar to ChatGPT, which will find application in various fields of big data analytics, will find application in various fields of business and research work, in various business entities and institutions operating in different sectors and industries of the economy. One of the directions of development of this kind of artificial intelligence technology and applications of this technology are plans to build a system of analysis of large data sets, a system of Big Data Analytics, analysis of information taken from the Internet, an analytical system based on artificial intelligence conducting analytics in real time, integrated with an Internet search engine, but an artificial intelligence system more perfect than ChatGPT, which will, through discussion with Internet users, improve data verification and will learn but only real information and data. Some of the technology companies are already working on this, i.e. on creating this kind of technological solutions and applications of artificial intelligence technology similar to ChatGPT. But presumably many technology start-ups that plan to create, develop and implement business specific technological innovations based on a specific generation of artificial intelligence technology similar to ChatGPPT are also considering undertaking research in this area and perhaps developing a start-up based on a business concept of which technological innovation 4.0, including the aforementioned artificial intelligence technologies, is a key determinant.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can one build a Big Data Analytics system that analyses information taken from the Internet, conducts real-time analytics based on artificial intelligence, is integrated with an Internet search engine, and is more refined than ChatGPT, so that, through discussion with Internet users, it improves data verification and learns only real, verified information and data?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
This is a very complex question but I will try to synthesize my main points into what I consider is the main problem with LLMs and my perceived solution.
One of the underlying problems with LLMs is hallucination and the wrong answers they generate. This has its roots in two subproblems. The first is the data and its training; the second is the nature of the algorithms and the assumption of graceful degradation. I think the first one is easy to solve by not throwing junk data at the model and expecting 'statistical miracles' to bubble truth up from noise. That is a nice mathematical hallucination on our part (no amount of mathematical Platonism can compete with the messy, "mundane" day-to-day). There is no replacement for the hard work of sorting good data from bad.
The second problem is more difficult to solve. It lies in several assumptions that are ingrained in neural networks. Neural networks promised graceful degradation, but in reality we need neural networks to abstain from graceful degradation in critical situations. Hallucination is based on this philosophical flaw of neural networks. Graceful degradation relies on distributed representations and the assumption that, even though the whole representation is not present, the network will output the complete representation if enough of it is there. This is an extremely strong assumption to embrace as a universal case for all data; by necessity it is an existential case, not a universal one. A possible solution is to use an ensemble of algorithms that contains neural and non-neural algorithms, where the consensus wins.
In my view, both curation of the primary data for foundational models and the consensus of algorithms are necessary (but not sufficient) to achieve a better system. I would also tackle how to realize these two solutions as a separate thread for each one.
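To make the "consensus wins, otherwise abstain" idea concrete, a minimal sketch in base R follows. The three sets of predictions are simulated stand-ins for a neural network and two non-neural algorithms (no real models are trained here), and the ensemble only answers when all three agree:

# Simulated predictions standing in for three heterogeneous, already-trained models
set.seed(42)
truth      <- sample(c("A", "B"), 100, replace = TRUE)
neural_net <- ifelse(runif(100) < 0.80, truth, sample(c("A", "B"), 100, replace = TRUE))
rule_based <- ifelse(runif(100) < 0.75, truth, sample(c("A", "B"), 100, replace = TRUE))
knn_like   <- ifelse(runif(100) < 0.70, truth, sample(c("A", "B"), 100, replace = TRUE))

# Answer only on unanimous agreement; otherwise abstain instead of degrading "gracefully"
consensus <- mapply(function(a, b, c) {
  votes <- table(c(a, b, c))
  if (max(votes) == 3) names(votes)[which.max(votes)] else NA_character_
}, neural_net, rule_based, knn_like)

mean(is.na(consensus))                  # abstention rate
mean(consensus == truth, na.rm = TRUE)  # accuracy on the cases that were answered

Requiring unanimity trades coverage for reliability, which is exactly the "abstain in critical situations" behaviour described above.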
Regards
  • asked a question related to Data
Question
5 answers
Can artificial intelligence help optimize remote communication and information flow in a corporation, in a large company characterized by a multi-level, complex organizational structure?
Are there any examples of artificial intelligence applications in this area of large company operations?
In large corporations characterized by a complex, multi-level organizational structure, the flow of information can be difficult. New ICT and Industry 4.0 information technologies are proving to be helpful in this regard, improving the efficiency of the flow of information flowing between departments and divisions in the corporation. One of the Industry 4.0 technologies that has recently found various new applications is artificial intelligence. Artificial intelligence technology is finding many new applications in recent years. The implementation of artificial intelligence, machine learning and other Industry 4.0 technologies into various business fields of companies, enterprises and financial institutions is associated with the increase in digitization and automation of processes carried out in business entities. For several decades, in order to refine and improve the flow of information in a corporation characterized by a complex organizational structure, integrated information systems are being implemented that informationally connect applications and programs operating within specific departments, divisions, plants, etc. in a large enterprise, company, corporation. Nowadays, a technology that can help optimize remote communication and information flow in a corporation is artificial intelligence. Artificial intelligence can help optimize information flow and data transfer within a corporation's intranet.
Besides, the technologies of Industry 4.0, including artificial intelligence, can help improve the cyber security techniques of data transfer, including that carried out in email communications.
In view of the above, I address the following question to the esteemed community of researchers and scientists:
Can artificial intelligence help optimize remote communication and information flow in a corporation, in a large company characterized by a multi-level, complex organizational structure?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
Relevant answer
Answer
AI can greatly improve distant communication and information flow in huge organisations with complicated, multi-level organisational systems. AI's ability to process massive amounts of data, recognise patterns, and automate repetitive operations makes it perfect for improving communication in such contexts. Here are several ways AI can help, along with my opinion:
AI for Corporate Communication Optimisation:
1. AI can analyse large amounts of data from numerous sources within a firm, synthesising and summarising critical information to improve decision-making processes.
2. Improved Email Filtering and Prioritization: - AI algorithms sort and prioritise emails, assuring timely delivery of vital information.
3. Chatbots and Virtual Assistants: - AI-powered chatbots and virtual assistants answer routine inquiries, freeing up human resources for complicated work and enhancing communication efficiency.
4. Predictive Analytics for Decision Making: - AI can analyse corporate data patterns to enhance proactive decision-making and strategic planning.
5. NLP for Content Analysis: - NLP may analyse internal communication, extract sentiments, find trends, and identify possible issues or disputes.
6. Customised Information Feeds: - AI may customise information feeds for employees depending on their positions, interests, and projects, ensuring relevant information dissemination.
7. Enhancing Remote Meetings: - AI tools provide real-time transcription, translation, summarization, and action item tracking, improving meeting quality.
AI Examples in Large Corporations:
IBM Watson helps organisations optimise communication and operational efficiency with data analysis and decision support.
Microsoft AI provides predictive analytics and automated job management for complicated organisations.
Personal Opinion:
AI should empower people by automating routine jobs and offering insightful data, not replacing human judgement and decision-making.
Addressing Challenges: Integrating AI into complicated organisational hierarchies requires attention to data privacy, system integration, and employee digital literacy.
Ethics: As AI becomes more incorporated into corporate systems, data use and employee surveillance must be carefully managed.
Constant Change: AI is quickly evolving, and its applications in business communication will expand, giving new optimisation and efficiency opportunities.
In conclusion:
AI has huge potential to transform corporate communication and information flow. AI can streamline complicated organisational operations by automating regular tasks, delivering actionable insights, and improving communication channels. It must be implemented carefully, taking into account integration issues, personnel training, and ethical issues.
  • asked a question related to Data
Question
2 answers
I'm doing my research on the determinants of house prices in Dublin. I need a database to study. Where can I find one?
  • asked a question related to Data
Question
3 answers
Greetings, fortunately my paper regarding open data was recently published; however, I now intend to establish a new research project and collaborate with other researchers, preferably from other countries, and to look at open data through a legal lens. What steps should I take to expand my academic reach and find co-authors for my future research project?
Thank you
Relevant answer
Answer
Arash Moghadasi Wa Alaikum Salaam, congratulations on your publication! Expanding academic connections and reach is key for enriching future research. Some suggestions:
- Attend conferences and workshops in your field, particularly those focused on interdisciplinary discourse. Networking in-person can spark organic collaborations. Exchange contact information.
- Search literature databases like Scopus for authors working on related open data topics. Reach out by email to discuss shared interests.
- Get involved with academic associations, working groups, online forums related to open data. Position yourself as an expert.
- Consider a visiting scholar position at an institution strong in your research area. Experience new environments.
- Publish mini-commentaries on trends or open problems to increase visibility. Subscribe to relevant journals to stay abreast of latest work to cite.
- Search funding opportunities for international collaborative projects. This can drive team formation.
- Approach scholars at conferences or by email seeking advice to get on their radar. Ask about needs for research assistance or writing.
- Use social media professionally to find researchers and share your own work. LinkedIn and Twitter are valuable here.
I wish you the utmost success in this endeavor. Please keep me updated, and feel free to ask any other questions that arise!
  • asked a question related to Data
Question
4 answers
Dear Researchers, I am looking for open-source gravity/magnetic data for interpretation via Oasis montaj software and Voxi Earth Modeling. Please specify some sources from which the data is easily accessible.
Regards,
Ayaz
Relevant answer
Answer
Check the NGU (Geological Survey of Norway) website.
You can download most of our magnetic surveys for free.
  • asked a question related to Data
Question
1 answer
Greetings, I hope everyone is having a great day. Just wanted to share my joy of my paper being published in the field of open data and economics and would be happy to share the work which is on "https://doi.org/10.1007/s13132-023-01518-z".
Also, welcome any feedback or opinions from fellow colleagues and researchers.
Relevant answer
Answer
Arash Moghadasi Congratulations on your publication. Sounds interesting and a useful contribution to knowledge.
  • asked a question related to Data
Question
3 answers
Data generation (collection) is a key and critical component of a qualitative research project. The question is, how can one make sure that sufficient data have been generated/collected?
Relevant answer
Answer
A very simple way to know that you have collected sufficient data is when you get the same answers from respondents again and again, i.e. you are no longer getting any new information from respondents.
  • asked a question related to Data
Question
1 answer
How far is it acceptable to conduct a review using data extracted from the Dimensions database, especially for publication in a reputed journal?
Relevant answer
Answer
No database is fully comprehensive, but Dimensions comes pretty close. It has far higher quality than Google Scholar, which indexes numerous predatory journals and even undergraduate research papers. This page describes its data sources: https://www.dimensions.ai/dimensions-data/
For an open access database, Dimensions is the best in size and quality.
  • asked a question related to Data
Question
3 answers
If ChatGPT is merged into search engines developed by internet technology companies, will search results be shaped by algorithms to a greater extent than before, and what risks might be involved?
Leading Internet technology companies that also have and are developing search engines in their range of Internet information services are working on developing technological solutions to implement ChatGPT-type artificial intelligence into these search engines. Currently, there are discussions and considerations about the social and ethical implications of such a potential combination of these technologies and offering this solution in open access on the Internet. The considerations relate to the possible level of risk of manipulation of the information message in the new media, the potential disinformation resulting from a specific algorithm model, the disinformation affecting the overall social consciousness of globalised societies of citizens, the possibility of a planned shaping of public opinion, etc. This raises another issue for consideration concerning the legitimacy of creating a control institution that will carry out ongoing monitoring of the level of objectivity, independence, ethics, etc. of the algorithms used as part of the technological solutions involving the implementation of artificial intelligence of the ChatGPT type in Internet search engines, including those search engines that top the rankings of Internet users' use of online tools that facilitate increasingly precise and efficient searches for specific information on the Internet. Therefore, if, however, such a system of institutional control on the part of the state is not established, if this kind of control system involving companies developing such technological solutions on the Internet does not function effectively and/or does not keep up with the technological progress that is taking place, there may be serious negative consequences in the form of an increase in the scale of disinformation realised in the new Internet media. How important this may be in the future is evident from what is currently happening in terms of the social media portal TikTok. On the one hand, it has been the fastest growing new social medium in recent months, with more than 1 billion users worldwide. On the other hand, an increasing number of countries are imposing restrictions or bans on the use of TikTok on computers, laptops, smartphones etc. used for professional purposes by employees of public institutions and/or commercial entities. It cannot be ruled out that new types of social media will emerge in the future, in which the above-mentioned technological solutions involving the implementation of ChatGPT-type artificial intelligence into online search engines will find application. Search engines that may be designed to be operated by Internet users on the basis of intuitive feedback and correlation on the basis of automated profiling of the search engine to a specific user or on the basis of multi-option, multi-criteria search controlled by the Internet user for specific, precisely searched information and/or data. New opportunities may arise when the artificial intelligence implemented in a search engine is applied to multi-criteria search for specific content, publications, persons, companies, institutions, etc. on social media sites and/or on web-based multi-publication indexing sites, web-based knowledge bases.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
If ChatGPT is merged into search engines developed by online technology companies, will search results be shaped by algorithms to a greater extent than before, and what risks might be associated with this?
What is your opinion on the subject?
What do you think about this topic?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
If tools such as ChatGPT, after the necessary updating and adaptation to current Internet technologies, are combined with search engines developed by Internet technology companies, search results may be shaped by certain complex algorithms: by generative artificial intelligence trained to use and improve complex models for advanced, intelligent search of precisely defined topics, and by intelligent search systems based on artificial neural networks and deep learning. If such solutions are created, there is a risk of deliberate shaping of the algorithms of advanced Internet search systems, which may allow the technology companies behind these search engines to interfere with and influence search results and thus shape the general social awareness of citizens on specific topics.
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
  • asked a question related to Data
Question
3 answers
Suppose I want to analyse 100 years of monthly mean flow data for a particular month.
Relevant answer
Answer
For skewed data, it is better to apply a log-transform first to see if it improves the distributional assumptions that many changepoint algorithms require. Here is an example showing the log-transformation of a COVID infection time series: https://stats.stackexchange.com/questions/434990/changepoint-analysis-with-missing-data
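For illustration, a minimal sketch in R with the changepoint package; the simulated right-skewed series is a stand-in for 100 years of mean flows of one particular month (the regimes and parameters are invented):

# install.packages("changepoint") if needed
library(changepoint)

set.seed(1)
flow <- c(rlnorm(50, meanlog = 2.0, sdlog = 0.5),   # first 50 "years"
          rlnorm(50, meanlog = 2.6, sdlog = 0.5))   # shifted regime afterwards

fit_raw <- cpt.mean(flow, method = "PELT")        # changepoints on the skewed raw data
fit_log <- cpt.mean(log(flow), method = "PELT")   # changepoints on the log scale

cpts(fit_raw)   # estimated changepoint locations, raw scale
cpts(fit_log)   # often cleaner when the data are strongly right-skewed

On heavily skewed flow data, the log scale usually satisfies the distributional assumptions of cpt.mean/cpt.meanvar better, so the detected changepoints tend to be more stable.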
  • asked a question related to Data
Question
10 answers
These are few questions for your reference,
How much did you learn about managing your money from your parents?
· None
· Hardly at all
· Little
· Some
A lot
How often were you influenced by, or did you discuss, finances with your parents?
· Never
· Once a year
· Every few months
· Twice a month
Weekly
What is your current investment amount in stocks/shares? (Portfolio value)
· 1 - 90,000
· 90,000–170,000
· 170,000–260,000
· 260,000–340,000
· More than 340,000
The above questions are allocated weights from 1 to 5.
Relevant answer
Answer
You can set these as ordinal-type variables for analysis in SPSS.
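The same idea in R, as a minimal sketch with made-up responses, assuming the 1-5 weights simply follow the order of the answer categories:

# Likert-type item coded as an ordered factor and as a 1-5 numeric score
levels_learn <- c("None", "Hardly at all", "Little", "Some", "A lot")
responses    <- c("Some", "A lot", "None", "Little", "Some")   # invented answers

learn_ord   <- factor(responses, levels = levels_learn, ordered = TRUE)  # ordinal variable
learn_score <- as.integer(learn_ord)                                     # weights 1..5

table(learn_ord)
mean(learn_score)   # e.g. an average score for this item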
  • asked a question related to Data
Question
9 answers
I was exploring differential privacy (DP), which is an excellent technique for preserving the privacy of data. However, I am wondering what performance metrics could demonstrate the difference between schemes with DP and schemes without DP.
Are there any performance metrics by which a comparison can be made between a scheme with DP and a scheme without DP?
Thanks in advance.
Relevant answer
Answer
  1. Epsilon (ε): The fundamental parameter of differential privacy that quantifies the amount of privacy protection provided. Smaller values of ε indicate stronger privacy guarantees.
  2. Delta (δ): Another parameter that accounts for the probability that differential privacy might be violated. Smaller values of δ indicate lower risk of privacy breaches.
  3. Accuracy: Measures how much the output of a differentially private query deviates from the non-private query output. Lower accuracy indicates more noise added for privacy preservation.
  4. Utility: Assesses how well the data analysis task can be accomplished while maintaining differential privacy. Higher utility implies less loss of useful information.
  5. False Positive Rate: In the context of hypothesis testing, it's the probability of incorrectly identifying a sensitive individual as not being in the dataset.
  6. False Negative Rate: The probability of failing to identify a sensitive individual present in the dataset.
  7. Sensitivity: Defines the maximum impact of changing one individual's data on the query output. It influences the amount of noise introduced for privacy.
  8. Data Reconstruction Error: Measures how well an adversary can reconstruct individual data points from noisy aggregated results.
  9. Risk of Re-identification: Measures the likelihood that an attacker can associate a specific record in the released data with a real individual.
  10. Privacy Budget Depletion: Tracks how much privacy budget (ε) is consumed over multiple queries, potentially leading to eventual privacy leakage.
  11. Trade-off Between Privacy and Utility: Evaluates the balance between privacy gains and the degradation of data quality or analysis accuracy.
  12. Adversarial Attack Resistance: Assessing the effectiveness of differential privacy against adversaries attempting to violate privacy by exploiting the noise added to the data.
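To make the link between epsilon (item 1), accuracy (item 3) and utility (item 4) concrete, here is a minimal sketch in base R of a differentially private mean query using the Laplace mechanism; the income data and the [0, 100] bounds are invented for illustration:

set.seed(7)
incomes   <- runif(1000, 0, 100)   # values assumed clipped to [0, 100]
true_mean <- mean(incomes)

rlaplace <- function(n, scale) {   # base-R Laplace noise sampler
  u <- runif(n) - 0.5
  -scale * sign(u) * log(1 - 2 * abs(u))
}

dp_mean <- function(x, eps, lower = 0, upper = 100) {
  sens <- (upper - lower) / length(x)   # sensitivity of the mean on bounded data
  mean(x) + rlaplace(1, sens / eps)     # Laplace mechanism
}

for (eps in c(0.1, 1, 10)) {
  err <- mean(abs(replicate(500, dp_mean(incomes, eps)) - true_mean))
  cat(sprintf("epsilon = %4.1f   mean absolute error = %.3f\n", eps, err))
}

Smaller epsilon gives stronger privacy but a larger average error, which is the privacy/utility trade-off listed in item 11; the same loop run on a pipeline without DP (epsilon effectively infinite) is the natural baseline for the comparison asked about.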
  • asked a question related to Data
Question
3 answers
Hi,
I am trying to export the XPS data from CASA software to ASCII format but getting an error stating "Demo version: Data not saved". Please suggest how to do it.
Relevant answer
Answer
Generally, demo versions, which are only there for demonstration purposes, let you use the software's tools, but you can't save your work, copy-paste it to another platform, etc.
Basically, your work cannot exist outside the software from the moment it's done until you close the window.
If you're getting that message, it means that you've installed the demo version.
  • asked a question related to Data
Question
1 answer
Hi All!
I am looking for occurrence data for these species that aren't found in the typical places. If anyone has any data on these species that they would be willing to share for acknowledgements, citation, etc., please reach out to me via DM.
Thanks!
EV
Relevant answer
Answer
What do you mean by "not typical places"? I have seen B. pennsylvanicus in central PA
  • asked a question related to Data
Question
3 answers
In your opinion, could a new generation of generative artificial intelligence be created in the future, whereby a highly sophisticated language model capable of simulating the consciousness of a specific person, answering questions based on knowledge derived from publications written by that person and documented statements, previously given interviews?
For example, if in a few years it will be possible to create a kind of new generation of artificial intelligence equipped with artificial thought processes, artificial consciousness and integrate a language model with a database of data, knowledge, etc. derived from publications written by that person and documented statements, previously given interviews then perhaps in a few years it will be possible to talk to a kind of artificial consciousness simulating the consciousness of a specific person who has long since died and would answer questions simulating that person, e.g. the long-dead Albert Einstein. In this way, there could be language models available on the Internet based on generative artificial intelligence equipped with artificial thought processes, artificial consciousness, with the knowledge of a specific person with whom the Internet user could converse. If this kind of highly intelligent tools were created and offered as a service to talk to a specific Person, known and living many years ago, this kind of service could probably become very popular as a new Internet service. However, the question of ethics and possible copyright of works, publications, books written by a specific person many years ago and whose knowledge, data and information would be used by a generative artificial intelligence simulating the consciousness of this person and answering questions, participating in discussions with people, with Internet users, remains to be considered. Beyond this, however, there is a specific category of risk of disinformation within this kind of online service that could be created in the future. This risk of disinformation would occur if there were situations of responses given by artificial intelligence to questions posed by humans, which would contain content, information, data, wording, phrases, suggestions, etc., which would never be uttered by a specific person simulated by artificial intelligence. The level of this kind of risk of misinformation would be inversely proportional to and determined by the level of sophistication, perfection, etc. of the construction of this kind of new generation of artificial intelligence equipped with artificial thought processes, artificial consciousness and the integration of a linguistic model with a database of data, knowledge, etc. derived from publications written by this person and documented statements, previously given interviews, etc., and the perfection of the learning system to give increasingly perfect answers given to the questions asked and to learn by the generative artificial intelligence system to actively participate in discussions.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
In your opinion, can a new generation of generative artificial intelligence be created in the future, whereby a highly advanced language model capable of simulating the consciousness of a specific person, answering questions based on knowledge derived from publications written by that person and documented statements, previously given interviews?
Could an artificial intelligence be created in the future that is capable of simulating the consciousness of a specific person?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Counting on your opinions, on getting to know your personal opinion, on an honest approach to discussing scientific issues and not the ready-made answers generated in ChatGPT, I deliberately used the phrase "in your opinion" in the question.
The above text is entirely my own work written by me on the basis of my research.
I have not used other sources or automatic text generation systems such as ChatGPT in writing this text.
Copyright by Dariusz Prokopowicz
Best wishes,
Dariusz Prokopowicz
  • asked a question related to Data
Question
3 answers
I am trying to run a spatio-temporal autoregressive model (STAR). Therefore I need to create a spatial weight matrix W with N × T rows and N × T columns to weight country interdependencies based on yearly trade data. Could someone please tell me how to create such a matrix in R or Stata?
Relevant answer
Answer
Dear Jan,
OK ! I see ! you need to create the spatial weight matrix indeed !
There are many possibilities in R:
I strongly advise working with sf because it is so much easier now!
but spdep may still be well adapted to your context:
This is one of the definitive books on the subject in R:
there are other references but they are more geospatial (point process) oriented.
Here you should use one of those packages, and nb2mat from the spdep package might do the trick!
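For the N × T structure asked about, a minimal sketch in base R (the trade flows are made-up numbers; with contiguity-based neighbours, spdep::nb2mat would produce the N × N block instead):

n_ctry <- 3
n_yrs  <- 4
trade  <- matrix(c(0, 10, 5,
                   8,  0, 2,
                   4,  6, 0), nrow = n_ctry, byrow = TRUE)  # bilateral trade, zero diagonal

W    <- trade / rowSums(trade)        # row-standardised N x N spatial weight matrix
W_NT <- kronecker(diag(n_yrs), W)     # (N*T) x (N*T): one W block per year
dim(W_NT)                             # 12 x 12

If the trade matrix changes from year to year, replace the Kronecker product with a block-diagonal matrix built from year-specific W_t blocks.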
All the best.
Franck.
  • asked a question related to Data
Question
3 answers
Is analytics based on Big Data and artificial intelligence already capable of predicting what we will think about tomorrow, that we need something, that we should perhaps buy something we think we need?
Can an AI-equipped internet robot using the results of research carried out by Big Data advanced socio-economic analytics systems and employed in the call centre department of a company or institution already forecast, in real time, the consumption and purchase needs of a specific internet user on the basis of a conversation with a potential customer and, on this basis, offer internet users the purchase of an offer of products or services that they themselves would probably think they need in a moment?
On the basis of analytics of a bank customer's purchases of products and services, and analytics of online payments, settlements and bank card payments, will banks refine their models of their customers' preferences for specific banking products and financial services? For example, will the purchase of a certain type of product or service result in an offer of a specific insurance policy or bank loan to a specific customer of the bank?
Will this be an important part of the automation of the processes carried out within the computerised systems concerning customer relations etc. in the context of the development of banking in the years to come?
For years, in databases, data warehouses and Big Data platforms, Internet technology companies have been collecting information on citizens, Internet users, customers using their online information services.
Continuous technological progress increases the possibilities of both obtaining, collecting and processing data on citizens in their role as potential customers, consumers of Internet offers and other media, Internet information services, offers of various types of products and services, advertising campaigns that also influence the general social awareness of citizens and the choices people make concerning various aspects of their lives. The new Industry 4.0 technologies currently being developed, including Big Data Analytics, cloud computing, Internet of Things, Blockchain, cyber security, digital twins, augmented reality, virtual reality and also machine learning, deep learning, neural networks and artificial intelligence will determine the rapid technological progress and development of applications of these technologies in the field of online marketing in the years to come as well. The robots being developed, which collect information on specific content from various websites and webpages, are able to pinpoint information written by internet users on their social media profiles. In this way, it is possible to obtain a large amount of information describing a specific Internet user and, on this basis, it is possible to build up a highly accurate characterisation of a specific Internet user and to create multi-faceted characteristics of customer segments for specific product and service offers. In this way, digital avatars of individual Internet users are built in the Big Data databases of Internet technology companies and/or large e-commerce platforms operating on the Internet, social media portals. The descriptive characteristics of such avatars are so detailed and contain so much information about Internet users that most of the people concerned do not even know how much information specific Internet-based technology companies, e-commerce platforms, social media portals, etc. have about them.
Geolocalisation added to 5G high-speed broadband and information technology and Industry 4.0 has, on the one hand, made it possible to develop analytics for identifying Internet users' shopping preferences, topics of interest, etc., depending on where, specifically geographically, they are at any given time with the smartphone on which they are using certain online information services. On the other hand, the combination of the aforementioned technologies in the various applications developed in the applications installed on the smartphone has made it possible, on the one hand, to increase the scale of data collection on Internet users, and, on the other hand, also to increase the efficiency of the processing of this data and its use in the marketing activities of companies and institutions and the implementation of these operations increasingly in real time in the cloud computing, the presentation of the results of the data processing operations carried out on Internet of Things devices, etc.
It is becoming increasingly common for us to experience situations in which, while walking with a smartphone past some physical shop, bank, company or institution offering certain services, we receive an SMS, banner or message on the Internet portal we have just used on our smartphone informing us of a new promotional offer of products or services of that particular shop, company, institution we have passed by.
In view of the above, I would like to address the following question to the esteemed community of scientists and researchers:
Is analytics based on Big Data and artificial intelligence, conducted in the field of market research, market analysis, the creation of characteristics of target customer segments, already able to forecast what we will think about tomorrow, that we need something, that we might need to buy something that we consider necessary?
Is analytics based on Big Data and artificial intelligence already capable of predicting what we will think about tomorrow?
The text above is my own, written by me on the basis of my research.
In writing this text, I did not use other sources or automatic text generation systems such as ChatGPT.
Copyright by Dariusz Prokopowicz
What do you think about this topic?
What is your opinion on this subject?
Please answer,
I invite you all to discuss,
Thank you very much,
Best regards,
Dariusz Prokopowicz
Relevant answer
Answer
Predicting individual thoughts and opinions is a complex task due to the inherent complexity and subjectivity of human cognition. Thoughts and opinions are influenced by a myriad of factors such as personal experiences, values, beliefs, cultural background, and individual idiosyncrasies. These factors create a highly nuanced and dynamic landscape that is challenging to capture accurately through data analysis alone.
While analytics based on Big Data and AI can provide valuable insights into general trends and patterns, predicting individual thoughts requires a deep understanding of the context and personal factors that shape an individual's thinking. AI algorithms typically analyze historical data to identify correlations and patterns, which can be useful in predicting collective behavior or trends at a broader level. For example, analyzing social media data can help identify sentiments about a particular topic within a given population.
However, predicting individual thoughts requires accounting for unique and specific circumstances that can significantly impact an individual's perspectives. These circumstances may not be adequately captured in the available data sources or may change rapidly over time. Furthermore, individual thoughts and opinions are not solely influenced by external factors but are also shaped by internal cognitive processes that can be highly subjective and difficult to quantify.
Another challenge lies in the interpretability of AI algorithms. While AI can make predictions based on complex models, explaining how those predictions were generated can be challenging. This lack of interpretability makes it difficult to gain a deep understanding of the underlying factors influencing individual thoughts and opinions, limiting the reliability and trustworthiness of such predictions.
It is important to note that the field of AI is rapidly advancing, and new techniques and approaches are continually emerging. Researchers are working on developing more sophisticated models that can better capture and understand human cognition. However, the ability to predict individual thoughts with complete accuracy still remains a significant challenge.
In summary, while analytics based on Big Data and AI can provide valuable insights and predictions at a collective level, accurately predicting individual thoughts and opinions is a complex task due to the multifaceted nature of human cognition and the limitations of available data sources. While advancements are being made, predicting individual thoughts with certainty remains beyond the current capabilities of AI.
  • asked a question related to Data
Question
4 answers
My team and I are trying to open a dialogue about designing a Continuum of Realism for synthetic data. We want to develop a meaningful way to talk about data in terms of the degree of realism that is necessary for a particular task. We feel the way to do this is by defining a continuum that shows that as data becomes more realistic, the analytic value increases, but so does the cost and risk of disclosure. Everyone seems to be interested in generating the most realistic data, but let's be honest, sometimes that's not the level of realism that we actually need. It is expensive and carries a high reidentification risk when working with PII. Sometimes we just need data to test our code, and we can't justify using this level of realism when the risk is so high. Have you also encountered this issue? Are you interested in helping us fulfill our mission? Ultimately we are trying to save money and protect consumer privacy. We would love to hear your thoughts!
Relevant answer
Answer
Yes, there is a continuum of realism for synthetic data. At one end of the continuum, we have completely synthetic data that is generated based on mathematical models or simulations. This type of data can be useful for testing hypotheses, exploring different scenarios, and evaluating methods without the constraints and biases of real-world data. However, it may not reflect the complexity and diversity of real-world data, and may not be useful for certain applications, such as training machine learning models.
At the other end of the continuum, we have real-world data that is collected directly from sources such as surveys, medical records, or social media platforms. This type of data can provide a rich and diverse representation of the phenomena of interest but may be limited by factors such as sample size, data quality, and ethical considerations.
Between these two extremes, we have various levels of realism that can be achieved through the use of synthetic data. For example, data may be generated based on real-world data using methods such as data augmentation or data synthesis, which can create new data points that are similar to the real data but with some degree of randomness or variability. Alternatively, data may be generated based on simulations or generative models that incorporate known properties of the real-world data, such as distributional properties or relationships between variables.
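As a brief illustration of two points on that continuum, here is a minimal sketch in base R; the "real" variable is simulated, and the two generators (a fitted-distribution draw and a jittered bootstrap) are generic examples rather than any specific synthesis tool:

set.seed(123)
real_age <- rnorm(500, mean = 45, sd = 12)   # stand-in for a real, sensitive variable

# (1) Model-based synthesis: sample from a distribution fitted to the real data
synthetic_model <- rnorm(500, mean(real_age), sd(real_age))

# (2) Augmentation: resample real records and add a small amount of noise
synthetic_augmented <- sample(real_age, 500, replace = TRUE) +
  rnorm(500, 0, 0.1 * sd(real_age))

summary(real_age); summary(synthetic_model); summary(synthetic_augmented)

The jittered bootstrap stays closer to actual records (more analytic value, but more disclosure risk), while the purely model-based draw is safer but less faithful, which is the trade-off the continuum is meant to express.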
As for your second question, as an AI language model, I am always ready to provide help and guidance on topics related to synthetic data and statistics. Please let me know if there is anything specific that I can assist you with.
  • asked a question related to Data
Question
4 answers
Hi everyone,
I need to convert standard error (SE) into standard deviation (SD). The formula for that is
SE times the square root of the sample size
By 'sample size', does it mean the total sample size or the sample sizes of the individual groups? For example, the intervention group has 40 participants while the control group has 39 (so the total sample size is 79). So, when calculating SD for the intervention group, do I use 40 as the sample size or 79?
Thank you!
Relevant answer
Answer
7.7.3.2 Obtaining standard deviations from standard errors and confidence intervals for group means (cochrane.org)
Also, there is a useful calculator in the attached Excel file from Cochrane.
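A worked version of the formula, assuming the SE reported for each group is the standard error of that group's mean (consistent with the Cochrane section linked above), uses each group's own n rather than the total:

\mathrm{SD} = \mathrm{SE} \times \sqrt{n}

\mathrm{SD}_{\text{intervention}} = \mathrm{SE}_{\text{intervention}} \times \sqrt{40}, \qquad \mathrm{SD}_{\text{control}} = \mathrm{SE}_{\text{control}} \times \sqrt{39}

The pooled n of 79 would only be relevant if a single SE had been computed across both groups combined.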
  • asked a question related to Data
Question
13 answers
How can artificial intelligence such as ChatGPT and Big Data Analytics be used to analyse the level of innovation of new economic projects that new startups that are planning to develop implementing innovative business solutions, technological innovations, environmental innovations, energy innovations and other types of innovations?
The economic development of a country is determined by a number of factors, which include the level of innovativeness of economic processes, the creation of new technological solutions in research and development centres, research institutes, laboratories of universities and business entities and their implementation into the economic processes of companies and enterprises. In the modern economy, the level of innovativeness of the economy is also shaped by the effectiveness of innovation policy, which influences the formation of innovative startups and their effective development. The economic activity of innovative startups generates a high investment risk and for the institution financing the development of startups this generates a high credit risk. As a result, many banks do not finance business ventures led by innovative startups. As part of the development of systemic financing programmes for the development of start-ups from national public funds or international innovation support funds, financial grants are organised, which can be provided as non-refundable financial assistance if a startup successfully develops certain business ventures according to the original plan entered in the application for external funding. Non-refundable grant programmes can thus activate the development of innovative business ventures carried out in specific areas, sectors and industries of the economy, including, for example, innovative green business ventures that pursue sustainable development goals and are part of green economy transformation trends. Institutions distributing non-returnable financial grants should constantly improve their systems of analysing the level of innovativeness of business ventures planned to be implemented by startups described in applications for funding as innovative. As part of improving systems for verifying the level of innovativeness of business ventures and the fulfilment of specific set goals, e.g. sustainable development goals, green economy transformation goals, etc., new Industry 4.0 technologies implemented in Business Intelligence analytical platforms can be used. Within the framework of Industry 4.0 technologies, which can be used to improve systems for verifying the level of innovativeness of business ventures, machine learning, deep learning, artificial intelligence (including e.g. ChatGPT), Business Intelligence analytical platforms with implemented Big Data Analytics, cloud computing, multi-criteria simulation models, etc., can be used. In view of the above, in the situation of having at one's disposal appropriate IT equipment, including computers equipped with new generation processors characterised by high computing power, it is possible to use artificial intelligence, e.g. ChatGPT and Big Data Analytics and other Industry 4.0 technologies to analyse the level of innovativeness of new economic projects that plan to develop new start-ups implementing innovative business solutions, technological, ecological, energy and other types of innovations.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can artificial intelligence such as ChatGPT and Big Data Analytics be used to analyse the level of innovation of new economic projects that plan to develop new startups implementing innovative business solutions, technological innovations, ecological innovations, energy innovations and other types of innovations?
What do you think?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
Enhancements to Tableau for Slack focus on sharing, search and insights, with automated workflows for tools like Accelerator. The goal: empower decision makers and CRM teams to put big data to work...
The changes also presage what’s coming next: integration of recently announced generative AI model Einstein GPT, the fruit of Salesforce’s collaboration with ChatGPT maker OpenAI, with natural language-enabled interfaces to make wrangling big data a low-code/no-code operation...
  • asked a question related to Data
Question
3 answers
Dear researchers,
I am working on a project related to the solar wind. I want to download 1-minute resolution data from the BepiColombo spacecraft. However, I am struggling with that. Do you know any websites from which to download the data? Or, if you could provide BepiColombo data for just a few days, that would be very helpful. I am expecting valuable comments from wonderful personalities.
Many thanks.
Relevant answer
Answer
Gotcha.
Umph. And Ulysses data are too old and Cluster's are in the wrong place...
I'd drop Dr. Heyner a line:
  • asked a question related to Data
Question
3 answers
Does analytics based on sentiment analysis of changes in Internet user opinion using Big Data Analytics help detect fake news spread as part of the deliberate spread of disinformation on social media?
The spread of disinformation on social media used by setting up fake profiles and spreading fakenews on these media is becoming increasingly dangerous in terms of the security of not only specific companies and institutions but also the state. The various social media, including those dominating this segment of new online media, however, differ considerably in this respect. The problem is more acute in the case of those social media which are among the most popular and on which mainly young people function, whose world view can be more easily influenced by factual information and other disinformation techniques used on the Internet. Currently, among children and young people, the most popular social media include Tik Tok, Instagram and YouTube. Consequently, in recent months, the development of some social media sites such as Tik Tok is already being restricted by the governments of some countries by banning the use, installation of this application of this portal on smartphones, laptops and other devices used for official purposes by employees of public institutions. These actions are argued by the governments of these countries in order to maintain a certain level of cyber security and reduce the risk of surveillance, theft of data and sensitive, strategic and particularly security-sensitive information of individual institutions, companies and the state. In addition, there have already been more than a few cases of data leaks on other social media portals, telecoms, public institutions, local authorities and others based on hacking into the databases of specific institutions and companies. In Poland, however, the opposite is true. Not only does the organised political group PIS not restrict the use of Tik Tok by employees of public institutions, but it also motivates the use of this portal by politicians of the ruling PIS option to publish videos as part of the ongoing electoral campaign, which would increase the chances of winning parliamentary elections for the third time in autumn this year 2023. According to analysts researching the problem of growing disinformation on the Internet, in highly developed countries it is enough to create 100 000 avatars, i.e. non-existent fictitious persons, created as it were and seemingly functioning thanks to the Internet by creating profiles of these fictitious persons on social media portals referred to as fake profiles created and functioning on these portals, to seriously influence the world view, the general social awareness of Internet users, i.e. usually the majority of citizens in the country. On the other hand, in third world countries, in countries with undemocratic systems of power, all that is needed for this purpose is about 1,000 avatars of these fictitious people with stories modelled, for example, on famous people such as, in Poland, a well-known singer claiming that there is no pandemic and that vaccines are an instrument for increasing control of citizens by the state. The analysis of changes in the world view of Internet users, changes in trends concerning social opinion on specific issues, evaluations of specific product and service offers, brand recognition of companies and institutions can be conducted on the basis of sentiment analysis of changes in the opinion of Internet users using Big Data Analytics. Consequently, this type of analytics can be applied and of great help in detecting factual news disseminated as part of the deliberate spread of disinformation on social media.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Does analytics based on sentiment analysis of changes in the opinions of Internet users using Big Data Analytics help in detecting fake news spread as part of the deliberate spread of disinformation on social media?
What is your opinion on this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
Yes, sentiment analysis based on Big Data Analytics can help in detecting fake news spread as part of the deliberate spread of disinformation on social media. Sentiment analysis involves the use of natural language processing and machine learning techniques to analyze large amounts of textual data, such as social media posts, to identify the sentiment expressed in the text. By analyzing changes in the sentiment of Internet users towards a particular topic or event, it is possible to identify patterns of misinformation and disinformation.
For example, if there is a sudden surge in negative sentiment towards a particular politician or political party, it could be an indication of a disinformation campaign aimed at spreading negative propaganda. Similarly, if there is a sudden increase in positive sentiment towards a particular product or service, it could be an indication of a paid promotion or marketing campaign.
However, it is important to note that sentiment analysis alone may not be enough to detect fake news and disinformation. It is also important to consider other factors such as the source of the information, the credibility of the information, and the context in which the information is being shared. Therefore, a comprehensive approach involving multiple techniques and tools may be necessary to effectively detect and combat fake news and disinformation on social media.
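As a toy illustration of the sentiment-tracking step, here is a minimal sketch in base R; the posts and the tiny lexicon are invented, and a real pipeline would use a proper sentiment lexicon or model plus the source and credibility checks mentioned above:

# Tiny hand-made lexicon: positive words score +1, negative words -1
lexicon <- c(good = 1, great = 1, safe = 1, bad = -1, fake = -1, scam = -1)

score_post <- function(text) {
  words <- tolower(unlist(strsplit(text, "[^a-zA-Z]+")))
  sum(lexicon[words], na.rm = TRUE)
}

posts <- data.frame(
  day  = c(1, 1, 2, 2, 3, 3),
  text = c("great product", "feels safe", "good service",
           "this is a scam", "fake reviews everywhere", "bad and fake"),
  stringsAsFactors = FALSE
)

daily <- tapply(sapply(posts$text, score_post), posts$day, mean)
diff(daily)                    # day-to-day change in average sentiment
which(diff(daily) < -1) + 1    # flag days with an abrupt negative swing

An abrupt, coordinated swing like the one flagged here is only a signal for closer inspection, not proof of a disinformation campaign.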
  • asked a question related to Data
Question
6 answers
I am tied up with the problem below; it would be a pleasure to have your ideas.
I've written the same program in two languages, Python and R, but each produced a completely different result. Before jumping to a conclusion, I declare that:
- Every part of the code in both languages has been checked multiple times, is correct, and represents the same thing.
- The packages used in the two languages are the same versions.
So, what do you think?
The code is about applying deep neural networks for time series data.
Relevant answer
Answer
Good morning, without the code it is difficult to know where the difference is. I do not use Python; I work in R, but maybe this difference is due to the dataset-splitting stage. Did you try to set the same seed for the random number generator, for example seed(1234)? (If my memory is good, this kind of function is also used in Python.) Were your results and evaluation metrics totally different? In that case, maybe there is a reliability issue in your model. You should check your data preparation and feature selection.
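To illustrate the seeding point on the R side (the data frame is simulated; in Python the analogues would be random.seed, numpy.random.seed and the framework's own seed setters):

set.seed(1234)                       # fix the seed before any random operation
df  <- data.frame(y = rnorm(100), x = rnorm(100))
idx <- sample(seq_len(nrow(df)), size = 0.8 * nrow(df))   # reproducible 80/20 split
train <- df[idx, ]
test  <- df[-idx, ]
nrow(train); nrow(test)              # the same 80/20 split on every run

Note that even with the same seed value, R and Python use different random number generators, so the two languages will not produce identical splits; the realistic goal is that each implementation is reproducible with itself and that the splitting, preprocessing and network initialisation logic match.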
  • asked a question related to Data
Question
3 answers
Where can I get the original paper by Ackoff from 1989 titled "From data to wisdom", published in the Journal of Applied Systems Analysis?
The file that is publicly available under this title is a 2-page article from 1999.
Relevant answer
Answer
See enclosed file. I think this is what you are looking for.
Best regards.
  • asked a question related to Data
Question
13 answers
I am currently using Fisher's exact test as some of my cell counts are <5. I have done this for lots of data in the same dataset which have generally been 2x3 or more (so reported Cramer's V as well), however now I am running 2x2, the Fisher's output is blank and I can't figure out why?! I have attached an example of the output - any help would be gratefully received!
Relevant answer
Answer
Daniel Wright Sal Mangiafico thank you both for your help! Think I was too immersed in what I was doing that I hadn’t considered I would only be reporting the p-values for Fishers! The previous outputs with values on the Fishers row had thrown me and my common sense!! Really appreciate your help
  • asked a question related to Data
Question
10 answers
Since statistics is the golden key to interpreting data in almost all scientific & social branches!
Relevant answer
Answer
I am in support of your proposal, Sinan Ibaguner, but such a course needs the orientation framework of scientific methodology and, of course, data science. Commanding such a course, defining some textbook and following examinations is more an administrative task; I stay with Gauss, in that we need to understand the underlying mathematical idea of a measurement before we execute the mechanical operation of interpreting data.
  • asked a question related to Data
Question
3 answers
I would like to fit some data sets. I read that LEVMW is the best program. Does anyone have this program or a link to it?
Or are there other programs better than LEVMW?
Relevant answer
Answer
Here you can find the website for LEVM/LEVMW.
  • asked a question related to Data
Question
6 answers
Greetings, respectable community of ResearchGate. I encountered some issues while gathering data from the World Bank Database, hence I would like to know if there are alternatives or other websites like the World Bank Database from which we can gather raw data.
The website can contain any form of indicators, such as development, governance, competitiveness, economics, the financial sector, etc. Thank you in advance for your assistance.
Relevant answer
Answer
There are many websites and databases that provide access to raw data on a variety of indicators, such as those related to development, governance, competitiveness, economics, and the financial sector. Some examples of websites that provide such data include:
United Nations Development Programme: https://data.undp.org/
International Monetary Fund: https://www.imf.org/en/Data
United Nations Statistics Division: https://unstats.un.org/unsd/default.htm
World Health Organization: https://www.who.int/data/gho
Organisation for Economic Co-operation and Development: https://data.oecd.org/
These websites provide access to a wide range of indicators and data on various topics, and can be a useful resource for those looking for raw data for their research or analysis.
  • asked a question related to Data
Question
6 answers
It is a .dta file and cannot be read with read.csv, so I used haven to read it. However, the second row of each column holds the real name (the variable label), such as Checking Year and Checking Month; how can I extract it?
Relevant answer
Answer
Jochen Wilhelm Sorry for making my question unclear. I want to extract the labels of those columns and use them as column names. For example, we want to extract the label names Individual ID, Household ID, Community ID, Checking Year, Checking Season, Checking Date and Checking Day, and set them as the column names.
Please contact me if more information was needed :)
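A minimal R sketch of one way to do this with data read by haven; the file name is made up, and the code falls back to the original name for columns that carry no label:
library(haven)
df <- read_dta("survey_wave.dta")   # hypothetical file name
# pull the "label" attribute of every column
labs <- vapply(df, function(x) {
  lb <- attr(x, "label", exact = TRUE)
  if (is.null(lb)) NA_character_ else lb
}, character(1))
# use the label where it exists, otherwise keep the original column name
names(df) <- ifelse(is.na(labs), names(df), labs)
head(df)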
  • asked a question related to Data
Question
1 answer
I had the same problem with the data collected from Thermo Scientific XPS. Here is the solution I found out, it may help other researchers.
Step 1: Open Avantage (on the XPS measurement computer), then go to "file"->"save as", a new window pops out.
📷(see the picture below)
Step 2: Name your .vgp file and click save. A new folder will be created, like Data Grid 2. DATA
Step 3: Open the software "DataSpace_BatchDump.exe". It can be found on the same computer with Avantage installed, in "C:\Program Files (x86)\Thermo\Avantage\Bin".
Step 4: Open the folder with .vgp/.vgd files in DataSpace_BatchDump, and click OK. Then, a new window pops out, find a location to export the files, like “C:\export”, then click ok twice. New .avg files will be saved in that location.
📷(see the picture below)
Step 5: Open CasaXPS, click “Convert”, and find “C:\export”. Type “.dth” and click OK. Then the .vms files will be created.
📷(see the picture below)
Relevant answer
Answer
Weixin,thank you! Your detailed explanation helped me a lot.
  • asked a question related to Data
Question
3 answers
Hello, I'm looking for a reliable insect pest database that shares information about the occurrence, geographic distribution and hosts of all insect pests around the world... I did a little research on my own, but in my opinion the results from the databases I found (GBIF, CABI...) aren't quite reliable. I also believe that many technical reports revealing the occurrence of those insects are being published by governmental research centres in every country; however, they aren't accessible online. Is there a way to get access to those reports?
Relevant answer
Answer
Dear Jane Dalley
Thank you for the reply.
I am indeed interested in agricultural pests; however, I am looking for an online database that allows me to get information about those insects (taxonomy, feeding mode, distribution, origin...).
  • asked a question related to Data
Question
4 answers
Hi Researchers,
I am looking for journals that publish scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. Please let me know if you have any recommendations.
FYI, the dataset we are curating are bioinformatics image data.
Relevant answer
Answer
Have a look at Scientific Data: https://www.nature.com/sdata/
  • asked a question related to Data
Question
13 answers
Greetings everyone! could anyone kindly tell me where I can get data concerning the stock and bonds market?
Relevant answer
Answer
Yahoo Finance, Bloomberg.
  • asked a question related to Data
Question
3 answers
I am looking for email datasets from B2B sales. Are any data sources available?
Relevant answer
Answer
Shafagat Mahmudova, Fredrick Ishengoma: what I am looking for is mail conversations from B2B sales, not an email list. Thanks for this.
  • asked a question related to Data
Question
1 answer
I'm new to fsQCA and would like to conduct a fsQCA analysis in one of my dissertation studies. The majority of fsQCA methods begin with data collection, followed by the preparation of the data matrix and the truth table. Conditions and outcomes are represented in the data matrix. Conditions, as we know, are the responses to surveys or interviews. I'm curious where the "outcomes" come from. Do we ask participants to rate the outcomes on a scale, as they did in the conditions?
For example, in the data matrix attached, there are five conditions (LPI, TAB, WPP, PAP, and NR) and outcomes are indicated by PubINF.
I acknowledge this is a very basic question, but I look forward to receiving your response.
Relevant answer
Answer
Nayan Kadam Fs/QCA (fuzzy-set qualitative comparative analysis) is a social science approach that combines case-oriented and variable-oriented quantitative research. It began with the development of qualitative comparative analysis, and Fs/QCA was later developed using fuzzy-set theory.
  • asked a question related to Data
Question
8 answers
I tried using Gigasheet, but it does not have many of the features that are available in Excel. Please suggest some freely available tools where I can load my ~1.7 million rows and do some calculations, like sorting by multiple columns and removing duplicates.
TIA
Relevant answer
Answer
I'd do this in R. It's free and sufficiently powerful to handle such tables with ease.
This could be an example work-flow (assuming you have a file named "my_huge_file.txt" which is a tab-delimited text file with many rows and a header row that contains the texts "ID", "value", "name" and "amount" [just for example!] as column names):
# read the file into a data.frame:
df <- read.delim("my_huge_file.txt")
# sort the rows by values in the column "name", then "value":
df <- df[order(df$name, df$value), ]
# remove all rows with duplicate entries in the column "ID":
df <- df[!duplicated(df$ID), ]
# get the 5 rows with the largest values in the column "amount":
o <- order(df$amount, decreasing = TRUE)[1:5]
df <- df[o, ]
# getting the mean of the values in the column "value"
# by each value in the column "name":
tapply(df$value, INDEX = df$name, FUN = mean)
  • asked a question related to Data
Question
4 answers
Hello,
Does anyone know how to change the time length to 2 or 3s in Ansys Static Structural -> Model -> Static Structural -> Pressure -> Magnitude -> Function -> Tabular data? The current setup only allows me to go up to 1s.
Thanks!
Relevant answer
Answer
You must start entering from the higher end and work back to the start. For example, enter the 30th-second step end time data first and then proceed to the 1st-second step end time data; that way, entering the step controls goes smoothly.
  • asked a question related to Data
Question
5 answers
Hi all,
I'm having trouble converting one particular variable in my dataset from string to numeric. I've tried manually transforming/recoding into a different variable and automatic recoding. I've also tried writing syntax (see below). The same syntax has worked for every other variable I needed to convert but this one. For all methods (manual recode, automatic recode, and writing a syntax), I end up with missing data.
recode variablename ('Occurred 0 times' = 0) ('Occurred 1 time' = 1) ('Occurred 2 times' = 2) ('Occurred 3+ times' = 3) into Nvariablename.
execute.
VALUE LABELS
Nvariablename
0 'Occurred 0 times'
1 'Occurred 1 time'
2 'Occurred 2 times'
3 'Occurred 3+ times'.
EXECUTE.
Thank you in advance for your help!
Relevant answer
Answer
Konstantinos Mastrothanasis, by introducing manual copying & pasting etc., you make reproducibility much more difficult. IMO, anything that can be done via command syntax ought to be done via command syntax. The basic code Ange H. posted will work for the particular values she showed in her post--see the example below. If it is not working, that suggests there are other values present in the dataset other than the ones she has shown us. But we are still waiting for her to upload a small file including the problematic cases.
Meanwhile, here is the aforementioned example that works.
* Read in the values Angela showed in her post.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / svar(A20).
BEGIN DATA
'Occurred 0 times'
'Occurred 1 time'
'Occurred 2 times'
'Occurred 3+ times'
END DATA.
LIST.
* Recode svar to nvar.
RECODE svar
('Occurred 0 times' = 0)
('Occurred 1 time' = 1)
('Occurred 2 times' = 2)
('Occurred 3+ times' = 3) into nvar.
FORMATS nvar (F1).
VALUE LABELS nvar
0 'Occurred 0 times'
1 'Occurred 1 time'
2 'Occurred 2 times'
3 'Occurred 3+ times'
.
CROSSTABS svar BY nvar.
  • asked a question related to Data
Question
6 answers
Dear researchers, we tried to download AOD data from AERONET for Nepal stations; however, most of the data are missing. Are there any other appropriate websites to download AOD data from? We need your suggestions, thanks. :)
Relevant answer
Answer
You can switch to satellite datasets or reanalysis datasets to get aerosol optical depth.
  • asked a question related to Data
Question
3 answers
My lab wants to try to do as much of our pre-processing, processing, and analysis in R as possible, for ease of workflow and replicability. We use a lot of psychophysiological measures and have historically used MATLAB for our workflow with this type of data. We want to know if anyone has been successful in using R for these types of tasks.
Any good R packages?
Relevant answer
Answer
Begin by consulting the very useful book by Jared Lander, R for Everyone, to get started; it is available from the Z-Library and contains much useful research-grade code. To find useful packages in R, Google what you want to do alongside "R package". Example attached. Best wishes, David Booth
  • asked a question related to Data
Question
6 answers
I wanted to know the purity of MgSO4 and went for XRF analysis. I received the data in the following format:
[Quantitative Result]
---------------------------------------------------------------------------------
Analyte Result Proc-Calc Line Net Int. BG Int.
---------------------------------------------------------------------------------
S 61.3709 % Quant.-FP S Ka 1165.666 3.578
Mg 37.9584 % Quant.-FP MgKa 158.225 0.918
Ca 0.5466 % Quant.-FP CaKa 4.244 1.142
Si 0.1241 % Quant.-FP SiKa 0.862 0.148
Can anyone please help me by explaining how to find the purity of MgSO4 from this?
Relevant answer
Answer
Oxygen is tough for XRF because of its very low K-alpha energy.
So it does not show up in the list.
However there seems to be something odd, as Gustavo Henrique de Magalhães Gomes mentioned above.
For MgSO4 you will have Mg:S = 1:1 in atomic ratio.
This will also hold for MgS.
But your XRF data give Mg:S ≈ 38:61 by weight, which works out to roughly 1:1.2 in atomic ratio, still clearly off the ideal 1:1.
You should check your XRF system with certified materials...
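As a quick check, the reported weight percentages can be converted to an atomic ratio in R (values taken from the table above, standard atomic masses):
wt  <- c(S = 61.37, Mg = 37.96)    # weight %
M   <- c(S = 32.06, Mg = 24.305)   # g/mol
mol <- wt / M
mol / min(mol)
# gives roughly S = 1.23, Mg = 1.00, i.e. Mg:S about 1:1.2 rather than the ideal 1:1 of MgSO4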
  • asked a question related to Data
Question
9 answers
I welcome Answers and opinions.
Relevant answer
Answer
Citing under References is enough but if you want to publish any data table you need to take permission from the publisher
  • asked a question related to Data
Question
10 answers
Based on the literature review we get an idea of possible moderators. But what if we want to introduce a new moderator into the literature?
1) What are the criteria for a new moderator?
2) How can a moderating variable be supported theoretically?
3) Is it necessary to adopt the new moderating variable from the same theory?
Relevant answer
Answer
  • asked a question related to Data
Question
3 answers
Is there any special procedure to follow to get data?
  • asked a question related to Data
Question
2 answers
For RWE studies it is important to find the correct RWD sources; therefore, I am looking for sources of Japanese drug codes, diagnosis codes, etc. other than JMDC. It would be helpful if any of you could help me.
  • asked a question related to Data
Question
7 answers
Hello;
We have two twenty-year data sets, one for a historical time span and one for a future prediction. For both, statistical distributions are fitted over five-year intervals, and for the historical and the predicted data the same statistical distributions (Johnson SB, Gen. Pareto, and Wakeby) have been selected as the most appropriate.
Similar statistical distributions have been obtained for all five-year intervals and for the entire twenty-year time series. We want to know what this similarity in the data analysis means.
Best
Saeideh
Relevant answer
Answer
Thanks for the plots, they look pretty good to me. Best wishes, David Booth
  • asked a question related to Data
Question
2 answers
Hello,
I have a huge amount of data and I need to calculate categorical statistical indices (e.g., POD, FAR, CSI, ETS) using Python or R. I will be thankful for any kind of help.
Regards,
Relevant answer
Answer
Binary forecast verification is included within SwirlsPy, an open-source Python library:
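If you would rather compute the scores directly, here is a minimal sketch in R built from the standard 2x2 contingency-table definitions; the function name and the simulated event vectors are made up for illustration:
verify_binary <- function(fcst, obs) {
  hits <- sum(fcst == 1 & obs == 1)
  fa   <- sum(fcst == 1 & obs == 0)   # false alarms
  miss <- sum(fcst == 0 & obs == 1)
  cn   <- sum(fcst == 0 & obs == 0)   # correct negatives
  n    <- hits + fa + miss + cn
  hits_rand <- (hits + fa) * (hits + miss) / n   # hits expected by chance
  c(POD = hits / (hits + miss),
    FAR = fa / (hits + fa),
    CSI = hits / (hits + fa + miss),
    ETS = (hits - hits_rand) / (hits + fa + miss - hits_rand))
}
set.seed(1)
obs  <- rbinom(1000, 1, 0.3)
fcst <- ifelse(runif(1000) < 0.8, obs, 1 - obs)   # deliberately imperfect forecast
verify_binary(fcst, obs)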
  • asked a question related to Data
Question
3 answers
Dear colleagues
I have a CSV file (5 thousand lines) with information about the year, country, distance, product and quantity.
I can also open the file in Notepad++.
Could you please tell me how I can construct a graph of the quantities in Excel or RStudio, i.e. how to handle every quantity that corresponds to the respective country?
Thank you very much
Relevant answer
Answer
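A minimal R sketch of one way to total and plot the quantities by country; the file name and the column names "country" and "quantity" are assumptions based on the question and should be adjusted to the actual CSV headers:
dat <- read.csv("trade_data.csv")   # hypothetical file name
qty_by_country <- aggregate(quantity ~ country, data = dat, FUN = sum)
qty_by_country <- qty_by_country[order(-qty_by_country$quantity), ]
barplot(qty_by_country$quantity,
        names.arg = qty_by_country$country,
        las = 2, main = "Total quantity by country")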
  • asked a question related to Data
Question
4 answers
If I want the annual average of the country's oil production for 2019 and I have 25 stations,
1- should I take the sum (of 12 months) for each station individually, so I get the annual sum for each station, and then divide by 25 to calculate the country's annual average,
2- or should I take the sum of January for the 25 stations, then February, etc., and then divide by 12 (the number of months) to get the annual average of the country?
Relevant answer
Answer
These are 2 different averages. The numerator is the same for both 1 and 2: the sum of the production of the 25 stations over 12 months, i.e. the total annual production of all 25 stations.
But dividing this numerator by 25 gives you the annual average production per station.
Dividing by 12 gives you the average production of all 25 stations per month.
There is no single correct average. The average depends on how you define it and what you want to characterize: production per station or production per month.
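A tiny numeric sketch of the two averages in R, with made-up numbers:
set.seed(42)
prod <- matrix(runif(25 * 12, 100, 200), nrow = 25, ncol = 12)   # 25 stations x 12 months
total <- sum(prod)   # total annual production of all stations
total / 25           # option 1: average annual production per station
total / 12           # option 2: average monthly production of all 25 stations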
  • asked a question related to Data
Question
3 answers
Hello everyone,
Could you recommend an alternative to IDC please to get records from the global datasphere for free?
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
  • asked a question related to Data
Question
3 answers
I am a research assistant to a doctoral student and she has been asked that for her thesis she must include on a CD a document management system with all the documentation she used, so that the reviewers of the work can quickly search through the documents, filter, search by keywords or between texts, etc....
I have searched and find that there are several such systems:
- OpenDocMan
- Seeddms
- I, Librarian
- OpenKm
- LogicalDOC
- Kimios
- others...
Several of them are web based and would be ideal as they offer the functionality we are looking for, but they are free as long as you are the one setting up a server. Others work as windows software but are not packable on their own to store on a CD. On the other hand I have not found options for free hosting even if it is low capacity and it does not make sense to pay indefinitely for such a system for a thesis work. *Excel is not an option for her unfortunately*.
I would like to know what system you know of that I could set up to search through documents and all this, so that I could save the whole system along with the documents on a CD, or it could be a Web solution but that I could have free hosting.
Thank you.
Relevant answer
Answer
Usually a PhD can be submitted as a Word or PDF file. Everyone can access and search these on a normal computer. No management system is required.
  • asked a question related to Data
Question
8 answers
Dear colleagues,
I am studying the most relevant disinformation topics on a given subject and over a period of time. I intend to analyze the content of fact-checking articles.
Relevant answer
Answer
Thank you very much for this question about what content analysis tools you could use to fact-check the most current misinformation or disinformation (fake news).
I am aware that there are always running cases of misinformation worldwide during this globalisation era propelled by telecommunications systems, the internet, and above all, the role of social media.
In some cases, misinformation is a cultural value, a religious principle, a national policy strategy, or a gender-specific issue. There are already tools and websites that help to list sources or qualities of misinformation (disinformation).
I suggest that you focus on the current misinformation (disinformation) activities that are trending in the war between Russia and Ukraine. This will be a very good and current area.
As you are better informed, content analysis is guided by the research topic, problem, and above all the specific objective or research questions. You need to select one or more categories of media that you intend to study (analyse) and the period involved.
The content analysis in your current research should be done in a systematic way. For example, to conduct a content analysis:
1. Select the content you will analyze. Based on your research question, choose the texts that you will analyze.
2. Define the units and categories of analysis.
3. Develop a set of rules for coding.
4. Code the text according to the rules
5. Analyze the results and draw conclusions.
You will draft your first report and keep revising it. Take note that misinformation can be attributed in some cases to official policies at all levels, honest mistakes, poor editing or proofreading, ignorance or functional illiteracy, prejudices or discrimination, as well as good intentions for temporary solutions.
Best regards
Wilson
  • asked a question related to Data
Question
5 answers
I am developing a model to optimise a sustainable energy system for community buildings. The system uses renewable energy, battery storage and intelligent building management to optimise the energy used by the building. I cannot find any data on electricity use patterns for community buildings (village/church halls) across the year. There seems to be lots for domestic property and some for normal commercial property (offices/shops/factories). I have limited data which show a marked summer/winter pattern, but I would be grateful if anyone could share any larger data sets. At the moment the buildings are all in the north of England, but ideally we would like to develop a model that works anywhere.
Relevant answer
Answer
Referring to data from similar buildings would not be realistic due to the number of uncertainties, such as zonal location, altitude and elevation, type of activity, number of occupants and their behaviour, type of utility, building life, etc.
My advice is to collect the particular building's electricity, water and fuel consumption bills and establish the energy scenario. Further, estimate the building's monthly electrical energy consumption with the help of an energy equipment inventory, using rated capacities and average operational times. Then you can compare the inventory estimate and the bill records to decide on the energy consumption pattern of the particular premises. This will help you towards an accurate and proper system design.
  • asked a question related to Data
Question
3 answers
In R, how do you generate random data whose distribution satisfies a specified skewness and kurtosis?
Relevant answer
Answer
Skew and kurtosis are just two measurements of a distribution. Say more about what you want the distributions to be like. In … we create several distributions with skew and kurtosis of different population values, but you need to provide more information to know whether any of these are appropriate. If you want specific sample values of skew and kurtosis, you could transform other distributions to achieve them.
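One possible route in R is the PearsonDS package, whose rpearson() accepts target moments; the moment values below are arbitrary, and note that the kurtosis argument is (as far as I recall) the ordinary, not the excess, kurtosis, so check ?rpearson before relying on it:
# install.packages("PearsonDS")
library(PearsonDS)
x <- rpearson(10000, moments = c(mean = 0, variance = 1, skewness = 1.5, kurtosis = 6))
# quick check of the sample moments
mean(x); var(x)
mean((x - mean(x))^3) / sd(x)^3   # sample skewness
mean((x - mean(x))^4) / sd(x)^4   # sample kurtosis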
  • asked a question related to Data
Question
3 answers
I have recorded behavioural data, such as incidences of aggression and grooming partners, in a troop of lemurs over three conditions.
What tests should I be using to compare the rates of aggression in the three conditions?
For the grooming partner data, I want to compare grooming between sex dyads. For instance, the frequency of male-male grooming compared to male-female grooming within each condition and then compare the average proportion of grooming between sex dyads in the three conditions. How would I do this?
Thank you in advance for your help. Apologies if this question is poorly worded, I am very new to data analysis.
Relevant answer
Answer
From your explanation, the dataset sounds suitable for a two-way ANOVA, other things being fine.
  • asked a question related to Data
Question
3 answers
I'm doing a systematic review and I need an access to EMBASE and Scopus databases to do research (my institution doesn't offer such an access)
Could someone help?
Regards,
Relevant answer
Answer
I ran into the same issue a while back, your best bet is to reach out to someone that studies/works in an institution that has access (preferably a friend or a colleague not a stranger) and ask for their institution credentials (you access their account on your computer) though this might not always work since the IP difference is usually detected and you are not always granted access from outside the institution. I would suggest someone in Qatar University, you can use their credentials at your working place and it has basically all the databases. Someone in the US or Occupied Palestine can also do the trick but watch out for the IP issue. Finally, if none of the above works, you can try emailing your search strategy (I would advise tailoring it according to the rules of the database) to someone with access and have them run the search for you and send the results. Good luck!
  • asked a question related to Data
Question
11 answers
Hi,
My research involves looking at whether an education initiative (workplace training) increases employee knowledge and engagement in corporate sustainability missions.
Research includes:
1) A pre Likert Scale questionnaire (18 questions divided roughly into two sections, knowledge and engagement)
2) An education intervention (training)
3) A post Likert Scale questionnaire (18 questions divided roughly into two sections, knowledge and engagement)
These have already taken place and I have questionnaire responses for 20 participants.
How do I go about interpreting/analysing this? I have read lots of different answers online and can't seem to find a consistent answer.
Any help will be appreciated- Thank you
Eimear
Relevant answer
Answer
Hi Lewis,
Thank you. Before the research, I had looked up before/after and decided on Pearson Correlation coefficient but when doing more research, realised this might be wrong?
I'll take a look on YouTube. For my literature review, I have identified several papers that have similar studies, but again, they all use a different method so that's why I'm unsure on which way to approach it
Thank you for your help - I'll also check out that book!
Eimear
  • asked a question related to Data
Question
9 answers
Hey guys, I'm working on a new project where I need to transfer Facebook Ads campaign data to visualize in Tableau or Microsoft Power BI, and this job should run automatically daily, weekly or monthly. I'm planning to use Python to build a data pipeline for this. Do you have any suggestions, any resources I can read, or any similar projects I can get inspired by? Thank you.
Relevant answer
Answer
To create an ETL pipeline using batch processing, you must first:
1. Construct reference data: create a dataset that outlines the range of possible values for your data.
2. Extract data from various sources: Correct data extraction is the foundation for the success of future ETL processes.
  • asked a question related to Data
Question
22 answers
Data science is a growing field of technology in the present context. There have been notable applications of data science in electronic engineering, nanotechnology, mechanical engineering and artificial intelligence. What future scope is available for data science in civil engineering, in the fields of structural analysis, structural design, geotechnical engineering, hydrological engineering, environmental engineering and sustainable engineering?
Relevant answer
  • asked a question related to Data
Question
3 answers
We ran different experiments in our lab where we exposed corals to different factors, e.g. Experiment 1 looking at ocean acidification and phosphate enrichment, and Experiment 2 looking at ocean acidification and nitrate. In both experiments, we have a control group, each factor (acidification and eutrophication) alone, and then a group exposed to both stressors at the same time. As our sample size is rather small, we thought of pooling the data from different experiments when corals experienced the same treatment, e.g. the pure acidification groups from Experiments 1 and 2. And here is the question: Which test(s) should we run to decide whether we can pool our data or not? We assume that we can only pool the data if there is no significant difference between the response (like respiration rate) in corals exposed to pure acidification in Experiments 1 and 2, correct?
We thought that we could compare the means of the groups (using ANOVA or Kruskal-Wallis), compare the ranks (using PERMANOVA), or look at similarity (using PCO and ANOSIM). Unfortunately, depending on the test, the outcomes of them are different (surprise!) and we don’t know which test is the “correct” one to make the decision to pool or not to pool.
Or maybe we don’t have to test them at all and can just pool them? What is the correct way/test to make this decision?
Relevant answer
Answer
Hello Selma,
I see three options:
1. Treat the two experiments as nested within occasions (use a mixed model for analysis). This would allow you to segregate effects that were due to time/occasion/experiment differences, and would be defensible if there was any likelihood that time/occasion/experiment differences could have influenced results.
2. If you and your team are convinced that the conditions from which you propose to combine data were identical in every other way save for time/occasion/experiment, then pool them and treat as one set.
3. Analyze each set independently, then use Fisher's method to compute a combined p-value for the set of two experiments. [Compute: -2*sum of natural log of individual outcome p-values; refer to a chi-square distribution based on 2*k, where k is the number of outcome p-values being combined.] This approach is sometimes used in meta-analytic studies.
Obviously, do explain your chosen method and the rationale in writing up results.
Good luck with your work.
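A minimal R sketch of option 3 as described above; the two p-values are made-up placeholders:
pvals <- c(0.04, 0.11)                      # outcome p-values from the two experiments
chisq <- -2 * sum(log(pvals))               # Fisher's combined statistic
pchisq(chisq, df = 2 * length(pvals), lower.tail = FALSE)   # combined p-value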
  • asked a question related to Data
Question
3 answers
Use case: to provide data security by building next-generation firewalls, or is there a better firewall type to handle normal systems? Please suggest any answers.
Relevant answer
Answer
Dear Roshan Reddy,
To enable security of electronic communications there are three groups of problems - endpoint security, cloud / network security, and identity and access management security. Firewalls are important, but it's just a part of the puzzle.
  • asked a question related to Data
Question
4 answers
I'm trying to analyse some data from my last experiment, where I grew two varieties of potato in a range of pot sizes with well-watered and water-restricted conditions, to see if the size of the pot would affect the relationships between water restriction and measures of plant morphophysiology over time.
Unfortunately, I have absolutely no idea how to analyse these data, which looks like this (5 pot sizes, 2 genotypes, 2 treatments, and about 11 dates)... Each combination of factors was replicated in triplicate. To be honest, I'm not even sure what I'm trying to look for, my brain's not great with numbers so I'm just sitting staring at Minitab. Any help at all would be amazing. Thanks.
Relevant answer
Answer
Cindy – I presume someone is supervising your experiment. It's time to investigate the statistical support resources available to you.
You also need to write your results section now – I'm not kidding about this. Look at papers that have done similar experiments and see how they presented the data, and how they wrote it up. Use this as a template. Write the results section drawing up blank tables to show your results and leaving gaps in the text awaiting the appropriate result.
I say this because with repeated measurements you need to define your study endpoint precisely. Are you interested in how big the plants are at a given time point, or how long it takes them to reach a given size? And indeed, is size what counts? Measured how?
You can also be getting a better feel for your data by graphing the results. Draw line graphs for each plant's growth, coloured to show which group they belong to. Do this for each variable separately – water level, genotype, pot size etc. Look out for peculiar data points! They can have devastating effects on mean values.
So read similar studies and dummy up your analysis plan, meanwhile making lots of graphs. Then make an appointment with your local friendly statistics service to discuss how you implement the analysis in whatever software you are using.
  • asked a question related to Data
Question
7 answers
Dear all,
I wanted to evaluate the accuracy of a model using observational data. My problem is that the correlation of the model with the observed data is really good (bigger than 0.7), but the RMSE is very high too (bigger than 100 mm per month for monthly rainfall data). How can I explain this? The model also has low bias.
How to explain this case?
Thank you all
Relevant answer
Answer
The choice of a model should be based on underlying physical explanations first.
See the manual of my software for examples:
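A small made-up illustration in R of why a high correlation can coexist with a low bias and a large RMSE: correlation ignores a systematic exaggeration of the amplitude around the mean, while RMSE does not.
set.seed(1)
obs  <- runif(120, 0, 300)                                        # hypothetical monthly rainfall, mm
pred <- mean(obs) + 1.8 * (obs - mean(obs)) + rnorm(120, 0, 20)   # amplitude overestimated
mean(pred - obs)              # bias: close to zero
cor(obs, pred)                # correlation: still very high
sqrt(mean((pred - obs)^2))    # RMSE: large, driven by the variance mismatch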
  • asked a question related to Data
Question
3 answers
Dear collegues,
I am trying to train a neural network. I normalized the data with the minimum and maximum:
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(mydata, normalize))
and the results:
results <- data.frame(actual = testset$prov, prediction = nn.results$net.result).
So I can see the actual and predicted values only in normalized form.
Could you please tell me how I can scale the real and predicted data back into the "unscaled" range?
P.S.:
minvec <- sapply(mydata, min)
maxvec <- sapply(mydata, max)
denormalize <- function(x, minval, maxval) {
  x*(maxval-minval) + minval
}
doesn't work correctly in my case.
Thanks a lot for your answers
Relevant answer
Answer
It actually works (but you have to consider rounding errors):
normalize <- function(x, min, max) (x-min)/(max-min)
denormalize <- function(x, min, max) x*(max-min)+min
x <- rnorm(1000)
r <- range(x)
nx <- normalize(x, r[1], r[2])
dnx <- denormalize(nx, r[1], r[2])
all(x == dnx)
# FALSE ---> rounding errors
all(abs(x - dnx) < 1E-8)
# TRUE ---> identical up to tiny rounding errors
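Building on the objects from the question (mydata, maxmindf, results and the target column prov), the same pair of functions can be applied column-wise, as long as the min/max come from the original, unscaled data:
minvec <- sapply(mydata, min)
maxvec <- sapply(mydata, max)
# back-transform the whole normalized data frame
denorm_df <- as.data.frame(Map(denormalize, maxmindf, minvec, maxvec))
# back-transform the predictions with the min/max of the original target variable
# (assuming the target column is called "prov" as in the question)
denormalize(results$prediction, minvec["prov"], maxvec["prov"])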
  • asked a question related to Data
Question
16 answers
Some literature states that quarterly data from the World Bank were used. Is that available, or did they transform the annual data? How is this transformation done?
Relevant answer
Answer
International Debt Statistics • These include the high frequency, quarterly data for high-income economies and select developing countries reporting to the joint World Bank–IMF Quarterly External Debt Statistics (QEDS) and the Quarterly Public Sector Debt (PSDS) database. [on-line] https://data.worldbank.org/products
• Quarterly Public Sector Debt (QPSD) [on-line] https://www.worldbank.org/en/programs/debt-statistics/qpsd
  • asked a question related to Data
Question
2 answers
Hi,
I am looking for the latest population data from authentic sources/government authorities for any time between 2012 and 2021. It will be for my study area, the South 24 Parganas district, West Bengal. Any leads/contacts for acquiring the same will be of great help for my research work.
Thank you.
  • asked a question related to Data
Question
7 answers
I have used the merge function in SPSS loads of times previously with no problems. However this time I am running into an unusual issue and can't find any information online to overcome it.
I am trying to merge 2 SPSS data files: set 1 contains demographic data on 8504 cases and set 2 contains blood data on 6725 of those cases.
Using point-and-click options in SPSS I tried MERGE FILES>ADD VARIABLE>one-to-one merge based on key values (key variable = ID). However this results in a file with duplicate cases i.e. row 1 and row 2 are both subject ID 1, row 1 shows the values for the demographic data for that subject, while the blood data cells are blank, and in row 2 the demographic data cells are blank and the blood data are there. Screenshot attached.
I tried following this up with the restructuring command to try and merge the duplicate case rows but it did not alter the data set.
I've double checked that my ID variable in set 1 and set 2 matches in type/width/decimals etc.
I've tried the following syntax
MATCH FILES /FILE=*
/FILE='DataSet2'
/BY ID.
EXECUTE.
But none of the above has worked. Any advice would be HUGELY appreciated!
Relevant answer
Answer
Good morning Caoileann Murphy. I don't know if this will work with non-printing ascii codes, but I would try this:
* Strip non-printable ascii control codes from ID.
RENAME VARIABLES (ID=oldID).
* Make a new ID variable--adjust length as needed.
STRING ID (A5).
COMPUTE ID = oldID.
LOOP #i = 0 to 31.
- COMPUTE ID = REPLACE(ID,STRING(#i,PIB),"").
END LOOP.
* References.
Good luck!
  • asked a question related to Data
Question
16 answers
I have been struggling to get the export-import data of Taiwan. On the World Bank website, Taiwan is not listed as a country, so nothing can be found. Are there any reliable sources for country-specific (Taiwan) data?
Relevant answer
Answer
Also check please the following very good link: https://www.ceicdata.com/en/indicator/taiwan/total-exports
  • asked a question related to Data
Question
5 answers
Hello everyone,
I am looking for links of scientific journals with dataset repositories.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
Relevant answer
Answer
Dear Cecilia-Irene Loeza-Mejía
I think you should have a look at the site «re3data: Registry of Research Data Repositories» (https://www.re3data.org).
There you will find the following search/browsing options: Browse by content type, Browse by subject, Browse by country.
When you choose "Browse by content type", you will get "Raw data" or "Scientific and statistical data formats" (among others): https://www.re3data.org/browse/by-content-type/.
With best regards Anne-Katharina
  • asked a question related to Data
Question
8 answers
Not all rows are visible due to the large number of rows; is there any way to view all the rows? I am using pandas to read the file.
Relevant answer
Answer
If you don't want to set pandas global options you could also do
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    display(HTML(df.to_html()))
and this will apply to the "with" context only.
PS: you may need to import HTML from IPython
from IPython.display import HTML
  • asked a question related to Data
Question
15 answers
I am a PhD student and am currently working on metabolite profiles of some marine invertebrates.
While analysing some raw data generated from LC-MS, HRMS, NMR and FTIR, I was told by some researchers that these raw data, once submitted to a journal as supporting files, cannot be used further for any other analysis. For each analysis I need to generate the raw data again, otherwise it will be treated as a case of self-plagiarism.
I can see that my raw data has a potential of producing three distinct publications. I can analyse different parts of my raw data differently to present distinct conclusions.
But generating all the raw data again from these analyses, and that too for each publication, does not look sustainable to me. And clubbing all three publications in one also does not seem to be a good option here.
So I would like to know your views on this matter as a researcher and also as an Editor/Reviewer. Also, please share your similar experiences and solutions to it.
Relevant answer
Answer
It depends. Much data, for example fisheries records, are published for a given purpose - for example to manage a fishery. Many years after that data was published, it may provide other researchers with other information - for example on understanding "shifting baselines". There are many very good pieces of research using historic datasets. My herbarium vouchers are a form of "data". They are lodged in public Herbaria across Australia. The Herbarium staff make the vouchers, and the associated environmental data, available to researchers around the world. They do not limit the number of times any particular voucher may be used to provide a datapoint in someone's research. Those of us who contribute these data rarely hear about their reuse unless we subscribe to platforms like Bionomia. Over a career collecting environmental data I have much that I may never explore fully. I use platforms like the Atlas of Living Australia's BioCollect platform to host my old datasets. It is possible that one day someone may use the hypersaline lake diatom and physico-chemical data from Australia to develop a "diatom metric" index of water quality for these understudied habitats... or for gnammas, or coastal lagoons...
As you know what analyses you plan to subject the dataset to, maybe you would be better served by clubbing a group of papers together that look at the different things you have extracted from the dataset, and then providing the entire group to one journal to publish as a "set".
But I would not like for you to have the opinion that data may only be used once then needs be discarded. What a waste of effort. Well conserved datasets, with excellent metadata relating to the methodologies and data collection, can be valuable into the future, in ways we have no current understandings about.
  • asked a question related to Data
Question
5 answers
When should household panel data be weighted to increase the representativeness of the data? Does this depend on the number of reporting households?
Relevant answer
Answer
The weighted mean is used in meta-analysis. It gives higher weight to more accurate studies and lower weight to less accurate ones.
  • asked a question related to Data
Question
4 answers
Hi,
Should the sociodemographic data of qualitative research be equally distributed? I will be glad if you send me your opinions and sources about this issue. Thanks in advance.
Relevant answer
Answer
In qualitative research you are aiming to cover the ground, so to speak. Often, this involves purposive sampling to make sure that all relevant points of view, experiences etc are included. However, the demographics may not be (probably are not) the best way of selecting participants. What you want is some way of making sure that everyone is heard.
  • asked a question related to Data
Question
13 answers
I am currently looking for import, export, current account balance and balance of payments data from the year 1960. WDI and UNCTADstat show a smaller amount of data (starting from the 1980s). Are there any reliable data streams/sources where I can access the data free of cost? How about published data?
Relevant answer
Answer
You can check the IMF as well, if you have not visited their website yet.
  • asked a question related to Data
Question
3 answers
I'm looking for information on home value data in the neighborhood of Chelsea, NYC for a survey of the impacts of green gentrification in the region brought about by High Line Park. Contributions in this regard are very welcome.
Relevant answer
Answer
Zillow may be a useful source for current values. Depending on your timeframe, tax assessment records -- generally available for free -- can be useful, but they have a large amount of lag. Hence the reason to tap Zillow or similar for recent data. But Zillow is labor-intensive, requires looking up individual addresses. Zillow & assessor files also give you housing characteristics that you will need. Economics journals contain numerous studies of impacts on property values, eg, of firearm injury/crime rates, noise, pollution. Many are fraught with problems. Read some of the studies & critiques before deciding to proceed.
  • asked a question related to Data
Question
42 answers
How to obtain currently necessary information from Big Data database systems for the needs of specific scientific research and necessary to carry out economic, business and other analyzes?
Of course, the right data are important for scientific research. However, in the present era of digitalization of various categories of information and of the creation of libraries, databases and constantly expanding large data sets stored in database systems, data warehouses and Big Data systems, it is important to develop techniques and tools for filtering those large data sets, so that out of terabytes of data only the information currently needed is extracted: for the scientific research being conducted in a given field of knowledge, for answering a given research question, and for business needs, e.g. after connecting these databases to Business Intelligence analytical platforms. I described these issues in my scientific publications presented below.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
How to obtain currently necessary information from Big Data database systems for the needs of specific scientific research and necessary to carry out economic, business and other analyzes?
Please reply
I invite you to the discussion
Thank you very much
Dear Colleagues and Friends from RG
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Respected Doctor
Big data has three characteristics as follows:
1-Volume
It is the volume of data extracted from a source that determines the value and the potential of the data to be classified as big data; by the year 2020, cyberspace was forecast to contain approximately 40,000 exabytes (40 zettabytes) of data ready for analysis and information extraction.
2-Variety
It means the diversity of the extracted data, which helps users, whether researchers or analysts, to choose the appropriate data for their field of research. It includes structured data in databases and unstructured data (such as images, clips, audio recordings, videos, SMS, call logs, and GPS map data), which require time and effort to prepare in a suitable form for processing and analysis.
3-Velocity
It means the speed of producing and extracting data and sending it to cover the demand for it. Speed is a crucial element in making a decision based on this data, and it is the time we take from the moment this data arrives to the moment the decision is made based on it.
There are many tools and techniques used to analyze big data, such as Hadoop, MapReduce and HPCC, but Hadoop is one of the most famous of these tools. It splits big data across several devices, distributes the processing to those devices to speed it up, and returns the result as a single package. Tools that deal with big data consist of three main parts:
1- Data mining tools
2- Data Analysis Tools
3- Tools for displaying results (Dashboard).
Its use also varies statistically according to the research objectives (improving education, effectiveness of decision-making, military benefit, economic development, health management ... etc.).
greetings
Senior lecturer
Nuha hamid taher
  • asked a question related to Data
Question
15 answers
I have measured pain intensity using the numerical rating scale (NRS) at 30-minute intervals up to 6 h for 4 groups. The NRS scale runs from 0 to 10. Can anyone guide me on what statistical comparison I can use to compare the 4 groups?
Relevant answer
Answer
One-way analysis of variance would be the right tool if the observations are at least measured on an interval scale. If the pain intensity scores can be treated as interval data, then you may proceed with ANOVA or its non-parametric equivalent, the Kruskal-Wallis test (if the normality/homogeneity of variance assumption is violated). If the pain intensity scores are ordinal, then you should go ahead with the Jonckheere-Terpstra test, which is the analogue of ANOVA considering the ordinal nature.
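A minimal R sketch of the Kruskal-Wallis comparison with simulated NRS scores in long format (one score per row, group as a factor); this ignores the repeated measurements over time, which would need a repeated-measures approach:
set.seed(7)
df <- data.frame(
  nrs   = sample(0:10, 80, replace = TRUE),              # hypothetical NRS scores
  group = factor(rep(c("A", "B", "C", "D"), each = 20))  # four treatment groups
)
kruskal.test(nrs ~ group, data = df)
# an ordered-alternative analogue (Jonckheere-Terpstra) is available in add-on
# packages, e.g. jonckheere.test() in the clinfun package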
  • asked a question related to Data
Question
7 answers
I am currently doing a PCA on microbial data. After running a Parallel Analysis to determine the number of factors to retain from the PCA, the answer is 12. Since my idea is to save the factor scores and use them as independent variables for a GLM together with other variables, I was wondering:
  • Should I definitely save the factor scores of all 12 factors (which would become too many variables) or I can save only a few of them (e.g., the first 3 which together explain a 50% of the variance) for the GLM?
  • If I can save a lower number, should I re-run the PCA retaining only that lower number (e.g. 3) or just use the factor scores already obtained when retaining the 12 ones?
Thank you all for your time and help!
Relevant answer
Answer
Hello Abdulmuhsin S. Shihab. The Preacher & MacCallum (2003) article I referred to in my earlier post explains (among many other things) why eigenvalues > 1 is a very poor way to determine the number of factors (or components) to retain:
HTH.
  • asked a question related to Data
Question
3 answers
Where can I get global COVID-19-related data that is publicly available for research and academic work? I am specifically looking for patient data, demographics, distribution, etc.
  • asked a question related to Data
Question
23 answers
Hi everyone
I'm looking for a quick and reliable way to estimate my missing climatological data. My data are daily and cover more than 40 years. They include minimum and maximum temperature, precipitation, sunshine hours, relative humidity and wind speed. My main problem is the sunshine hours data, which have a lot of gaps. These gaps are scattered through the time series; sometimes they cover several months or even a few years. The number of stations I work with is 18. Given that my data are daily, the number of missing values is high, so I need to estimate the missing data before starting work. Your comments and experiences would be very helpful.
Thank you so much for advising me.
Relevant answer
Answer
It is in French
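If it helps, a minimal R sketch of simple gap-filling with the zoo package; the numbers are made up, and linear interpolation is only defensible for short gaps, while multi-month or multi-year gaps normally need reference stations, regression or multivariate imputation (e.g. the mice package) instead:
library(zoo)
sunshine <- c(7.2, 6.8, NA, NA, 5.9, 6.4, NA, 7.0)   # hypothetical daily sunshine hours
na.approx(sunshine, na.rm = FALSE)                   # interpolate interior gaps, keep leading/trailing NAs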
  • asked a question related to Data
Question
4 answers
Dear Colleagues,
I've intended to write a research article dealing with variances of unmodeled effects that limit accuracy of GNSS relative positioning. This article would be a continuation of my several-year well-documented research that is based on 2-Way Nested ANOVA.
Due to a shortage of other GNSS raw data, I've used only GPS raw data in my research so far. However, for this study, I need epoch-wise ambiguity-fixed baseline solutions, necessarily obtained by post-processing of multi-GNSS (GPS, BeiDou, Galileo, GLONASS, ...) raw data in, for example, Bernese or any other advanced processing 'machine' (with the use of all proposed corrections along with the IGS final product data).
I offer a collaboration to anyone who provides those input data for the study.
P.S: Please, if any interest, send me a private RG message to give you detailed information.
Respectfully