Python for Data Science & Analysis
- RICHA RAMBHIA
- Aug 1, 2022
- 2 min read
Python is a popular programming language used for many purposes, including web development, software development, data science and analysis, and connecting to databases. The focus here is on Python for data science and analysis.
Python has many packages and libraries that make it easy to analyze a dataset. The list below covers some of the most widely used ones.
Python Packages & Libraries
{
The difference between a Python package and a Python library is that a package is a collection of modules and may also contain sub-packages, whereas a library is a collection of packages.
}
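The package side of this distinction can be checked from the interpreter. The following is a minimal sketch, not a definitive test: the helper name `is_package` is made up for illustration, and the standard-library names `math`, `json`, and `json.decoder` are just convenient examples. It relies on the fact that a package is a module carrying a `__path__` attribute, which is what allows it to contain submodules and sub-packages.

```python
import importlib


def is_package(name: str) -> bool:
    """Return True if the importable name refers to a package."""
    module = importlib.import_module(name)
    # Packages expose __path__; plain modules do not.
    return hasattr(module, "__path__")


print(is_package("math"))          # False: a plain module
print(is_package("json"))          # True: a package with an __init__.py
print(is_package("json.decoder"))  # False: a module inside the json package
```

"Library" has no such marker in the import system; it is an informal term for a collection of packages distributed together.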
Pandas : A high-level data manipulation tool used for data science, data analysis, and machine learning tasks.
NumPy : Used to perform a wide variety of mathematical operations on arrays, with support for large, multi-dimensional arrays and matrices.
SciPy : Used to solve scientific and mathematical problems; it builds on NumPy and provides high-level routines for tasks such as optimization, integration, and statistics.
Matplotlib : Used to create static, animated, and interactive visualizations in Python.
Seaborn : A Python data visualization library based on Matplotlib that provides a high-level interface for statistical graphics.
Plotly : Used to create interactive web-based visualizations which can be displayed in Jupyter notebooks or HTML files.
Bokeh : Used for creating interactive visualizations for modern web browsers.
Scikit-learn (sklearn) : A widely used, robust library for machine learning in Python, covering classification, regression, clustering, and model selection.
Keras : A high-level API used for building deep learning models and for distributed training of those models.
TensorFlow : Provides a collection of workflows to develop and train models and deploy them in the cloud.
PyTorch : Used for deep learning applications using GPUs and CPUs.
Theano : Used in deep learning projects to define and evaluate mathematical expressions involving multi-dimensional arrays.
NLTK : The Natural Language Toolkit is used for building applications that work with human language data, with a focus on statistical NLP.
spaCy : Designed for production use; helps build applications that process and understand large volumes of text.
pickle : Used for serializing and de-serializing Python object structures, a process also called marshalling or flattening.
Streamlit : An open-source app framework that helps create web apps for data science and machine learning.
Pillow : A fork of the Python Imaging Library (PIL) that supports opening, manipulating, and saving images.
pytest : A testing framework used for writing tests for Python code, from small unit tests to tests for APIs.
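To make two of the entries above concrete, here is a minimal sketch that uses only the standard library. It serializes an object with pickle and checks the result in a pytest-style test function. The `record` data and the names `roundtrip` and `test_roundtrip_preserves_data` are hypothetical, chosen only for illustration.

```python
import pickle

# Hypothetical record standing in for any Python object structure.
record = {"name": "sample", "values": [1.5, 2.5, 4.0], "tags": ("a", "b")}


def roundtrip(obj):
    """Serialize obj to bytes with pickle, then rebuild it from those bytes."""
    payload = pickle.dumps(obj)   # object -> bytes
    return pickle.loads(payload)  # bytes -> new, equal object


def test_roundtrip_preserves_data():
    # pytest collects functions named test_* and reports any failed assert.
    restored = roundtrip(record)
    assert restored == record
    assert restored is not record  # a rebuilt copy, not the same object


if __name__ == "__main__":
    test_roundtrip_preserves_data()
    print("round trip OK")
```

Saved in a file whose name starts with `test_`, the function would be picked up by running `pytest`; run directly, the script calls the test itself. Note that unpickling can execute arbitrary code, so `pickle.loads` should only be used on data from a trusted source.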
Summary
The libraries and packages listed above can be used in data science and data analysis projects to analyze data.
Many more Python packages and libraries exist for other applications and are worth exploring for further use.